Pandas: Keep Column, Count, Drop Duplicates
I'm currently trying to drop duplicates based on two columns, but count the duplicates before they are dropped. I've managed to do this via:
df_interactions = (df_interactions.groupby(['user_id', 'item_tag_ids'])
                                  .size()
                                  .reset_index()
                                  .rename(columns={0: 'interactions'}))
but this leaves me with
user_id item_tag_ids interactions
0 170 71 1
1 170 325 1
2 170 387 1
3 170 474 1
4 170 526 2
It does what I want with respect to counting, adding the count as a column, and dropping the duplicates, but how would I do this while retaining the original structure (plus the new column)? Adding more columns to the groupby changes its behaviour.
Here is the original structure; I only want to group by the IDs:
user_id item_tag_ids item_timestamp
0 406225 7271 1483229353
1 406225 1183 1483229350
2 406225 5930 1483229350
3 406225 7162 1483229350
4 406225 7271 1483229350
I would like the new item_timestamp field in the smaller dataframe to contain the first occurring timestamp for that combination.
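For reproducibility, the sample frame above can be constructed as follows (a sketch; the integer dtypes are assumptions based on the displayed values):

import pandas as pd

# Sketch of the sample data shown above.
df_interactions = pd.DataFrame({
    'user_id': [406225] * 5,
    'item_tag_ids': [7271, 1183, 5930, 7162, 7271],
    'item_timestamp': [1483229353, 1483229350, 1483229350,
                       1483229350, 1483229350],
})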
python pandas
asked Mar 6 at 16:20, edited Mar 6 at 17:05 – kuomi
What was the original structure? – micric, Mar 6 at 16:27
@micric I'm trying to retain a column, item_timestamp, after duplicate removal. So basically: group by these IDs, count the interactions (duplicates before removal), and add the item_timestamps after the duplicates are removed. – kuomi, Mar 6 at 16:35
@kuomi Understand that we cannot help you if you don't include an example of the original data before the groupby. – Erfan, Mar 6 at 16:36
From your original structure, what is the expected output? – Scott Boston, Mar 6 at 16:53
1 Answer
You want to use transform like the following to keep your original data's shape. And to get a list of all the item_timestamp values, you can use groupby in combination with agg(list):
# First we create the count column with transform
df['count'] = df.groupby(['user_id', 'item_tag_ids']).user_id.transform('size')
# After that we merge the groupby result (timestamps aggregated into lists) back onto our original dataframe
df = df.merge(df.groupby(['user_id', 'item_tag_ids']).item_timestamp.agg(list).reset_index(),
on=['user_id', 'item_tag_ids'],
how='left',
suffixes=['_1', '']).drop('item_timestamp_1', axis=1)
print(df)
user_id item_tag_ids count item_timestamp
0 406225 7271 2 [1483229353, 1483229350]
1 406225 1183 1 [1483229350]
2 406225 5930 1 [1483229350]
3 406225 7162 1 [1483229350]
4 406225 7271 2 [1483229353, 1483229350]
Explanation of .agg(list): it aggregates the values of each group into a list, like the following:
df.groupby(['user_id', 'item_tag_ids']).item_timestamp.agg(list).reset_index()
Out[39]:
user_id item_tag_ids item_timestamp
0 406225 1183 [1483229350]
1 406225 5930 [1483229350]
2 406225 7162 [1483229350]
3 406225 7271 [1483229353, 1483229350]
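If the goal is instead one row per unique pair with the count and the first occurring timestamp (see the comments below), a minimal sketch using named aggregation (a pandas 0.25+ feature, so an assumption about the environment) could look like this:

# A sketch, not part of the original answer: one row per unique
# (user_id, item_tag_ids) pair, with the duplicate count and the
# first timestamp that occurs for that pair.
out = (df.groupby(['user_id', 'item_tag_ids'], as_index=False)
         .agg(interactions=('item_timestamp', 'size'),
              item_timestamp=('item_timestamp', 'first')))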
answered Mar 6 at 16:37, edited Mar 6 at 17:07 – Erfan
Apologies, I've attached the original structure to my question. – kuomi, Mar 6 at 16:39
I can transform the size, but this drops the rest of the columns. I want the item_timestamp duplicates to also be dropped, but if I group by all three columns I get a different-sized structure, as some timestamps repeat. – kuomi, Mar 6 at 16:44
Edited answer, is this what you want? @kuomi – Erfan, Mar 6 at 16:47
This seems to retain the original structure, but add a count. What I'm looking for is to group by the first two columns and then get the timestamps for what remains. The grouping trims my dataframe from 236268 to 31548 rows, so what I'm looking for is the associated timestamps for each index in the new dataframe. – kuomi, Mar 6 at 16:53
Sorry if I wasn't clear: I want a grouping of unique user_id, item_tag_ids combinations, but a counter of how many times duplicates appeared. I then want the first occurring timestamp for each of the unique combinations from the original DF. – kuomi, Mar 6 at 16:58
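Following that last clarification, the answer's transform step can also be combined with drop_duplicates to get there without a merge (a sketch, not from the thread; keep='first' is what preserves the first occurring timestamp per pair):

# Attach the count, then keep only the first row of each pair.
df['interactions'] = df.groupby(['user_id', 'item_tag_ids'])['user_id'].transform('size')
result = df.drop_duplicates(subset=['user_id', 'item_tag_ids'], keep='first')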