How to get cumulative sum of unique IDs with group by?

I am very new to Python and pandas, and I am working on a pandas DataFrame that looks like this:



Date   Time   ID  Weight
Jul-1  12:00  A   10
Jul-1  12:00  B   20
Jul-1  12:00  C   100
Jul-1  12:10  C   100
Jul-1  12:10  D   30
Jul-1  12:20  C   100
Jul-1  12:20  D   30
Jul-1  12:30  A   10
Jul-1  12:40  E   40
Jul-1  12:50  F   50
Jul-1  1:00   A   40


I am trying to group by date, time and ID and apply a cumulative sum such that each ID's weight is added only once: if an ID is present again in a later time slot, its weight must not be added a second time. The resulting DataFrame would look like this:



Date   Time   Weight
Jul-1  12:00  130  (10+20+100)
Jul-1  12:10  160  (10+20+100+30)
Jul-1  12:20  160  (10+20+100+30)
Jul-1  12:30  160  (10+20+100+30)
Jul-1  12:40  200  (10+20+100+30+40)
Jul-1  12:50  250  (10+20+100+30+40+50)
Jul-1  01:00  250  (10+20+100+30+40+50)


This is what I tried, but it still counts the weights multiple times:



df = df.groupby(['date', 'time', 'ID'])['Wt'].apply(lambda x: x.unique().sum()).reset_index()
df['cumWt'] = df['Wt'].cumsum()


Any help would be really appreciated!



Thanks a lot in advance!!










  • check groupby and agg

    – Wen-Ben
    Mar 7 at 19:59















python pandas data-processing






edited Mar 7 at 20:14 by petezurich
asked Mar 7 at 19:45 by AnalyticsTeam












1 Answer
The code below uses DataFrame.duplicated(), DataFrame.merge(), groupby() with sum(), and cumsum() to arrive at the desired output:



# take the weight from the first row of each ID and rename the series for the merge
# (duplicates are detected on 'ID', since the goal is to count each ID only once)
unique_weights = df.loc[~df.duplicated(['ID']), 'weight'].rename('consider_cum')

# merge the series into the original dataframe and replace the ignored values with 0
df = df.merge(unique_weights.to_frame(), how='left', left_index=True, right_index=True)
df['consider_cum'] = df['consider_cum'].fillna(0)

# sum per date and time; sort=False keeps the time slots in their original order
df = df.groupby(['date', 'time'], sort=False)['consider_cum'].sum().reset_index()

# create the cumulative sum column and present the output
df['weight_cumsum'] = df['consider_cum'].cumsum()
df[['date', 'time', 'weight_cumsum']]


This produces the following output:

[image: output table]
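For anyone who wants to run this end to end, here is a minimal, self-contained sketch that deduplicates on ID and accumulates per time slot. It is not the answer's exact code: it assumes lowercase column names date, time, ID and weight (the table and the snippets in this thread use differing names), that the rows are already in chronological order, and that the running total should not reset per date. The helper column new_weight is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# Sample data from the question. The lowercase column names are an assumption.
df = pd.DataFrame({
    "date": ["Jul-1"] * 11,
    "time": ["12:00", "12:00", "12:00", "12:10", "12:10", "12:20",
             "12:20", "12:30", "12:40", "12:50", "1:00"],
    "ID": ["A", "B", "C", "C", "D", "C", "D", "A", "E", "F", "A"],
    "weight": [10, 20, 100, 100, 30, 100, 30, 10, 40, 50, 40],
})

# Keep the weight only on the first row where each ID appears; use 0 elsewhere.
# "new_weight" is a hypothetical helper column for this sketch.
first_seen = ~df.duplicated(subset="ID")
df["new_weight"] = df["weight"].where(first_seen, 0)

# Sum the newly seen weights per time slot (sort=False keeps the slots in
# order of first appearance), then accumulate across slots.
result = (
    df.groupby(["date", "time"], sort=False)["new_weight"]
      .sum()
      .cumsum()
      .reset_index(name="weight_cumsum")
)
print(result)
```

For the sample data the weight_cumsum column comes out as 130, 160, 160, 160, 200, 250, 250, which matches the table in the question. Note that the last row (A with weight 40) adds nothing, because A was already counted with its first weight of 10.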






  • Thanks a lot @Daniel!

    – AnalyticsTeam
    Mar 7 at 23:37











  • You are welcome, @AnalyticsTeam :)

    – Daniel Labbe
    Mar 8 at 8:47











edited Mar 7 at 21:01
answered Mar 7 at 20:55 by Daniel Labbe











