How to display the particular max row in pyspark dataframes
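I have the following code:
    from pyspark.sql import functions as F

    # Sum the age-specific birth rates within each Period, rounded to 2 decimals.
    (ageDF.sort('Period')
        .groupBy('Period')
        .agg(F.round(F.sum('Age_specific_birth_rate'), 2).alias('Total Births'))
        .show())
The above sums Age_specific_birth_rate within each Period, so the output looks like this: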
+------+------------+
|Period|Total Births|
+------+------------+
|  2000|       395.5|
|  2001|       393.4|
|  2002|       377.3|
|  2003|       386.2|
|  2004|       395.9|
|  2005|       391.9|
|  2006|       400.4|
|  2007|       434.0|
|  2008|       437.8|
|  2009|       425.7|
|  2010|       434.0|
|  2011|       417.8|
|  2012|       418.2|
|  2013|       400.4|
|  2014|       384.3|
|  2015|       398.7|
|  2016|       374.8|
|  2017|       362.7|
|  2018|       342.2|
+------+------------+
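But I want to display the Period that has the maximum total. When I run the following code:
    from pyspark.sql import functions as F

    # Aggregating again with max() collapses everything to one value and drops Period.
    (ageDF.sort('Period')
        .groupBy('Period')
        .agg(F.round(F.sum('Age_specific_birth_rate'), 2).alias('Total'))
        .select('Period', 'Total')
        .agg(F.max('Total'))
        .show())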
I get the output
+----------+
|max(Total)|
+----------+
|     437.8|
+----------+
But I want to get something like
+------+----------+
|Period|max(Total)|
+------+----------+
|  2008|     437.8|
+------+----------+
What should I do?
Thank you
python pyspark apache-spark-sql
That's a common problem. You want to output the max value together with the row that contains it. An alternative is to loop over your data and compare each value with the max value, outputting the row when they are equal. There is probably more than one answer.
– MoreFreeze
Mar 8 at 6:23
Can you put a small initial dataset as an example and the output expected for that dataset to be able to reproduce and understand the case?
– Daniel Sobrado
Mar 8 at 6:25
Possible duplicate of GroupBy column and filter rows with maximum value in Pyspark
– pault
Mar 8 at 14:51
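The loop-and-compare idea from the first comment can be sketched in plain Python. This is only an illustration, assuming the grouped totals have already been brought to the driver as (Period, Total) pairs. The list name `totals` is hypothetical and the four values are taken from the table in the question.

```python
# Hypothetical (Period, Total) pairs, copied from part of the question's output table.
totals = [(2007, 434.0), (2008, 437.8), (2009, 425.7), (2010, 434.0)]

# First pass: find the maximum total.
max_total = max(total for _, total in totals)

# Second pass: keep every row whose total equals the maximum (ties would all be kept).
max_rows = []
for period, total in totals:
    if total == max_total:
        max_rows.append((period, total))

print(max_rows)  # [(2008, 437.8)]
```

This only makes sense when the aggregated result is small enough to collect. For large results the comparison is better left to Spark itself.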
asked Mar 8 at 6:13 by Rudy
edited Mar 8 at 6:31 by howie
1 Answer
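You can try ordering by the total in descending order and keeping only the first row:
    from pyspark.sql import functions as F

    # Aggregate per Period, sort by Total (largest first), and keep the top row.
    (ageDF.sort('Period')
        .groupBy('Period')
        .agg(F.round(F.sum('Age_specific_birth_rate'), 2).alias('Total'))
        .orderBy(F.col('Total').desc())
        .limit(1)
        .select('Period', 'Total')
        .show())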
answered Mar 8 at 6:27 by howie