How to display the particular max row in pyspark dataframes

















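I have the following code:



from pyspark.sql.functions import round, sum  # shadows Python's built-in round and sum

(ageDF.sort('Period')
    .groupBy('Period')
    .agg(round(sum('Age_specific_birth_rate'), 2).alias('Total Births'))
    .show())


This sums Age_specific_birth_rate for each Period, rounded to two decimal places.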



The output looks like this:



+------+------------+
|Period|Total Births|
+------+------------+
|  2000|       395.5|
|  2001|       393.4|
|  2002|       377.3|
|  2003|       386.2|
|  2004|       395.9|
|  2005|       391.9|
|  2006|       400.4|
|  2007|       434.0|
|  2008|       437.8|
|  2009|       425.7|
|  2010|       434.0|
|  2011|       417.8|
|  2012|       418.2|
|  2013|       400.4|
|  2014|       384.3|
|  2015|       398.7|
|  2016|       374.8|
|  2017|       362.7|
|  2018|       342.2|
+------+------------+


But I want to display the Period that has the maximum total.



When I run the following code:



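from pyspark.sql.functions import max, round, sum  # shadows Python's built-ins

(ageDF.sort('Period')
    .groupBy('Period')
    .agg(round(sum('Age_specific_birth_rate'), 2).alias('Total'))
    .select('Period', 'Total')
    .agg(max('Total'))  # aggregates over the whole DataFrame, so Period is dropped
    .show())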


I get this output:



+----------+
|max(Total)|
+----------+
|     437.8|
+----------+


But I want to get something like this:



+------+----------+
|Period|max(Total)|
+------+----------+
|  2008|     437.8|
+------+----------+


What should I do?



Thank you





























  • That's a common problem: you want to output the max value together with the row that contains it. An alternative is to loop over your data and compare each value with the max, then output the rows where they are equal. There are probably multiple answers.

    – MoreFreeze
    Mar 8 at 6:23











  • Can you put a small initial dataset as an example and the output expected for that dataset to be able to reproduce and understand the case?

    – Daniel Sobrado
    Mar 8 at 6:25











  • Possible duplicate of GroupBy column and filter rows with maximum value in Pyspark

    – pault
    Mar 8 at 14:51
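
The loop-and-compare idea from the first comment can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: the `rows` list and the helper name `periods_with_max_total` are hypothetical, and the numbers are made up rather than taken from `ageDF`. It sums the rates per period, finds the largest total, and keeps every period whose total equals it, so ties are reported too.

```python
# Hypothetical (period, age-specific birth rate) pairs standing in for ageDF.
# The values are invented for illustration only.
rows = [
    (2007, 200.0), (2007, 234.0),
    (2008, 210.3), (2008, 227.5),
    (2009, 425.7),
]


def periods_with_max_total(rows):
    """Return (period, total) pairs whose summed total equals the maximum."""
    totals = {}
    for period, rate in rows:
        totals[period] = totals.get(period, 0.0) + rate

    # Round to 2 decimal places, as the question does.
    totals = {period: round(total, 2) for period, total in totals.items()}
    if not totals:
        return []

    best = max(totals.values())
    # Compare each total with the max and keep every match (handles ties).
    return [(p, t) for p, t in sorted(totals.items()) if t == best]


print(periods_with_max_total(rows))  # [(2008, 437.8)]
```

On a large DataFrame this would mean collecting the data to the driver first, so it is better suited to small results than to a full Spark job.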
























python pyspark apache-spark-sql














edited Mar 8 at 6:31
howie

asked Mar 8 at 6:13
Rudy



















1 Answer














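You can try sorting by Total in descending order and keeping only the first row:



from pyspark.sql import functions
from pyspark.sql.functions import round, sum  # shadows Python's built-in round and sum

(ageDF.sort('Period')
    .groupBy('Period')
    .agg(round(sum('Age_specific_birth_rate'), 2).alias('Total'))
    .orderBy(functions.col('Total').desc())  # largest Total first
    .limit(1)  # keeps a single row, even if several Periods tie for the max
    .select('Period', 'Total')
    .show())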















answered Mar 8 at 6:27
howie




























