pyspark join two Dataframe and keep row by the recent date



I have two DataFrames, A and B.



A



+---+------+-----+----------+
| id|player|score| date|
+---+------+-----+----------+
| 1| alpha| 5|2018-02-13|
| 2| beta| 6|2018-02-13|
+---+------+-----+----------+


B



+---+------+-----+----------+
| id|player|score| date|
+---+------+-----+----------+
| 1| alpha| 100|2019-02-13|
| 2| beta| 6|2018-02-13|
+---+------+-----+----------+


and I need to create a new DataFrame in which the score is updated by keeping, for each row, the values from the most recent date.



result



+---+------+-----+----------+
|id |player|score|date |
+---+------+-----+----------+
| 1| alpha| 100|2019-02-13|
| 2| beta| 6|2018-02-13|
+---+------+-----+----------+









  • What have you tried so far?

    – Alexander Dmitriev
    2 days ago











  • I use A.join(B,'id','player',"outer") but it's not the right way

    – Chemssii
    2 days ago
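
For reference, DataFrame.join in PySpark takes the join columns as a single on argument (a string or a list of column names) and the join type as how, so a join on both keys would be written roughly as below. This is only a sketch of what the call above was presumably meant to be, not a fix for the underlying problem of keeping the most recent row:

# Hypothetical corrected call: join on both keys, pass the join type as `how`.
A.join(B, on=['id', 'player'], how='outer')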
Tags: apache-spark, pyspark, apache-spark-sql






2 Answers






You can join the two DataFrames and use pyspark.sql.functions.when() to pick the values for the score and date columns.



from pyspark.sql.functions import col, when

df_A.alias("a").join(df_B.alias("b"), on=["id", "player"], how="inner")\
    .select(
        "id",
        "player",
        when(
            col("b.date") > col("a.date"),
            col("b.score")
        ).otherwise(col("a.score")).alias("score"),
        when(
            col("b.date") > col("a.date"),
            col("b.date")
        ).otherwise(col("a.date")).alias("date")
    )\
    .show()
#+---+------+-----+----------+
#| id|player|score| date|
#+---+------+-----+----------+
#| 1| alpha| 100|2019-02-13|
#| 2| beta| 6|2018-02-13|
#+---+------+-----+----------+


Read more on when: Spark Equivalent of IF Then ELSE
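
To try the snippet above end to end, here is a minimal, self-contained setup sketch. It assumes a SparkSession available as spark (and Spark 2.2+ for the two-argument to_date); the df_A / df_B names and the data are taken from the question:

# Minimal setup for the example above (data copied from the question).
from pyspark.sql.functions import col, to_date

df_A = spark.createDataFrame(
    [(1, 'alpha', 5, '2018-02-13'), (2, 'beta', 6, '2018-02-13')],
    ('id', 'player', 'score', 'date'))
df_A = df_A.withColumn('date', to_date(col('date'), 'yyyy-MM-dd'))

df_B = spark.createDataFrame(
    [(1, 'alpha', 100, '2019-02-13'), (2, 'beta', 6, '2018-02-13')],
    ('id', 'player', 'score', 'date'))
df_B = df_B.withColumn('date', to_date(col('date'), 'yyyy-MM-dd'))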






  • This will be more efficient than union, distinct, and grouping.

    – pault
    2 days ago
































I am assuming that every player is allocated an id and that it doesn't change. The OP wants the resulting DataFrame to contain the score from the most recent date.



# Creating both the DataFrames.
from pyspark.sql.functions import col, to_date

df_A = sqlContext.createDataFrame([(1,'alpha',5,'2018-02-13'),(2,'beta',6,'2018-02-13')],('id','player','score','date'))
df_A = df_A.withColumn('date', to_date(col('date'), 'yyyy-MM-dd'))

df_B = sqlContext.createDataFrame([(1,'alpha',100,'2019-02-13'),(2,'beta',6,'2018-02-13')],('id','player','score','date'))
df_B = df_B.withColumn('date', to_date(col('date'), 'yyyy-MM-dd'))


The idea is to take a union() of these two DataFrames and then keep only the distinct rows. The reason for taking distinct rows afterwards is the following: if there was no update for a player, its row in DataFrame B will be identical to the one in DataFrame A, so we remove such duplicates.



# Importing the requisite packages.
from pyspark.sql.functions import col, max
from pyspark.sql import Window
df = df_A.union(df_B).distinct()
df.show()
+---+------+-----+----------+
| id|player|score| date|
+---+------+-----+----------+
| 1| alpha| 5|2018-02-13|
| 1| alpha| 100|2019-02-13|
| 2| beta| 6|2018-02-13|
+---+------+-----+----------+


Now, as a final step, use a Window partitioned by id and player over the unioned DataFrame df to find the latestDate for each player, and keep only the rows whose date equals that latestDate. This drops the stale rows for every player that received an update (indicated by a newer date in DataFrame B).



w = Window.partitionBy('id', 'player')
df = df.withColumn('latestDate', max('date').over(w))\
       .where(col('date') == col('latestDate')).drop('latestDate')
df.show()
+---+------+-----+----------+
| id|player|score| date|
+---+------+-----+----------+
| 1| alpha| 100|2019-02-13|
| 2| beta| 6|2018-02-13|
+---+------+-----+----------+
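
A closely related alternative (a sketch, not part of this answer's approach): rank the unioned rows with row_number() over a window ordered by date descending and keep rank 1. Unlike the max-and-filter step above, this keeps exactly one row per (id, player) even if two rows tie on the latest date:

# Alternative sketch: keep exactly one (the latest) row per (id, player).
from pyspark.sql.functions import col, row_number
from pyspark.sql import Window

w_desc = Window.partitionBy('id', 'player').orderBy(col('date').desc())
latest = df_A.union(df_B).distinct()\
             .withColumn('rn', row_number().over(w_desc))\
             .where(col('rn') == 1).drop('rn')
latest.show()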




