

Search the remaining columns of a PySpark DataFrame for values in column1



Suppose there is a PySpark DataFrame of the form:



id  col1 col2 col3 col4
-----------------------
as1   4   10    4    6
as2   6    3    6    1
as3   6    0    2    1
as4   8    8    6    1
as5   9    6    6    9


Is there a way to search columns 2-4 of the PySpark DataFrame for values in col1 and to return the (row id, column name) pairs?
For instance:



In col1, 4 is found in (as1, col3)
In col1, 6 is found in (as2, col3), (as1, col4), (as4, col3), (as5, col2), (as5, col3)
In col1, 8 is found in (as4, col2)
In col1, 9 is found in (as5, col4)


Hint: Assume the values in col1 form a set {4, 6, 8, 9}, i.e. they are unique.






































      python search pyspark






edited Mar 8 at 2:00 by Taiwo O. Adetiloye

asked Mar 6 at 19:49 by Taiwo O. Adetiloye






















1 Answer






Yes, you can leverage the Spark SQL .isin operator.

Let's first create the DataFrame in your example.

Part 1 - Creating the DataFrame

from pyspark.sql.types import StructType, StructField, IntegerType

cSchema = StructType([StructField("id", IntegerType()),
                      StructField("col1", IntegerType()),
                      StructField("col2", IntegerType()),
                      StructField("col3", IntegerType()),
                      StructField("col4", IntegerType())])

test_data = [[1,4,10,4,6],[2,6,3,6,1],[3,6,0,2,1],[4,8,8,6,1],[5,9,6,6,9]]

df = spark.createDataFrame(test_data, schema=cSchema)

df.show()

+---+----+----+----+----+
| id|col1|col2|col3|col4|
+---+----+----+----+----+
|  1|   4|  10|   4|   6|
|  2|   6|   3|   6|   1|
|  3|   6|   0|   2|   1|
|  4|   8|   8|   6|   1|
|  5|   9|   6|   6|   9|
+---+----+----+----+----+

Part 2 - Function to Search for Matching Values

isin: "A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments."
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html

def search(col1, col3):
    # Collect the values of the first column, then keep the rows
    # whose second column contains one of those values.
    col1_list = df.select(col1).rdd \
        .map(lambda x: x[0]).collect()
    search_results = df[df[col3].isin(col1_list)]
    return search_results

search_results = search("col1", "col3")
search_results.show()

+---+----+----+----+----+
| id|col1|col2|col3|col4|
+---+----+----+----+----+
|  1|   4|  10|   4|   6|
|  2|   6|   3|   6|   1|
|  4|   8|   8|   6|   1|
|  5|   9|   6|   6|   9|
+---+----+----+----+----+

This should guide you in the right direction. You can select just the id column, or whatever else you are attempting to return. The function can easily be changed to take more columns to search through. Hope this helps!
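As a sketch of that generalization, the per-value (id, column) lookup the question asks for can be expressed in plain Python over collected rows. This is an assumption about how one might port it, not part of the original answer: `rows` stands in for the result of `df.collect()` turned into dicts, and `find_matches` is a hypothetical helper name.

```python
# Hypothetical driver-side sketch: map each value in col1 to the
# (id, column) pairs where it reappears in col2..col4.
# `rows` stands in for collected DataFrame rows (an assumption).

rows = [
    {"id": "as1", "col1": 4, "col2": 10, "col3": 4, "col4": 6},
    {"id": "as2", "col1": 6, "col2": 3,  "col3": 6, "col4": 1},
    {"id": "as3", "col1": 6, "col2": 0,  "col3": 2, "col4": 1},
    {"id": "as4", "col1": 8, "col2": 8,  "col3": 6, "col4": 1},
    {"id": "as5", "col1": 9, "col2": 6,  "col3": 6, "col4": 9},
]

def find_matches(rows, value_col="col1", search_cols=("col2", "col3", "col4")):
    """Return {value: [(id, column), ...]} for every value of value_col
    that occurs somewhere in search_cols."""
    targets = {r[value_col] for r in rows}   # unique, per the question's hint
    hits = {v: [] for v in targets}
    for r in rows:
        for c in search_cols:
            if r[c] in hits:
                hits[r[c]].append((r["id"], c))
    return hits

matches = find_matches(rows)
```

Translated back to Spark, the same per-column membership test is what a separate `isin` filter over each searched column would express.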





























• Thanks, Nadim. As you rightly stated, it would be good if the function could be changed to take more columns to search through.

  – Taiwo O. Adetiloye
  Mar 7 at 18:09











• I have actually used the isin() method before. The drawback is that it can only be used for a one-to-one column match.

  – Taiwo O. Adetiloye
  Mar 8 at 1:12











answered Mar 7 at 2:22 by Nadim Younes











