Search the rest columns of pyspark dataframe for values in column1

Suppose there is a pyspark dataframe of the form:

id   col1  col2  col3  col4
---------------------------
as1  4     10    4     6
as2  6     3     6     1
as3  6     0     2     1
as4  8     8     6     1
as5  9     6     6     9

Is there a way to search col2 through col4 of the pyspark dataframe for the values in col1, and to return the matching (row id, column name) pairs?
For instance:

In col1, 4 is found in (as1, col3)
In col1, 6 is found in (as2, col3), (as1, col4), (as4, col3), (as5, col3)
In col1, 8 is found in (as4, col2)
In col1, 9 is found in (as5, col4)

Hint: Assume that the values of col1 are treated as a set, 4, 6, 8, 9, i.e. unique.

edited Mar 8 at 2:00

asked Mar 6 at 19:49

Taiwo O. Adetiloye

python search pyspark

1 Answer

Yes, you can use the isin method of a Spark SQL Column.

Let's first create the DataFrame from your example.

Part 1 - Creating the DataFrame

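from pyspark.sql.types import IntegerType, StructField, StructType

# Assumes an active SparkSession named spark (as in the pyspark shell).
# The ids as1..as5 from the question are represented here as integers 1..5.
cSchema = StructType([StructField("id", IntegerType()),
                      StructField("col1", IntegerType()),
                      StructField("col2", IntegerType()),
                      StructField("col3", IntegerType()),
                      StructField("col4", IntegerType())])

test_data = [[1,4,10,4,6],[2,6,3,6,1],[3,6,0,2,1],[4,8,8,6,1],[5,9,6,6,9]]

df = spark.createDataFrame(test_data, schema=cSchema)

df.show()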

+---+----+----+----+----+
| id|col1|col2|col3|col4|
+---+----+----+----+----+
|  1|   4|  10|   4|   6|
|  2|   6|   3|   6|   1|
|  3|   6|   0|   2|   1|
|  4|   8|   8|   6|   1|
|  5|   9|   6|   6|   9|
+---+----+----+----+----+

Part 2 - Function to Search for Matching Values

isin: A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html

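def search(col1, col3):
    # Collect the values of the first column to the driver as a Python list.
    col1_list = df.select(col1).rdd.map(lambda x: x[0]).collect()
    # Keep only the rows whose value in the second column appears in that list.
    search_results = df[df[col3].isin(col1_list)]
    return search_results

search_results = search("col1", "col3")
search_results.show()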

+---+----+----+----+----+
| id|col1|col2|col3|col4|
+---+----+----+----+----+
|  1|   4|  10|   4|   6|
|  2|   6|   3|   6|   1|
|  4|   8|   8|   6|   1|
|  5|   9|   6|   6|   9|
+---+----+----+----+----+

This should point you in the right direction. You can select just the id column, or whatever else you are trying to return. The function can easily be changed to take more columns to search through. Hope this helps!
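One way the multi-column version might look is sketched below. It is only an illustration of the matching logic in plain Python, working on rows already collected to the driver, so it suits small data like the sample. The function name find_matches and its parameters are hypothetical, and the ids follow the question (as1..as5) rather than the integers used above. On the sample data it also reports (as5, col2), since col2 of as5 is 6.

```python
from collections import defaultdict


def find_matches(rows, key_col, search_cols, id_col="id"):
    """Map each distinct key_col value to the (id, column) cells that hold it."""
    # Distinct values of the key column, e.g. {4, 6, 8, 9} for the sample data.
    keys = {row[key_col] for row in rows}
    matches = defaultdict(list)
    for row in rows:
        for col in search_cols:
            if row[col] in keys:
                matches[row[col]].append((row[id_col], col))
    return dict(matches)


# Sample data from the question, as a list of dicts. With a real DataFrame,
# the rows could be obtained with: rows = [r.asDict() for r in df.collect()]
columns = ["id", "col1", "col2", "col3", "col4"]
data = [
    ["as1", 4, 10, 4, 6],
    ["as2", 6, 3, 6, 1],
    ["as3", 6, 0, 2, 1],
    ["as4", 8, 8, 6, 1],
    ["as5", 9, 6, 6, 9],
]
rows = [dict(zip(columns, values)) for values in data]

result = find_matches(rows, "col1", ["col2", "col3", "col4"])
for value in sorted(result):
    print(value, result[value])
```

For a large DataFrame, collecting every row to the driver would not scale. The same idea would then have to stay inside Spark, for example by unpivoting col2 through col4 into (id, column, value) rows and joining them against the distinct values of col1.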

answered Mar 7 at 2:22

Nadim Younes

Thanks, Nadim. As you rightly stated, it would be good if the function could be changed to take more columns to search through.

– Taiwo O. Adetiloye
Mar 7 at 18:09

I have actually used the isin() method before. The drawback is that it can only be used for a one-to-one column match.

– Taiwo O. Adetiloye
Mar 8 at 1:12
