Search the remaining columns of a PySpark DataFrame for values in column 1
Suppose there is a PySpark DataFrame of the form:
id col1 col2 col3 col4
------------------------
as1 4 10 4 6
as2 6 3 6 1
as3 6 0 2 1
as4 8 8 6 1
as5 9 6 6 9
Is there a way to search columns col2 to col4 of the PySpark DataFrame for the values in col1 and return the matching (id, column name) pairs?
For instance:
In col1, 4 is found in (as1, col3)
In col1, 6 is found in (as2, col3), (as1, col4), (as4, col3), (as5, col2), (as5, col3)
In col1, 8 is found in (as4, col2)
In col1, 9 is found in (as5, col4)
Hint: assume that col1 is treated as a set, {4, 6, 8, 9}, i.e. duplicate values are ignored.
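To pin down the expected output, here is a plain-Python sketch (no Spark involved) that applies the matching rule to the sample table. The dictionary layout and the names rows, search_cols and matches are assumptions for illustration only. Note that it also reports (as5, col2), since col2 of row as5 holds 6.

```python
# Sample table from the question: id -> (col1, col2, col3, col4)
rows = {
    "as1": (4, 10, 4, 6),
    "as2": (6, 3, 6, 1),
    "as3": (6, 0, 2, 1),
    "as4": (8, 8, 6, 1),
    "as5": (9, 6, 6, 9),
}
search_cols = ["col2", "col3", "col4"]

# col1 is treated as a set, as the hint says
keys = sorted({vals[0] for vals in rows.values()})

# For each key, list every (id, column) cell in col2..col4 holding that key
matches = {}
for key in keys:
    matches[key] = [
        (row_id, col)
        for row_id, vals in rows.items()
        for col, v in zip(search_cols, vals[1:])
        if v == key
    ]

for key, pairs in matches.items():
    print(f"In col1, {key} is found in {pairs}")
```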
python search pyspark
edited Mar 8 at 2:00
Taiwo O. Adetiloye
asked Mar 6 at 19:49
1 Answer
Yes, you can use the Spark SQL isin operator.
Let's first create the DataFrame from your example.
Part 1 - Creating the DataFrame
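from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

# ids are integers here (1-5) rather than as1-as5
cSchema = StructType([StructField("id", IntegerType()),
                      StructField("col1", IntegerType()),
                      StructField("col2", IntegerType()),
                      StructField("col3", IntegerType()),
                      StructField("col4", IntegerType())])
test_data = [[1,4,10,4,6],[2,6,3,6,1],[3,6,0,2,1],[4,8,8,6,1],[5,9,6,6,9]]
df = spark.createDataFrame(test_data, schema=cSchema)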
df.show()
+---+----+----+----+----+
| id|col1|col2|col3|col4|
+---+----+----+----+----+
| 1| 4| 10| 4| 6|
| 2| 6| 3| 6| 1|
| 3| 6| 0| 2| 1|
| 4| 8| 8| 6| 1|
| 5| 9| 6| 6| 9|
+---+----+----+----+----+
Part 2 - Function to Search for Matching Values
isin: A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html
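def search(col1, col3):
    # Collect the values of col1 to the driver as a plain Python list
    col1_list = (df.select(col1).rdd
                 .map(lambda x: x[0]).collect())
    # Keep only the rows whose col3 value appears in that list
    search_results = df[df[col3].isin(col1_list)]
    return search_results

search_results = search("col1", "col3")
search_results.show()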
+---+----+----+----+----+
| id|col1|col2|col3|col4|
+---+----+----+----+----+
| 1| 4| 10| 4| 6|
| 2| 6| 3| 6| 1|
| 4| 8| 8| 6| 1|
| 5| 9| 6| 6| 9|
+---+----+----+----+----+
This should point you in the right direction. You can select just the id column, or whatever else you need to return. The function can also be changed to search through more columns. Hope this helps!
answered Mar 7 at 2:22
Nadim Younes
Thanks, Nadim. As you rightly stated, it would be good if the function could be changed to search through more columns.
– Taiwo O. Adetiloye
Mar 7 at 18:09
I have actually used the isin() method before. The drawback is that it only works as a one-to-one column match.
– Taiwo O. Adetiloye
Mar 8 at 1:12