Filter in PySpark/Python RDD

I have a list like this:

["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]

After applying a filter, I want the result to look like this:

["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]

This is what I tried:

m = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])

n = m.map(lambda k:k.split(' '))

o = n.map(lambda s:(s[0]))
o.collect()

['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']

q = n.map(lambda s:s[2])

q.collect()

['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']

edited Mar 10 at 17:29

asked Mar 9 at 8:48

Spark

What have you tried so far? Please provide Minimal, Complete, and Verifiable Code

– Partho63
Mar 9 at 8:57

python-3.x pyspark rdd

1 Answer
Provided all your list items have the same format, one way to achieve this is with map.

rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])

rdd.map(lambda x: x.split(' ')[0] + ' ' + x.split(' ')[2]).collect()

Output:

['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
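
The lambda above calls split twice per record. Below is a minimal sketch of the same idea as a named function that splits once and returns None for records with fewer than three fields. The name `name_and_role` and the None convention are illustrative assumptions, not part of the original answer. It also uses `split()` with no argument, which tolerates repeated whitespace, a slight departure from `split(' ')`. It is shown on a plain list so it runs without a SparkContext.

```python
def name_and_role(record):
    """Return 'name role' from a 'name age role runs' record.

    Hypothetical helper: returns None when the record has fewer
    than three whitespace-separated fields.
    """
    parts = record.split()  # split once; any run of whitespace is a separator
    if len(parts) < 3:
        return None
    return f"{parts[0]} {parts[2]}"


records = [
    "Dhoni 35 WC 785623",
    "Sachin 40 Batsman 4500",
    "Dravid 45 Batsman 50000",
    "Kumble 41 Bowler 456431",
    "Srinath 41 Bowler 65465",
]

# Plain-Python equivalent of mapping the helper over every record
result = [name_and_role(r) for r in records]
print(result)
```

With Spark, the same function could presumably be used as `rdd.map(name_and_role).filter(lambda s: s is not None).collect()`, so that malformed rows are dropped rather than raising an IndexError.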

answered Mar 9 at 12:06

Jim Todd

944611
