Filter in PySpark/Python RDDHow can I represent an 'Enum' in Python?Way to create multiline comments in Python?What is the Python 3 equivalent of “python -m SimpleHTTPServer”Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?Filter a large RDD by iterating over another large RDD - pySparkPyspark: Get indexes of an RDD elements from another RDDPyspark filter empty lines from RDD not workingPySpark RDD Filter with “not in” for multiple valuesPyspark unlist rddpyspark filtering list from RDD
What are the steps to solving this definite integral?
What term is being referred to with "reflected-sound-of-underground-spirits"?
'It addicted me, with one taste.' Can 'addict' be used transitively?
Why do games have consumables?
Read line from file and process something
Who was the lone kid in the line of people at the lake at the end of Avengers: Endgame?
Why must Chinese maps be obfuscated?
Can SQL Server create collisions in system generated constraint names?
Why boldmath fails in a tikz node?
What makes accurate emulation of old systems a difficult task?
How can I print the prosodic symbols in LaTeX?
"You've called the wrong number" or "You called the wrong number"
Don’t seats that recline flat defeat the purpose of having seatbelts?
Which big number is bigger?
How do I deal with a coworker that keeps asking to make small superficial changes to a report, and it is seriously triggering my anxiety?
How can I get this effect? Please see the attached image
Initiative: Do I lose my attack/action if my target moves or dies before my turn in combat?
Can't get 5V 3A DC constant
Critique of timeline aesthetic
Contradiction proof for inequality of P and NP?
Rivers without rain
Can an Area of Effect spell cast outside a Prismatic Wall extend inside it?
A Note on N!
Was there a Viking Exchange as well as a Columbian one?
Filter in PySpark/Python RDD
How can I represent an 'Enum' in Python?Way to create multiline comments in Python?What is the Python 3 equivalent of “python -m SimpleHTTPServer”Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?Filter a large RDD by iterating over another large RDD - pySparkPyspark: Get indexes of an RDD elements from another RDDPyspark filter empty lines from RDD not workingPySpark RDD Filter with “not in” for multiple valuesPyspark unlist rddpyspark filtering list from RDD
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I have a list like this:
["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]
After applying filter I want like this:
["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]
I tried out this way
m = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
n = m.map(lambda k:k.split(' '))
o = n.map(lambda s:(s[0]))
o.collect()
['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']
q = n.map(lambda s:s[2])
q.collect()
['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']
python-3.x pyspark rdd
add a comment |
I have a list like this:
["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]
After applying filter I want like this:
["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]
I tried out this way
m = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
n = m.map(lambda k:k.split(' '))
o = n.map(lambda s:(s[0]))
o.collect()
['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']
q = n.map(lambda s:s[2])
q.collect()
['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']
python-3.x pyspark rdd
What have you tried so far? Please provide Minimal, Complete, and Verifiable Code
– Partho63
Mar 9 at 8:57
add a comment |
I have a list like this:
["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]
After applying filter I want like this:
["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]
I tried out this way
m = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
n = m.map(lambda k:k.split(' '))
o = n.map(lambda s:(s[0]))
o.collect()
['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']
q = n.map(lambda s:s[2])
q.collect()
['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']
python-3.x pyspark rdd
I have a list like this:
["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]
After applying filter I want like this:
["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]
I tried out this way
m = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
n = m.map(lambda k:k.split(' '))
o = n.map(lambda s:(s[0]))
o.collect()
['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']
q = n.map(lambda s:s[2])
q.collect()
['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']
python-3.x pyspark rdd
python-3.x pyspark rdd
edited Mar 10 at 17:29
Spark
asked Mar 9 at 8:48
SparkSpark
63
63
What have you tried so far? Please provide Minimal, Complete, and Verifiable Code
– Partho63
Mar 9 at 8:57
add a comment |
What have you tried so far? Please provide Minimal, Complete, and Verifiable Code
– Partho63
Mar 9 at 8:57
What have you tried so far? Please provide Minimal, Complete, and Verifiable Code
– Partho63
Mar 9 at 8:57
What have you tried so far? Please provide Minimal, Complete, and Verifiable Code
– Partho63
Mar 9 at 8:57
add a comment |
1 Answer
1
active
oldest
votes
Provided, all your list items are of same format, one way to achieve this is with map.
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x:(x.split(' ')[0]+' '+x.split(' ')[2])).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55075569%2ffilter-in-pyspark-python-rdd%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Provided, all your list items are of same format, one way to achieve this is with map.
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x:(x.split(' ')[0]+' '+x.split(' ')[2])).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
add a comment |
Provided, all your list items are of same format, one way to achieve this is with map.
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x:(x.split(' ')[0]+' '+x.split(' ')[2])).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
add a comment |
Provided, all your list items are of same format, one way to achieve this is with map.
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x:(x.split(' ')[0]+' '+x.split(' ')[2])).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
Provided, all your list items are of same format, one way to achieve this is with map.
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x:(x.split(' ')[0]+' '+x.split(' ')[2])).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
answered Mar 9 at 12:06
Jim ToddJim Todd
944611
944611
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55075569%2ffilter-in-pyspark-python-rdd%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What have you tried so far? Please provide Minimal, Complete, and Verifiable Code
– Partho63
Mar 9 at 8:57