Filter in PySpark/Python RDD


I have a list like this:



["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]


After filtering, I want the result to look like this:



["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]


This is what I tried:



m = sc.parallelize(["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"])

# Split each record into its space-separated fields
n = m.map(lambda k: k.split(' '))

# First field of each record: the name
o = n.map(lambda s: s[0])
o.collect()

['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']

# Third field of each record: the role
q = n.map(lambda s: s[2])
q.collect()

['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']










  • What have you tried so far? Please provide minimal, complete, and verifiable code.
    – Partho63, Mar 9 at 8:57

Tags: python-3.x, pyspark, rdd

asked Mar 9 at 8:48 by Spark
edited Mar 10 at 17:29 by Spark

1 Answer

Provided all your list items have the same format, one way to achieve this is with map: split each string on spaces and join the first and third fields.



rdd = sc.parallelize(["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"])

# Split each record once, then keep the name (field 0) and the role (field 2)
rdd.map(lambda x: x.split(' ')).map(lambda f: f[0] + ' ' + f[2]).collect()


Output:



['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
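
If some records might not follow the "name age role score" layout, the per-record logic can be pulled into a named function and combined with a filter step. The following is only a sketch: the helper name `name_and_role` and the "malformed" record are made up for illustration, and the logic is shown in plain Python so it can be checked without a cluster.

```python
def name_and_role(record):
    """Return 'name role' for a 'name age role score' record.

    Returns None when the record has fewer than three fields.
    """
    fields = record.split()  # split on any run of whitespace
    if len(fields) < 3:
        return None
    return fields[0] + " " + fields[2]


records = [
    "Dhoni 35 WC 785623",
    "Sachin 40 Batsman 4500",
    "malformed",  # hypothetical bad record, not part of the original data
]

# Plain-Python equivalent of a map followed by a filter.
result = [r for r in map(name_and_role, records) if r is not None]
print(result)

# With a SparkContext `sc` available, the same function could be used as:
# sc.parallelize(records).map(name_and_role).filter(lambda r: r is not None).collect()
```

Note that the transformation itself is a map; filter only comes in when records need to be dropped.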





answered Mar 9 at 12:06 by Jim Todd