pandas: complex filter on rows of DataFrame











72















I would like to filter rows by a function of each row, e.g.

    def f(row):
        return np.sin(row['velocity']) / np.prod(row['masses']) > 5

    df = pandas.DataFrame(...)
    filtered = df[apply_to_all_rows(df, f)]

Or, for a more complex, contrived example,

    def g(row):
        if row['col1'].method1() == 1:
            val = row['col1'].method2() / row['col1'].method3(row['col3'], row['col4'])
        else:
            val = row['col2'].method5(row['col6'])
        return np.sin(val)

    df = pandas.DataFrame(...)
    filtered = df[apply_to_all_rows(df, g)]

How can I do so?







      python pandas






edited Mar 8 at 17:21 by JJJ
asked Jul 10 '12 at 16:56 by duckworthd






















          5 Answers


















































                100














You can do this using DataFrame.apply, which applies a function along a given axis:

    In [3]: df = pandas.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'])

    In [4]: df
    Out[4]:
              a         b         c
    0 -0.001968 -1.877945 -1.515674
    1 -0.540628  0.793913 -0.983315
    2 -1.313574  1.946410  0.826350
    3  0.015763 -0.267860 -2.228350
    4  0.563111  1.195459  0.343168

    In [6]: df[df.apply(lambda x: x['b'] > x['c'], axis=1)]
    Out[6]:
              a         b         c
    1 -0.540628  0.793913 -0.983315
    2 -1.313574  1.946410  0.826350
    3  0.015763 -0.267860 -2.228350
    4  0.563111  1.195459  0.343168






answered Jul 13 '12 at 17:33 by duckworthd
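For readers who want something closer to the predicate in the question, here is a minimal sketch of the same apply approach with a named row function. The column names mass1, mass2, and velocity, the sample values, and the 0.2 threshold are all assumptions made for illustration; they are not from the original posts.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names follow the question, values are made up.
df = pd.DataFrame({
    "mass1": [1.0, 2.0, 0.5],
    "mass2": [2.0, 4.0, 0.1],
    "velocity": [0.5, 1.0, 1.5],
})

def f(row):
    """Row-wise predicate; `row` is a Series holding one row of df."""
    masses = row[["mass1", "mass2"]]
    # 0.2 is an arbitrary threshold chosen for this example.
    return np.sin(row["velocity"]) / masses.prod() > 0.2

# axis=1 passes each row to f; the result is a boolean Series aligned with df.index.
mask = df.apply(f, axis=1)
filtered = df[mask]
```

Because the function is called once per row in Python, this is usually slower than a vectorized mask, but it works for predicates that cannot be expressed column-wise.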







• 12
  There is no need for apply in this situation. A regular boolean index will work just fine: df[df['b'] > df['c']]. There are very few situations that actually require apply, and even fewer that need it with axis=1.
  – Ted Petrou, Nov 6 '17 at 17:28

• @TedPetrou What if you're not sure that every element in your DataFrame is of the right type? Does a regular boolean index support exception handling?
  – D. Ror., Oct 23 '18 at 19:48
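One possible way to combine the two comments above is to coerce the columns to numbers first and then use a plain boolean index. This is only a sketch under assumptions: the frame and its badly typed entry are invented for the example, and pd.to_numeric with errors="coerce" is one option among several rather than something the commenters proposed.

```python
import pandas as pd

# Hypothetical frame with one badly typed entry in column 'b'.
df = pd.DataFrame({"b": [1.5, "oops", 3.0], "c": [1.0, 2.0, 4.0]})

# Coerce to numbers; anything unparseable becomes NaN instead of raising.
b = pd.to_numeric(df["b"], errors="coerce")
c = pd.to_numeric(df["c"], errors="coerce")

# Comparisons against NaN evaluate to False, so the bad row drops out.
mask = b > c
filtered = df[mask]
```

A boolean index has no per-row exception handling of its own, so the cleaning has to happen before the comparison, as it does here.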

























                11














Suppose I had a DataFrame as follows:

    In [39]: df
    Out[39]:
          mass1     mass2  velocity
    0  1.461711 -0.404452  0.722502
    1 -2.169377  1.131037  0.232047
    2  0.009450 -0.868753  0.598470
    3  0.602463  0.299249  0.474564
    4 -0.675339 -0.816702  0.799289

I can use np.sin and DataFrame.prod to create a boolean mask (.iloc is used here in place of the original .ix indexer, which was removed in pandas 1.0):

    In [40]: mask = (np.sin(df.velocity) / df.iloc[:, 0:2].prod(axis=1)) > 0

    In [41]: mask
    Out[41]:
    0    False
    1    False
    2    False
    3     True
    4     True

Then use the mask to select from the DataFrame:

    In [42]: df[mask]
    Out[42]:
          mass1     mass2  velocity
    3  0.602463  0.299249  0.474564
    4 -0.675339 -0.816702  0.799289






answered Jul 10 '12 at 19:35 by Chang She







• 2
  Actually, this was probably a bad example: np.sin automatically broadcasts to all elements. What if I replaced it with a less intelligent function that could only handle one input at a time?
  – duckworthd, Jul 10 '12 at 21:07
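As a sketch of the case raised in this comment, math.sin from the standard library is a function that accepts only one number at a time. Assuming a small made-up frame with the same column names, Series.map can apply it element by element, while the rest of the mask stays vectorized:

```python
import math
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df = pd.DataFrame({
    "mass1": [1.0, -2.0, 0.5],
    "mass2": [2.0, 1.0, 0.1],
    "velocity": [0.5, 1.0, 1.5],
})

# math.sin takes a single float, so map it over the column one value at a time.
sin_v = df["velocity"].map(math.sin)

# The remaining arithmetic is still column-wise.
mask = sin_v / df[["mass1", "mass2"]].prod(axis=1) > 0
filtered = df[mask]
```

Only the scalar-only step pays the per-element cost; when the function needs several columns of the same row at once, DataFrame.apply with axis=1 is the more natural fit.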























                3














Specify reduce=True to handle empty DataFrames as well:

    import pandas as pd

    t = pd.DataFrame(columns=['a', 'b'])
    t[t.apply(lambda x: x['a'] > 1, axis=1, reduce=True)]

https://crosscompute.com/n/jAbsB6OIm6oCCJX9PBIbY5FECFKCClyV/-/apply-custom-filter-on-rows-of-dataframe







edited May 16 '18 at 21:22
answered Oct 21 '17 at 17:31 by Roy Hyunjin Han





















                        2














I cannot comment on duckworthd's answer, but it does not work in every case. It crashes when the DataFrame is empty:

    df = pandas.DataFrame(columns=['a', 'b', 'c'])
    df[df.apply(lambda x: x['b'] > x['c'], axis=1)]

Outputs:

    ValueError: Must pass DataFrame with boolean values only

To me it looks like a bug in pandas, since an empty result is definitely a valid set of boolean values.







                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited May 23 '17 at 12:34









                            Community











                            answered Jul 10 '15 at 12:16









cglacet





















                                0














The best approach I've found, rather than passing reduce=True to avoid errors on an empty DataFrame (that argument is deprecated anyway), is to check that the DataFrame has at least one row before applying the filter:

    def my_filter(row):
        # `something` stands for whatever value you want to match
        if row.columnA == something:
            return True

        return False

    # Only apply the row-wise filter when there is at least one row
    if len(df.index) > 0:
        df[df.apply(my_filter, axis=1)]
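The same guard can be wrapped in a small function so that callers always get a DataFrame back, whether or not the input has rows. This is only a sketch under assumptions: `filter_rows` is a hypothetical name, and the sample frame and the value `1` stand in for the real data and for `something`.

```python
import pandas as pd


def filter_rows(df, predicate):
    """Return the rows of df for which predicate(row) is truthy.

    Hypothetical helper: an empty frame is returned as a copy without
    calling apply, which is the guard described above.
    """
    if len(df.index) == 0:
        return df.copy()
    mask = df.apply(predicate, axis=1).astype(bool)
    return df[mask]


# Sample data invented for this sketch; 1 plays the role of `something`.
sample = pd.DataFrame({'columnA': [1, 2, 1], 'x': [10, 20, 30]})
matches = filter_rows(sample, lambda row: row.columnA == 1)

no_rows = pd.DataFrame(columns=['columnA', 'x'])
still_empty = filter_rows(no_rows, lambda row: row.columnA == 1)
```

Returning a copy in the empty case keeps the column labels, so downstream code can treat both results the same way.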















                                    answered Jan 14 at 19:04









user553965


























