How to implement 'in' and 'not in' for Pandas dataframe

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values.
Here's the scenario:

import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK','China']

# pseudo-code:
df[df['countries'] not in countries]

My current way of doing this is as follows:

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = pd.DataFrame({'countries': ['UK', 'China'], 'matched': True})

# IN
df.merge(countries,how='inner',on='countries')

# NOT IN
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?
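
For reference, the merge-based approach above can be written without the helper 'matched' column. This is only a sketch, assuming a pandas version whose merge accepts the indicator parameter; the name wanted is hypothetical and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
# 'wanted' is a hypothetical name for the lookup table.
wanted = pd.DataFrame({'countries': ['UK', 'China']})

# IN: an inner join keeps only the rows whose key appears in both frames.
in_rows = df.merge(wanted, how='inner', on='countries')

# NOT IN: a left join with indicator=True adds a '_merge' column that
# reports where each row came from; 'left_only' marks rows with no match.
merged = df.merge(wanted, how='left', on='countries', indicator=True)
not_in_rows = merged.loc[merged['_merge'] == 'left_only', ['countries']]
```

Because this is a join, duplicate keys in the lookup table would duplicate the matching rows, so the lookup is best kept unique.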

edited Jul 17 '15 at 20:25 by smci
asked Nov 13 '13 at 17:11 by LondonRob

I think your solution is the best solution. Yours can cover IN and NOT IN on multiple columns.

– Bruce Jung
Mar 17 '15 at 1:55

Do you want to test on a single column or multiple columns?

– smci
Jul 17 '15 at 20:26

Related (performance / pandas internals): Pandas pd.Series.isin performance with set versus array

– jpp
Jun 28 '18 at 0:06


python pandas dataframe sql-function


5 Answers

You can use pd.Series.isin.

For "IN" use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

>>> df
  countries
0        US
1        UK
2   Germany
3     China
>>> countries
['UK', 'China']
>>> df.countries.isin(countries)
0    False
1     True
2    False
3     True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1        UK
3     China
>>> df[~df.countries.isin(countries)]
  countries
0        US
2   Germany
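
The same session as a standalone script may be easier to reuse. This is a minimal sketch; countries_to_keep is a hypothetical name, chosen so the list is not confused with the countries column.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
# Hypothetical name, kept distinct from the 'countries' column.
countries_to_keep = ['UK', 'China']

# Boolean Series with one entry per row of df.
mask = df['countries'].isin(countries_to_keep)

in_rows = df[mask]       # SQL: WHERE countries IN ('UK', 'China')
not_in_rows = df[~mask]  # SQL: WHERE countries NOT IN ('UK', 'China')
```

Building the mask once and negating it with ~ avoids evaluating isin twice.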

edited Apr 15 '18 at 17:52 by jpp
answered Nov 13 '13 at 17:13 by DSM

isin is not inverse sin()? :D

– Kos
Nov 13 '13 at 17:15

Just an FYI, @LondonRob had his as a DataFrame and yours is a Series. DataFrame's isin was added in 0.13.

– TomAugspurger
Nov 13 '13 at 18:07

Any suggestions for how to do this with pandas 0.12.0? It's the current released version. (Maybe I should just wait for 0.13?!)

– LondonRob
Nov 13 '13 at 18:41

@TomAugspurger: as usual, I'm probably missing something. df, both mine and his, is a DataFrame. countries is a list. df[~df.countries.isin(countries)] produces a DataFrame, not a Series, and seems to work even back in 0.11.0.dev-14a04dd.

– DSM
Nov 14 '13 at 16:10

This answer is confusing because you keep reusing the countries variable. Well, the OP does it, and that's inherited, but the fact that something was done badly before does not justify doing it badly now.

– ifly6
May 18 '18 at 22:20
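
The comments above mention DataFrame.isin as distinct from Series.isin. A rough sketch of the DataFrame form follows; the second column and the allowed dict are hypothetical and were added only to illustrate a multi-column check.

```python
import pandas as pd

# Hypothetical two-column frame for illustration.
df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'currency': ['USD', 'GBP', 'EUR', 'CNY'],
})

# With a dict, each key names a column and each value lists the
# entries allowed in that column.
allowed = {'countries': ['UK', 'China'], 'currency': ['GBP', 'EUR']}
cell_mask = df.isin(allowed)  # boolean DataFrame with the same shape as df

all_match = df[cell_mask.all(axis=1)]   # rows where every column matched
any_miss = df[~cell_mask.all(axis=1)]   # rows where at least one did not
```

Each column is checked independently here, so this does not test (country, currency) pairs as tuples; for pairwise matching, a merge such as the one in the question is probably the closer fit.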

An alternative solution that uses the .query() method:

In [5]: df.query("countries in @countries")
Out[5]:
  countries
1        UK
3     China

In [6]: df.query("countries not in @countries")
Out[6]:
  countries
0        US
2   Germany
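
As a self-contained sketch of the same idea: inside the query string a bare name refers to a column, while the @ prefix refers to a Python variable in the calling scope. The name wanted is hypothetical, used so that the column and the list do not share a name.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
# Hypothetical name; '@wanted' in the query string refers to this variable.
wanted = ['UK', 'China']

in_rows = df.query("countries in @wanted")
not_in_rows = df.query("countries not in @wanted")
```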

answered Jul 19 '17 at 12:19 by MaxU

Note that this is currently marked as "experimental" in the docs...

– LondonRob
Jul 19 '17 at 14:49


I usually do generic filtering over rows like this:

criterion = lambda row: row['countries'] not in countries
not_in = df[df.apply(criterion, axis=1)]

answered Nov 13 '13 at 17:14 by Kos

FYI, this is much slower than @DSM's solution, which is vectorized.

– Jeff
Nov 13 '13 at 17:47

@Jeff I'd expect that, but that's what I fall back to when I need to filter over something unavailable in pandas directly. (I was about to say "like .startswith or regex matching", but just found out about Series.str, which has all of that!)

– Kos
Nov 14 '13 at 7:42
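
To illustrate the exchange above, here is a small sketch that puts the row-wise apply fallback next to the vectorized Series.str accessor. The prefix 'U' and the regular expression are made-up filter criteria, not taken from the thread.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})

# Row-wise fallback: the Python function is called once per row.
row_wise = df[df.apply(lambda row: row['countries'].startswith('U'), axis=1)]

# Vectorized string methods via the .str accessor.
starts_with_u = df[df['countries'].str.startswith('U')]
matches_regex = df[df['countries'].str.contains(r'^(?:G|C)', regex=True)]
```

The apply form remains handy for predicates that have no vectorized equivalent, at the cost of one Python-level call per row.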

I wanted to filter out the rows of dfbc whose BUSINESS_ID also appears in the BUSINESS_ID column of dfProfilesBusIds.

Finally got it working:

dfbc = dfbc[~dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])]
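
A runnable sketch of that line, using made-up data in place of the real dfbc and dfProfilesBusIds (the 'name' column and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data standing in for dfbc and dfProfilesBusIds.
dfbc = pd.DataFrame({'BUSINESS_ID': [1, 2, 3, 4], 'name': ['a', 'b', 'c', 'd']})
dfProfilesBusIds = pd.DataFrame({'BUSINESS_ID': [2, 4, 4]})

# isin accepts a Series; only its values are used, not its index, and
# duplicate IDs in the lookup do not duplicate rows in the result.
already_profiled = dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])
dfbc = dfbc[~already_profiled]
```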

edited Mar 11 at 6:25 by jezrael
answered Jul 13 '17 at 3:12 by Sam Henderson

You can negate the isin (as done in the accepted answer) rather than comparing to False

– cricket_007
Jul 19 '17 at 12:17


import numpy as np
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

To implement IN:

df[df.countries.isin(countries)]

To implement NOT IN as IN over the remaining countries:

df[df.countries.isin([x for x in np.unique(df.countries) if x not in countries])]
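
On this example data, the complement-list form selects the same rows as negating the mask directly, as in the answer that uses ~. The sketch below checks that; the names rest, via_complement and via_negation are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Complement built explicitly from the distinct values in the column.
rest = [x for x in np.unique(df['countries']) if x not in countries]
via_complement = df[df['countries'].isin(rest)]

# Direct negation of the membership mask.
via_negation = df[~df['countries'].isin(countries)]
```

The negated form skips the extra pass that collects the distinct values.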

answered Apr 4 '18 at 11:51 by Ioannis Nasios
