sample a dataframe based on priority
I have a dataframe like this:
import random
import pandas as pd
col1 = random.choices([1, 2, 3, 4, 5], k=50)
col2 = random.choices(['A', 'B', 'C'], k=50)
df = pd.DataFrame({'values': col1, 'priority': col2})
I want to pull a sample of 25 rows, filled in priority order. If exactly 25 rows have priority 'A', I want those 25. If 30 have priority 'A', I want 25 chosen at random from those 30. If 20 have priority 'A', I want all 20, plus a random 5 from the 30 with priority 'B'. If 10 have priority 'A' and 10 have priority 'B', I want all 10 'A', all 10 'B', and 5 random 'C'.
The only way I can think of to do this is to split the dataset into three and then use if statements like
if len(df_A) == 25:
    output = df_A
elif len(df_A) > 25:
    output = df_A.sample(n=25)
elif len(df_A) + len(df_B) == 25:
    output = pd.concat([df_A, df_B])
and so on.
Is there a better way to do this? Possibly something that scales up to a larger number of priority groups?
python pandas
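The if-chain in the question can probably be folded into a single loop over the priority levels. The following is only a sketch under assumptions: the function name `sample_by_priority` and its parameters `order`, `column` and `seed` are hypothetical and not part of pandas. It takes every row of a group while the group still fits and draws a random subset from the first group that does not.

```python
import random

import pandas as pd


def sample_by_priority(df, n, order, column="priority", seed=None):
    """Collect up to n rows, exhausting higher-priority groups first.

    Hypothetical helper: `order` lists the priority levels from most to
    least important; `seed` is passed on to DataFrame.sample.
    """
    picked = []
    remaining = n
    for level in order:
        if remaining <= 0:
            break
        group = df[df[column] == level]
        if group.empty:
            continue
        if len(group) <= remaining:
            # The whole group fits, so take every row in it.
            picked.append(group)
        else:
            # Only part of the group fits, so draw the rest at random.
            picked.append(group.sample(n=remaining, random_state=seed))
        remaining -= len(picked[-1])
    if not picked:
        return df.iloc[0:0]  # empty frame with the same columns
    return pd.concat(picked)


random.seed(0)
df = pd.DataFrame({
    "values": random.choices([1, 2, 3, 4, 5], k=50),
    "priority": random.choices(["A", "B", "C"], k=50),
})
output = sample_by_priority(df, n=25, order=["A", "B", "C"], seed=0)
```

Because the levels are passed in as a list, adding more priority groups only means extending `order`. If the frame holds fewer than `n` rows in total, this sketch simply returns all of them.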
I just had the idea to randomize the dataframe, then use sort and take the top 25 of the dataframe. I think this will work.
– Phil
Mar 8 at 21:57
df.sample(frac=1).sort_values(by='priority').head(25)
– user3483203
Mar 8 at 21:57
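A runnable version of the shuffle-then-sort idea from the comments might look like the sketch below. Two things in it are assumptions added here, not part of the original comment: the explicit `order` list with an ordered `pd.Categorical`, so that the ranking does not depend on alphabetical order, and `kind="mergesort"`, a stable sort that keeps the shuffled order within each priority level.

```python
import random

import pandas as pd

random.seed(0)
df = pd.DataFrame({
    "values": random.choices([1, 2, 3, 4, 5], k=50),
    "priority": random.choices(["A", "B", "C"], k=50),
})

# Assumed explicit ranking; alphabetical order happens to match it here.
order = ["A", "B", "C"]

# Step 1: put the rows in random order.
shuffled = df.sample(frac=1, random_state=0)

# Step 2: make the priority column an ordered categorical so sorting
# follows `order` rather than string comparison.
ranked = shuffled.assign(
    priority=pd.Categorical(shuffled["priority"], categories=order, ordered=True)
)

# Step 3: stable sort by priority, then keep the first 25 rows. Ties keep
# their shuffled order, so the cut inside a level is a random subset.
output = ranked.sort_values(by="priority", kind="mergesort").head(25)
```

With the default sort algorithm the result is still ordered by priority, but the order of rows within a level is not guaranteed to be the shuffled one, which is why a stable sort is suggested here.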
asked Mar 8 at 21:52 by Phil