Sample a dataframe based on priority

I have a dataframe like this:

import random
import pandas as pd

col1 = random.choices([1, 2, 3, 4, 5], k=50)
col2 = random.choices(['A', 'B', 'C'], k=50)
df = pd.DataFrame({'values': col1, 'priority': col2})


I want to pull a sample of 25 rows, filled in priority order. If exactly 25 rows have priority 'A', I want those 25. If 30 have priority 'A', I want 25 chosen at random from those 30. If 20 have priority 'A', I want those 20 and a random 5 from the 30 with priority 'B'. If 10 have priority 'A' and 10 have priority 'B', I want all 10 'A', all 10 'B', and 5 random 'C'.



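The only way I can think of to do this is to split the dataset into three and then use if statements like:

# one frame per priority level
df_A = df[df['priority'] == 'A']
df_B = df[df['priority'] == 'B']
df_C = df[df['priority'] == 'C']

if len(df_A) == 25:
    output = df_A
elif len(df_A) > 25:
    output = df_A.sample(n=25)
elif len(df_A) + len(df_B) == 25:
    output = pd.concat([df_A, df_B])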


and so on.

Is there a better way to do this, possibly something that scales up to a larger number of priority groups?
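
One possible way to replace the chain of if statements is to walk the priority levels in order, take every row of a level while there is still room, and sample only from the first level that overflows. The following is a minimal sketch under that reading of the requirement; the function name `sample_by_priority`, its `order` and `column` parameters, and the fixed seeds are assumptions made for illustration, not part of pandas.

```python
import random
import pandas as pd


def sample_by_priority(df, n, order, column="priority", random_state=None):
    """Return up to n rows, exhausting higher-priority levels first.

    `order` lists the priority labels from most to least important
    (a hypothetical parameter introduced for this sketch).
    """
    parts = []
    remaining = n
    for level in order:
        if remaining <= 0:
            break
        group = df[df[column] == level]
        if len(group) > remaining:
            # Only the level that overflows is sampled at random.
            group = group.sample(n=remaining, random_state=random_state)
        parts.append(group)
        remaining -= len(group)
    if not parts:
        return df.iloc[0:0]  # empty frame with the same columns
    return pd.concat(parts)


random.seed(0)  # seed chosen only to make the example repeatable
df = pd.DataFrame({
    "values": random.choices([1, 2, 3, 4, 5], k=50),
    "priority": random.choices(["A", "B", "C"], k=50),
})
output = sample_by_priority(df, n=25, order=["A", "B", "C"], random_state=0)
```

Because the levels are passed as a list, the same loop should work for any number of priority groups. If the frame holds fewer than `n` rows in total, the sketch simply returns all of them.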










  • I just had the idea to randomize the dataframe, then use sort and take the top 25 of the dataframe. I think this will work.
    – Phil, Mar 8 at 21:57

  • df.sample(frac=1).sort_values(by='priority').head(25)
    – user3483203, Mar 8 at 21:57
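
The one-liner in the comment above depends on the labels 'A', 'B', 'C' happening to sort alphabetically in priority order. Here is a sketch of the same shuffle-then-sort idea for arbitrary labels, assuming a hypothetical `rank` dict that maps each label to its position (0 is the highest priority); `top_n_by_priority` is likewise a name invented for this example.

```python
import pandas as pd


def top_n_by_priority(df, n, rank, column="priority", random_state=None):
    """Shuffle the rows, order them by priority rank, and keep the first n."""
    # Shuffling first makes the order inside each priority level random.
    shuffled = df.sample(frac=1, random_state=random_state)
    ranks = shuffled[column].map(rank)
    # A stable sort keeps the shuffled order within each level.
    positions = ranks.to_numpy().argsort(kind="stable")
    return shuffled.iloc[positions[:n]]


demo = pd.DataFrame({
    "values": range(8),
    "priority": ["low", "high", "mid", "low", "high", "mid", "low", "low"],
})
rank = {"high": 0, "mid": 1, "low": 2}  # hypothetical label ordering
picked = top_n_by_priority(demo, n=5, rank=rank, random_state=0)
```

With two 'high' rows, two 'mid' rows, and four 'low' rows, asking for five rows should return both 'high', both 'mid', and one randomly chosen 'low' row. Positional indexing with `iloc` is used so that the sketch also behaves when the index contains duplicates.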

















Tags: python, pandas






asked Mar 8 at 21:52 by Phil























