Shuffle a DataFrame while keeping internal order
I have a DataFrame that contains pre-processed data, such that every 4 rows form a sequence (later to be reshaped and used for LSTM training).
I want to shuffle the DataFrame, but keep every sequence of rows intact. For example:
    a = [1,2,3,4,10,11,12,13,20,21,22,23]
will turn into something like:
    a = [20,21,22,23,1,2,3,4,10,11,12,13]
`df.sample(frac=1)` is not enough, since it will break the sequences.
Solution, thanks to @Wen-Ben:
    import numpy as np
    import pandas as pd

    seq_length = 4
    # Positions of the rows that fit into complete sequences
    length_array = np.arange((df.shape[0]//seq_length)*seq_length)
    # Drop trailing rows that do not fill a whole sequence
    trunc_data = df.head((df.shape[0]//seq_length)*seq_length)
    # Map each sequence number to its block of rows
    d = {x: y for x, y in trunc_data.groupby(length_array//seq_length)}
    # Draw every sequence number once, in random order, and concatenate the blocks
    yourdf = pd.concat([d.get(x) for x in np.random.choice(len(d),len(d.keys()),replace=False)])
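The solution above can be wrapped in a small reusable function. This is a minimal sketch rather than part of the original post: the name `shuffle_sequences` and the `seed` parameter are assumptions made for illustration, and it swaps `np.random.choice(..., replace=False)` for `np.random.default_rng(seed).permutation(...)`, which also draws every block number exactly once.

```python
import numpy as np
import pandas as pd


def shuffle_sequences(df, seq_length=4, seed=None):
    """Return df with its blocks of `seq_length` rows in random order.

    Hypothetical helper: rows inside a block keep their order, and trailing
    rows that do not fill a complete block are dropped (as in the question).
    """
    n_rows = (len(df) // seq_length) * seq_length
    trunc_data = df.head(n_rows)
    # Block number of every remaining row: 0,0,0,0,1,1,1,1,...
    block_ids = np.arange(n_rows) // seq_length
    blocks = {block: rows for block, rows in trunc_data.groupby(block_ids)}
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(blocks))
    return pd.concat([blocks[i] for i in order])


df = pd.DataFrame({"a": [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]})
shuffled = shuffle_sequences(df, seq_length=4, seed=0)
print(shuffled["a"].tolist())
```

Passing a fixed `seed` makes the shuffle reproducible. Note that `pd.concat` raises an error on an empty list, so this sketch assumes the frame holds at least one complete sequence.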
python pandas shuffle
Is there any other column in the frame that has one unique value per row sequence? For example, the column could have the value 1 for sequence 1,2,3,4 and 2 for 10,11,12,13. If not, is it OK to add such a column?
– entropy
Mar 8 at 23:47
Question has nothing to do with machine-learning - kindly do not spam the tag (removed).
– desertnaut
Mar 9 at 0:19
@suicidalteddy 1,2,3,4 represent 4 rows of a certain column, not a single row. I can add another column, but how will it help? Keep in mind that many values will repeat.
– M.F
Mar 9 at 9:34
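The suggestion in the comments above, adding a column with one label per sequence, could look roughly like this. It is only a sketch under assumptions: the column names `value` and `seq_id` are made up for illustration, and the label is derived from the row position rather than from the data, so repeated data values do not matter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]})
seq_length = 4

# Label every row with the number of the sequence it belongs to: 0,0,0,0,1,1,1,1,...
df["seq_id"] = np.arange(len(df)) // seq_length

# Shuffle the distinct labels, not the rows
rng = np.random.default_rng(0)
shuffled_ids = rng.permutation(df["seq_id"].unique())

# Rank of each label in the shuffled order, then a stable sort on that rank
# so rows that share a label keep their original relative order
rank = pd.Series(np.arange(len(shuffled_ids)), index=shuffled_ids)
order = df["seq_id"].map(rank).to_numpy().argsort(kind="stable")
shuffled = df.iloc[order]
print(shuffled)
```

The helper column stays in the result, which can be handy for checking afterwards that every sequence is still contiguous.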
asked Mar 8 at 23:33, edited Mar 9 at 10:31 – M.F
3 Answers
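Is this what you need? You can use `np.random.choice`:
    # Group rows into consecutive blocks of 4 by position, keyed by block number
    d = {x: y for x, y in df.groupby(np.arange(len(df))//4)}
    # 2 is the number of blocks in this 8-row example; use len(d) in general
    yourdf = pd.concat([d.get(x) for x in np.random.choice(len(d),2,replace=False)])
    yourdf
    Out[986]:
       col1 col2
    4     5    e
    5     6    f
    6     7    g
    7     8    h
    0     1    a
    1     2    b
    2     3    c
    3     4    d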
This assumes the index is numeric and continuous. Check out my answer, which makes no such assumptions. (Though this is an improvement on your original answer, which used `np.roll`.)
– P Maschhoff
Mar 9 at 0:50
@PMaschhoff your answer is OK, but it only works when the length of df is n*4. Assuming the df has length 6, `np.arange(len(df)-2).reshape(-1, 4)`, `reshape` will fail.
– Wen-Ben
Mar 9 at 0:54
@Wen-Ben this is it! I tweaked your code a little bit and added it as an edit to the question. Thanks!
– M.F
Mar 9 at 10:29
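The difference discussed in these comments can be checked directly. The sketch below uses a hypothetical 6-row frame (not from the original posts) to show that grouping by positional block number accepts any length, while `reshape(-1, 4)` requires an exact multiple of 4.

```python
import numpy as np
import pandas as pd

# 6 rows: one full block of 4 plus 2 leftover rows
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6]})

# groupby on positional block numbers accepts any length; the last group is just shorter
groups = {k: g for k, g in df.groupby(np.arange(len(df)) // 4)}
group_sizes = [len(g) for g in groups.values()]
print(group_sizes)  # [4, 2]

# reshape needs the length to be an exact multiple of 4
try:
    np.arange(len(df)).reshape(-1, 4)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```

With the `groupby` approach the short trailing group is shuffled like any other block and may end up in the middle of the result, which is presumably why the edit in the question truncates the frame first.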
You can reshuffle in groups of 4 by grouping the index into groups of four and then shuffling the groups.
Example:
    df = pd.DataFrame(np.random.randint(10, size=(12, 2)), columns=['a', 'b'])
        a  b
    0   5  4
    1   7  7
    2   7  8
    3   8  4
    4   9  4
    5   9  0
    6   1  5
    7   4  1
    8   0  1
    9   5  6
    10  1  3
    11  9  2
    new_index = np.array(df.index).reshape(-1, 4)
    np.random.shuffle(new_index) # shuffles the rows of the 2D array in-place
    df = df.loc[new_index.reshape(-1)]
        a  b
    8   0  1
    9   5  6
    10  1  3
    11  9  2
    4   9  4
    5   9  0
    6   1  5
    7   4  1
    0   5  4
    1   7  7
    2   7  8
    3   8  4
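The answer above looks rows up by index label with `.loc`, which can return extra rows when index labels repeat. A positional variant with `.iloc` avoids relying on the labels at all. This is a sketch of that variation, not the answerer's code: the example frame and its `wxyz` index are made up, and `np.random.default_rng` is used in place of `np.random.shuffle`.

```python
import numpy as np
import pandas as pd

# Non-numeric index with repeated labels; label-based lookup would be ambiguous here
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 10, 11, 12, 13]},
    index=list("wxyzwxyz"),
)

# Row positions arranged one block per row: [[0, 1, 2, 3], [4, 5, 6, 7]]
positions = np.arange(len(df)).reshape(-1, 4)
rng = np.random.default_rng(0)
rng.shuffle(positions)  # reorders the blocks in place along axis 0
shuffled = df.iloc[positions.reshape(-1)]
print(shuffled)
```

Like the original, this still requires the number of rows to be a multiple of 4, because of the `reshape`.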
Since you said your data comes in sequences of 4, the length of the data frame should be a multiple of 4. If your data is in sequences of 3, change 4 to 3 in the code.
    >>> import pandas as pd
    >>> import numpy as np
Creating the table:
    >>> df = pd.DataFrame({'col1':[1,2,3,4,5,6,7,8],'col2':['a','b','c','d','e','f','g','h']})
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     3    c
    3     4    d
    4     5    e
    5     6    f
    6     7    g
    7     8    h
    >>> df.shape[0]
    8
Creating the list for shuffling:
    >>> np_range = np.arange(0,df.shape[0])
    >>> np_range
    array([0, 1, 2, 3, 4, 5, 6, 7])
Reshaping and shuffling (note the integer division `//`; in Python 3, `/` returns a float, which `np.reshape` rejects as a dimension):
    >>> np_range1 = np.reshape(np_range,(df.shape[0]//4,4))
    >>> np_range1
    array([[0, 1, 2, 3],
           [4, 5, 6, 7]])
    >>> np.random.shuffle(np_range1)
    >>> np_range1
    array([[4, 5, 6, 7],
           [0, 1, 2, 3]])
    >>> np_range2 = np.reshape(np_range1,(df.shape[0],))
    >>> np_range2
    array([4, 5, 6, 7, 0, 1, 2, 3])
Selecting the data:
    >>> new_df = df.loc[np_range2]
    >>> new_df
       col1 col2
    4     5    e
    5     6    f
    6     7    g
    7     8    h
    0     1    a
    1     2    b
    2     3    c
    3     4    d
I hope this helps! Thank you!
@M.F I hope this helps
– sambasiva rao
Mar 9 at 0:16
@Wen-Ben roll will rotate the array, i.e., bring the last n indexes to the beginning. It is clearly not performing any shuffle. Thank you!
– sambasiva rao
Mar 9 at 0:19
When the length of df is not n*4, for example when it has 6 rows, reshape will fail. Any thoughts?
– Wen-Ben
Mar 9 at 0:55
@Wen-Ben Yeah, it will fail, but this depends on his data. He is trying to shuffle data that is already in sequences of 4, so the whole data set should be a multiple of 4 for sure. If his data comes in sequences of three, then he has to replace 4 with 3. Is my reasoning correct, or did I miss something?
– sambasiva rao
Mar 9 at 1:06
I am not sure about that, which is why I am not using reshape.
– Wen-Ben
Mar 9 at 1:12
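One way to settle the disagreement above is to make the length assumption explicit before reshaping. The following is a sketch only: `block_positions` and its `drop_remainder` flag are hypothetical names, not something from the answers. It either fails with a clear message or leaves out the incomplete trailing sequence.

```python
import numpy as np


def block_positions(n_rows, seq_length=4, drop_remainder=False):
    """Return row positions arranged as (n_blocks, seq_length).

    Hypothetical helper: raises ValueError when n_rows is not a multiple of
    seq_length, unless drop_remainder=True, in which case the trailing rows
    are left out.
    """
    remainder = n_rows % seq_length
    if remainder and not drop_remainder:
        raise ValueError(
            f"{n_rows} rows cannot be split into sequences of {seq_length} "
            f"({remainder} rows left over)"
        )
    usable = n_rows - remainder
    return np.arange(usable).reshape(-1, seq_length)


print(block_positions(8))                       # two blocks of 4 positions
print(block_positions(6, drop_remainder=True))  # one block; positions 4 and 5 are left out
```

The returned array can then be shuffled with `np.random.shuffle` and flattened, as in the answer, and passed to `df.iloc` to select the rows by position.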
answered Mar 9 at 0:13, edited Mar 9 at 0:51 – Wen-Ben
answered Mar 9 at 0:18 – P Maschhoff
answered Mar 9 at 0:14, edited Mar 9 at 1:12 – sambasiva rao