Pandas group by year, date producing spurious values
I have a pandas DataFrame of daily stats from 1 Jan 2015 to 3 Mar 2019. After reading it into a df and grouping by year and month, the result contains spurious groups all the way to Dec 2019. Here is the code that produces the MultiIndex levels:
import numpy as np
import pandas as pd
col_types = {'count': np.int64, 'value': np.float64}
df = pd.read_csv("myfile.csv", sep='\t', index_col=1, dtype=col_types, parse_dates=True)
df.dtypes # count int64, value float64
type(df.index) #pandas.core.indexes.datetimes.DatetimeIndex
group_by_list = [df.index.year, df.index.month]
grouped_df = df.groupby(group_by_list).sum()
index_rename_names_list = ['year', 'month']
index_rename_position_list = [0, 1]
grouped_df.index.rename(index_rename_names_list, level=index_rename_position_list, inplace=True)
grouped_df.index
MultiIndex(levels=[[2015, 2016, 2017, 2018, 2019], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
codes=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11]],
names=['year', 'month'])
It seems MultiIndex levels are being created even for dates outside the range of the data. Instead of filtering afterwards, is there a way to avoid this in the groupby() call itself? Thanks
python pandas
asked Mar 9 at 9:16 by shanlodh
What is df.index.min() and df.index.max()? – Chris A, Mar 9 at 9:23
df.index.min() == (2015, 1), df.index.max() == (2019, 12) – shanlodh, Mar 9 at 9:24
The data runs only until Mar-19, shouldn't df.index.max() == (2019, 3)? – shanlodh, Mar 9 at 9:26
@perl: the problem is that it's grouping non-existent data – shanlodh, Mar 9 at 9:26
Oh, but you said your df.index.max() == (2019, 12). So the problem is with the index in the original DataFrame – perl, Mar 9 at 9:34
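The check suggested in these comments can be run on the raw DataFrame, before any grouping. Below is a minimal sketch that assumes a synthetic frame in place of myfile.csv (the column names follow the question, the data is made up). It also illustrates one possible cause that the thread does not confirm: day-first date strings such as 12/01/2019 being read month-first.

```python
import pandas as pd

# Synthetic stand-in for myfile.csv (assumption): one row per day in the stated range.
idx = pd.date_range("2015-01-01", "2019-03-03", freq="D")
df = pd.DataFrame({"count": 1, "value": 1.0}, index=idx)

# Inspect the raw index, not the grouped one.
first_day = df.index.min()
last_day = df.index.max()
months_present = df.index.to_period("M").unique()
print(first_day.date(), last_day.date(), len(months_present))

# Possible cause (an assumption, not confirmed in the thread): a day-first
# string is parsed month-first by default, so 12 Jan becomes 1 Dec.
default_parse = pd.to_datetime("12/01/2019")
dayfirst_parse = pd.to_datetime("12/01/2019", dayfirst=True)
print(default_parse.date(), dayfirst_parse.date())
```

If last_day on the real data turns out to be later than 3 Mar 2019, the stray months are already present in the index, and passing dayfirst=True to read_csv may be worth trying.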
1 Answer
The problem seems to be with the index of the original DataFrame df. For example, if we set df = pd.DataFrame({'a': 1}, index=pd.date_range('2015-01-01', '2019-03-03')), it works without any issue:
import pandas as pd
df = pd.DataFrame({'a': 1}, index=pd.date_range('2015-01-01', '2019-03-03'))
group_by_list = [df.index.year, df.index.month]
grouped_df = df.groupby(group_by_list).sum()
index_rename_names_list = ['year', 'month']
index_rename_position_list = [0, 1]
grouped_df.index.rename(index_rename_names_list, level=index_rename_position_list, inplace=True)
grouped_df.index.max()
Output:
(2019, 3)
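As a side note on the renaming step, the level names can also be attached to the grouping keys themselves. This is a sketch on the same synthetic data as the answer, not something proposed in the thread; the variable names year_key and month_key are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": 1}, index=pd.date_range("2015-01-01", "2019-03-03"))

# Name each key up front so the resulting MultiIndex levels are already labelled.
year_key = df.index.year.rename("year")
month_key = df.index.month.rename("month")
grouped_df = df.groupby([year_key, month_key]).sum()

print(list(grouped_df.index.names))
print(grouped_df.index.max())
```

Only (year, month) pairs that occur in the index become groups, so clean data ending on 3 Mar 2019 should stop at March 2019.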
P.S. By the way, any reason for not using resample instead of groupby?
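For reference, the resample alternative mentioned in the P.S. could look like the following. This is a sketch on the same synthetic frame; the "MS" (month-start) frequency alias is an assumption chosen here, not something the answer specifies. Unlike groupby, resample returns a single DatetimeIndex and also emits rows for any empty months between the first and last date, but it does not produce months beyond the last date in the index.

```python
import pandas as pd

df = pd.DataFrame({"a": 1}, index=pd.date_range("2015-01-01", "2019-03-03"))

# Sum per calendar month; each bin is labelled by the first day of the month.
monthly = df.resample("MS").sum()

print(monthly.index.min().date(), monthly.index.max().date())
print(monthly["a"].iloc[-1])  # number of days counted in the last month
```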
answered Mar 9 at 9:27 by perl (edited Mar 9 at 9:37)