Create a Dask DataFrame from strings representing multilevel dictionaries
I have a massive dataset and I am trying to make Dask dataframes out of a list of strings. df_.head() looks like:
A | B | C
----------------------------------------
1 | "a:1, b:2, c:3, d:5" | 4
2 | "a:5, b:2, c:3, d:0" | 7
...
Note that column B is a string, so I have to do a literal_eval.
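As a side note, ast.literal_eval only accepts valid Python literal syntax, so this sketch assumes the stored strings actually carry braces and quotes (the example string below is hypothetical; bare "a:1, b:2" text would raise a parse error):

```python
import ast

# Hypothetical example; assumes the stored strings are valid Python
# dict literals. literal_eval parses literals only, so it is safe
# to use on untrusted text, unlike eval.
s = "{'a': 1, 'b': 2, 'c': 3, 'd': 5}"
d = ast.literal_eval(s)
```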
In pandas I did the following:
import ast
import pandas as pd

# Parse the string in column B (positional index 1) into a Python object
for i in range(len(df_)):
    df_.at[i, 'B'] = ast.literal_eval(df_.iloc[i, 1])

dat = pd.DataFrame()
for i in range(len(df_)):
    # Makes the list of dicts into a dataframe
    b = pd.DataFrame(df_.iloc[i, 1])
    # Keeps track of row number
    b['A'] = i
    # Concat with master DF
    dat = pd.concat([dat, b], axis=0, ignore_index=True)
Then, after this, I merge dat with the original dataframe (df_) on column A. This process takes forever, so I want to do it in Dask.

Thanks.
python pandas dictionary dask
Hi Daniel, do you mind providing a MCVE?
– user32185
Mar 8 at 12:42
edited Mar 8 at 4:51 by Tiw
asked Mar 8 at 4:04 by Daniel Fernandez
1 Answer
dat = pd.concat([dat, b], axis=0, ignore_index=True)

In this line you repeatedly allocate a new Pandas dataframe of increasing size, so the cost of the loop grows quadratically with the number of rows. Recreating your dataframe every iteration is likely very slow.

You might instead try a Pandas operation like map or apply to transform the whole input dataframe at once, or collect the per-row frames in a list and call pd.concat a single time at the end.

You probably don't need Dask here. It's better to start with simpler optimizations like the one listed above before bringing in the extra complexity of parallel computing.
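A rough sketch of that suggestion, using a small made-up frame in the question's A/B/C shape and assuming the strings are valid dict literals (both the data and the key/value layout here are illustrative, not the asker's real schema):

```python
import ast
import pandas as pd

# Hypothetical data in the question's A/B/C shape.
df_ = pd.DataFrame({
    "A": [1, 2],
    "B": ["{'a': 1, 'b': 2}", "{'a': 5, 'c': 3, 'd': 0}"],
    "C": [4, 7],
})

# Parse every string at once instead of looping row by row.
df_["B"] = df_["B"].apply(ast.literal_eval)

# Expand each dict into rows tagged with its source row label,
# then concatenate once (avoids quadratic re-allocation).
pieces = [pd.DataFrame({"key": list(d), "value": list(d.values()), "A": a})
          for a, d in zip(df_["A"], df_["B"])]
dat = pd.concat(pieces, ignore_index=True)
```

The single pd.concat at the end is the key change: each intermediate frame is built once and copied once.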
answered Mar 9 at 23:56 by MRocklin
Thank you MRocklin. Actually, the most expensive part is this one: `b = pd.DataFrame(df_.iloc[i,1])` followed by `b['A'] = i` (makes the list of dicts into a dataframe, then keeps track of the row number). I am doing it this way because I want to add the identifier to b, since all the dictionaries in df_.B have different lengths.
– Daniel Fernandez
Mar 10 at 2:27