



Create a Dask DataFrame from strings representing multilevel dictionaries































I have a massive dataset and I am trying to make Dask DataFrames out of a list of strings.

df_.head():

A | B | C
----------------------------------------
1 | "a:1, b:2, c:3, d:5" | 4
2 | "a:5, b:2, c:3, d:0" | 7
...

Note that column B is a string, so I have to do a literal_eval.

In pandas I did the following:

import ast

for i in range(len(df_)):
    df_.at[i, 'B'] = ast.literal_eval(df_.iloc[i, 1])

dat = pd.DataFrame()
for i in range(len(df_)):
    # Make the dict into a DataFrame
    b = pd.DataFrame(df_.iloc[i, 1])
    # Keep track of the row number
    b['A'] = i
    # Concatenate with the master DataFrame
    dat = pd.concat([dat, b], axis=0, ignore_index=True)

Then, after this, I merge dat with the original dataframe (df_) on column A.

This process takes forever, so I want to do it in Dask.

Thanks.


































  • Hi Daniel, do you mind providing a MCVE?

    – user32185
    Mar 8 at 12:42
























python pandas dictionary dask














edited Mar 8 at 4:51
Tiw

asked Mar 8 at 4:04
Daniel Fernandez
























1 Answer

































dat=pd.concat([dat,b], axis=0, ignore_index=True)

In this line you repeatedly allocate a new pandas DataFrame of increasing size. Rebuilding the DataFrame on every iteration makes the loop quadratic in the total data size and is likely very slow.

You might instead try using a pandas operation like map or apply to transform your input dataframe all at once.

You probably don't need Dask here. It's better to start with simpler optimizations like the one above before bringing in the extra complexity of parallel computing.
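As a minimal sketch of the "all at once" approach (the sample data below is hypothetical, and it assumes the strings in `B` are valid Python dict literals that parse to flat dicts):

```python
import ast
import pandas as pd

# Hypothetical sample in the same shape as df_.head() in the question;
# braces are included so that ast.literal_eval can parse the strings.
df_ = pd.DataFrame({
    "A": [1, 2],
    "B": ["{'a': 1, 'b': 2, 'c': 3, 'd': 5}", "{'a': 5, 'b': 2, 'c': 3, 'd': 0}"],
    "C": [4, 7],
})

# Parse the whole column with one apply instead of a row-by-row loop.
parsed = df_["B"].apply(ast.literal_eval)

# Build one small frame per row, then concatenate ONCE at the end,
# instead of re-allocating a growing DataFrame inside the loop.
frames = [pd.DataFrame([d]).assign(A=i) for i, d in parsed.items()]
dat = pd.concat(frames, ignore_index=True)
```

Concatenating once at the end is roughly linear in the total output size, whereas concatenating inside the loop copies the accumulated frame on every iteration.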





























  • Thank you MRocklin. Actually, the most expensive part is this one: `b = pd.DataFrame(df_.iloc[i,1])` followed by `b['A'] = i`. I am doing it this way because I want to add the identifier to b, since all the dictionaries in df_.B have different lengths.

    – Daniel Fernandez
    Mar 10 at 2:27
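Since the parsed dictionaries have different key sets, one way to sketch this step (the sample values below are hypothetical, and each entry is assumed to parse to a flat dict) is to build the frame from the whole list in a single constructor call and let pandas align the keys:

```python
import pandas as pd

# Hypothetical parsed values of different lengths, as described above.
dicts = [{'a': 1, 'b': 2}, {'a': 5, 'b': 2, 'c': 3, 'd': 0}]

# One constructor call aligns the varying keys; missing ones become NaN.
dat = pd.DataFrame(dicts)
# Add the row identifier for all rows in one vectorized assignment.
dat['A'] = range(len(dicts))
```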














































answered Mar 9 at 23:56
MRocklin





















