Create a Dask DataFrame from strings representing multilevel dictionaries


I have a massive dataset and I am trying to build Dask dataframes from a column of strings.

df_.head():

A | B | C
----------------------------------------
1 | "a:1, b:2, c:3, d:5" | 4
2 | "a:5, b:2, c:3, d:0" | 7
...

Note that column B is a string, so I have to run ast.literal_eval on it.

In pandas I did the following:

    import ast
    import pandas as pd

    # Parse the string in column B (position 1) into a Python object
    for i in range(len(df_)):
        df_.at[i, 'B'] = ast.literal_eval(df_.iloc[i, 1])

    dat = pd.DataFrame()
    for i in range(len(df_)):
        # Makes the list of dicts into a dataframe
        b = pd.DataFrame(df_.iloc[i, 1])
        # Keeps track of row number
        b['A'] = i
        # Concat with master DF
        dat = pd.concat([dat, b], axis=0, ignore_index=True)

Then, after this, I merge dat with the original dataframe (df_) based on column A.

This process takes forever so I want to do it in dask.

Thanks.

edited Mar 8 at 4:51 by Tiw

asked Mar 8 at 4:04 by Daniel Fernandez

Hi Daniel, would you mind providing an MCVE (minimal, complete, verifiable example)?

– user32185
Mar 8 at 12:42

Tags: python, pandas, dictionary, dask


1 Answer

dat=pd.concat([dat,b], axis=0, ignore_index=True)

In this line you repeatedly allocate a new pandas dataframe of increasing size. Recreating the dataframe on every iteration is likely to be very slow.
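One common way to avoid the repeated allocation is to collect the per-row frames in a list and concatenate once after the loop. The following is only a minimal sketch: the sample data is hypothetical, and it assumes each string in column B parses to a list of dicts, as the comments in the question's code suggest.

```python
import ast
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's real dataset):
# column B holds string representations of lists of dicts.
df_ = pd.DataFrame({
    "A": [0, 1],
    "B": ["[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]", "[{'a': 5, 'b': 6}]"],
    "C": [4, 7],
})

pieces = []
for row_id, text in zip(df_["A"], df_["B"]):
    records = ast.literal_eval(text)  # parse the string into a list of dicts
    piece = pd.DataFrame(records)     # one small frame per input row
    piece["A"] = row_id               # keep the identifier of the source row
    pieces.append(piece)

# Concatenate once, after the loop, instead of growing `dat` on every iteration.
dat = pd.concat(pieces, axis=0, ignore_index=True)
```

Appending to a Python list is cheap, so the cost of copying the accumulated data is paid a single time in the final `pd.concat` call.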

You might instead try a pandas operation like map or apply to perform this operation on your whole input dataframe at once.
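A rough sketch of what such a whole-column approach could look like, assuming (this is not confirmed by the question) that every string in column B parses to a non-empty list of flat dicts. The sample data is hypothetical, and `explode` and `pd.json_normalize` are one possible choice of operations, not necessarily the ones the answer had in mind.

```python
import ast
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's real dataset).
df_ = pd.DataFrame({
    "A": [0, 1],
    "B": ["[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]", "[{'a': 5, 'b': 6}]"],
    "C": [4, 7],
})

# Parse every string in column B in one pass.
parsed = df_["B"].map(ast.literal_eval)

# explode() yields one row per dict and repeats the other columns, so A stays attached.
exploded = df_.assign(B=parsed).explode("B").reset_index(drop=True)

# Expand the dicts into columns and place the identifier column next to them.
expanded = pd.json_normalize(exploded["B"].tolist())
dat = pd.concat([exploded[["A"]], expanded], axis=1)
```

Rows whose list is empty become NaN after `explode`, so they would need to be dropped or handled before the `json_normalize` step.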

You probably don't need Dask here. It's better to start with simpler optimizations like the one listed above, before bringing in the extra complexity of parallel computing.

answered Mar 9 at 23:56 by MRocklin

Thank you, MRocklin. Actually, the most expensive part is this one:

    # Makes the list of dicts into a dataframe
    b = pd.DataFrame(df_.iloc[i, 1])
    # Keeps track of row number
    b['A'] = i

I am doing it this way because I want to add the identifier to b, since the dictionaries in df_.B all have different lengths.

– Daniel Fernandez
Mar 10 at 2:27
