How to update the shape, chunks and chunksize metadata of a dask array with nan dimensions
Suppose I generate an array with a shape that depends on some computation, such as:
>>> import dask.array as da
>>> a = da.random.normal(size=(int(1e6), 10))
>>> a = a[a.mean(axis=1) > 0]
>>> a.shape
(nan, 10)
>>> a.chunks
((nan, nan, nan, nan, nan), (10,))
>>> a.chunksize
(nan, 10)
The nan values are expected. When I persist the result of the computation on the dask workers, I would expect the missing metadata to be retrievable, but apparently this is not the case:
>>> a_persisted = a.persist()
>>> a_persisted.chunks
((nan, nan, nan, nan, nan), (10,))
>>> a_persisted.chunksize
(nan, 10)
>>> a_persisted.shape
(nan, 10)
If I try to force a rechunk I get:
>>> a_persisted.rechunk("auto")
Traceback (most recent call last):
  File "<ipython-input-26-31162de022a0>", line 1, in <module>
    a_persisted.rechunk("auto")
  File "/home/ogrisel/code/dask/dask/array/core.py", line 1647, in rechunk
    return rechunk(self, chunks, threshold, block_size_limit)
  File "/home/ogrisel/code/dask/dask/array/rechunk.py", line 226, in rechunk
    dtype=x.dtype, previous_chunks=x.chunks)
  File "/home/ogrisel/code/dask/dask/array/core.py", line 1872, in normalize_chunks
    chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
  File "/home/ogrisel/code/dask/dask/array/core.py", line 1949, in auto_chunks
    raise ValueError("Can not perform automatic rechunking with unknown "
ValueError: Can not perform automatic rechunking with unknown (nan) chunk sizes
What is the idiomatic way to update the metadata of my array with the actual size of the chunks that have already been computed on the worker?
I can compute the chunk shapes very cheaply with:
>>> import dask
>>> dask.compute([chunk.shape for chunk in a_persisted.to_delayed().ravel()])
([(100108, 10), (99944, 10), (99545, 10), (99826, 10), (100099, 10)],)
My question is how to get a new dask array backed by the same chunks, with informative .shape, .chunks and .chunksize attributes (with no nans).
>>> dask.__version__
'1.1.0+9.gb1fef05'
python dask
asked Feb 28 at 14:45
ogrisel
2 Answers
There isn't a good solution to this today, but there could be. I recommend raising an issue if one doesn't exist already. This is a commonly requested feature.
Edit: this is tracked here: https://github.com/dask/dask/issues/3293
answered Mar 6 at 16:14
MRocklin
Thanks, I was too slow to follow up.
– ogrisel
Mar 7 at 17:43
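For readers on a more recent dask than the 1.1.0 development build used in the question: newer releases provide an `Array.compute_chunk_sizes()` method that fills in unknown chunk sizes. The following is a minimal sketch, assuming an installed dask version that has this method. The small array size and the explicit `chunks=(200, 10)` are choices made here so the example runs quickly; they are not taken from the question.

```python
import dask.array as da

# Small array with explicit chunks (illustrative values, not from the question).
a = da.random.normal(size=(1000, 10), chunks=(200, 10))
a = a[a.mean(axis=1) > 0]      # boolean indexing makes the row count unknown
print(a.chunks)                # the row chunk sizes are nan at this point

# Persist first so that measuring the chunks does not recompute the filter.
a = a.persist()

# Computes the size of every chunk and updates the array's metadata in place.
a.compute_chunk_sizes()
print(a.chunks)
print(a.shape)

# Automatic rechunking is accepted once all chunk sizes are known.
b = a.rechunk("auto")
print(b.chunks)
```

Persisting before the call is a design choice: without it, the chunk sizes are obtained by running the filtering graph once, and the same graph runs again when the data is actually needed.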
Looks like this will soon be solved internally in dask array (https://github.com/dask/dask/issues/3293). Until then, here is the workaround I use:
import numpy as np
import dask.array as da
import dask.dataframe as dd

a = da.random.normal(size=(int(1e6), 10))
# Round-trip through a dask dataframe; lengths=True computes the partition lengths
a = (
    dd.from_dask_array(a[a.mean(axis=1) > 0], columns=np.arange(a.shape[1]))
    .to_dask_array(lengths=True)
    .persist()
)
print(a.chunks)
print(a.shape)
Output:
((100068, 100157, 100279, 100446, 99706), (10,))
(500656, 10)
answered Mar 8 at 18:55
Rowan_Gaffney
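Another possible workaround stays within dask.array and reuses the chunk shapes that the question already computes. This is a sketch, not an official dask recipe: it assumes the unknown dimension is axis 0 and that the other axes each have a single chunk, as in the question's example. The names `blocks`, `shapes`, `pieces` and `known` are illustrative, and the array is smaller than the one in the question so the example runs quickly.

```python
import dask
import dask.array as da

# Smaller than the array in the question; chunks chosen for illustration.
a = da.random.normal(size=(1000, 10), chunks=(200, 10))
a_persisted = a[a.mean(axis=1) > 0].persist()

# One delayed object per block. Flattening is safe here because only
# axis 0 is split into several chunks.
blocks = a_persisted.to_delayed().ravel()

# Fetch the concrete shape of every block (same call as in the question).
(shapes,) = dask.compute([block.shape for block in blocks])

# Wrap each block as a dask array with a known shape, then stitch the
# pieces back together along axis 0.
pieces = [
    da.from_delayed(block, shape=shape, dtype=a_persisted.dtype)
    for block, shape in zip(blocks, shapes)
]
known = da.concatenate(pieces, axis=0)

print(known.chunks)
print(known.shape)
```

The resulting array refers to the same persisted blocks, so building it only costs the shape lookups. Arrays that are chunked along more than one axis would need the blocks reassembled with something like `da.block` instead of a single `da.concatenate`.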