How to update the shape, chunks and chunksize metadata of a dask array with nan dimensions


















Suppose I generate an array with a shape that depends on some computation, such as:



>>> import dask.array as da
>>> a = da.random.normal(size=(int(1e6), 10))
>>> a = a[a.mean(axis=1) > 0]
>>> a.shape
(nan, 10)
>>> a.chunks
((nan, nan, nan, nan, nan), (10,))
>>> a.chunksize
(nan, 10)


The nan values are expected. After persisting the result of the computation on the dask workers, I assumed the missing metadata would be filled in, but apparently that is not the case:



>>> a_persisted = a.persist()
>>> a_persisted.chunks
((nan, nan, nan, nan, nan), (10,))
>>> a_persisted.chunksize
(nan, 10)
>>> a_persisted.shape
(nan, 10)


If I try to force a rechunk I get:



>>> a_persisted.rechunk("auto")
Traceback (most recent call last):
  File "<ipython-input-26-31162de022a0>", line 1, in <module>
    a_persisted.rechunk("auto")
  File "/home/ogrisel/code/dask/dask/array/core.py", line 1647, in rechunk
    return rechunk(self, chunks, threshold, block_size_limit)
  File "/home/ogrisel/code/dask/dask/array/rechunk.py", line 226, in rechunk
    dtype=x.dtype, previous_chunks=x.chunks)
  File "/home/ogrisel/code/dask/dask/array/core.py", line 1872, in normalize_chunks
    chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
  File "/home/ogrisel/code/dask/dask/array/core.py", line 1949, in auto_chunks
    raise ValueError("Can not perform automatic rechunking with unknown "
ValueError: Can not perform automatic rechunking with unknown (nan) chunk sizes



What is the idiomatic way to update the metadata of my array with the actual sizes of the chunks that have already been computed on the workers?



I can compute them very cheaply with:



>>> import dask
>>> dask.compute([chunk.shape for chunk in a_persisted.to_delayed().ravel()])
([(100108, 10), (99944, 10), (99545, 10), (99826, 10), (100099, 10)],)
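
For reference, these block shapes are enough to work out the missing metadata by hand. The following is a minimal pure-Python sketch of that arithmetic, not dask code: the helper name `metadata_from_block_shapes` is hypothetical, and it assumes the blocks are stacked along the first axis only, as in the example above.

```python
def metadata_from_block_shapes(block_shapes):
    """Derive (shape, chunks, chunksize) for blocks stacked along axis 0.

    Hypothetical helper for illustration; assumes every block has the
    same trailing dimensions and a single chunk along those dimensions.
    """
    rows = tuple(s[0] for s in block_shapes)
    trailing = {tuple(s[1:]) for s in block_shapes}
    if len(trailing) != 1:
        raise ValueError("blocks disagree on their trailing dimensions")
    (rest,) = trailing
    # chunks: one tuple of block sizes per axis
    chunks = (rows,) + tuple((n,) for n in rest)
    # shape: sum of the block sizes along each axis
    shape = (sum(rows),) + rest
    # chunksize: largest block size along each axis
    chunksize = (max(rows),) + rest
    return shape, chunks, chunksize


block_shapes = [(100108, 10), (99944, 10), (99545, 10), (99826, 10), (100099, 10)]
shape, chunks, chunksize = metadata_from_block_shapes(block_shapes)
print(shape)      # (499522, 10)
print(chunks)     # ((100108, 99944, 99545, 99826, 100099), (10,))
print(chunksize)  # (100108, 10)
```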


My question is how to get a new dask array backed by the same chunks, but with informative .shape, .chunks and .chunksize attributes (no nan values).
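
One possible way to do this by hand, offered as a sketch under assumptions rather than as the recommended dask approach: measure each block's shape, wrap each existing block with `da.from_delayed` using that shape, and concatenate the pieces. The helper name `with_known_chunks` is hypothetical, it assumes the array is chunked along the first axis only, and the example uses a much smaller array than the one above so that it runs quickly.

```python
import dask
import dask.array as da


def with_known_chunks(x):
    """Hypothetical helper: rebuild x (chunked along axis 0 only) with known chunks."""
    blocks = x.to_delayed().ravel()
    # Gather only the small per-block shape tuples.
    (shapes,) = dask.compute([block.shape for block in blocks])
    # Wrap each existing block with its measured shape, then stitch them together.
    pieces = [
        da.from_delayed(block, shape=shape, dtype=x.dtype)
        for block, shape in zip(blocks, shapes)
    ]
    return da.concatenate(pieces, axis=0)


a = da.random.normal(size=(1000, 10), chunks=(200, 10))
a_persisted = a[a.mean(axis=1) > 0].persist()

fixed = with_known_chunks(a_persisted)
print(fixed.shape)
print(fixed.chunks)

# With known chunk sizes, automatic rechunking is possible.
fixed = fixed.rechunk("auto")
```

The new array reuses the delayed blocks of the persisted array, so only the shapes are computed again.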



>>> dask.__version__
'1.1.0+9.gb1fef05'















Tags: python, dask
















asked Feb 28 at 14:45 by ogrisel






















2 Answers



















There isn't a good solution to this today, but there could be. I recommend raising an issue if one doesn't exist already. This is a commonly requested feature.

Edit: this is tracked here: https://github.com/dask/dask/issues/3293
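
For readers on a dask release newer than the 1.1.0 used in the question: later versions of dask array provide `Array.compute_chunk_sizes()`, which computes the unknown chunk sizes and updates the array's metadata. The sketch below assumes such a version is installed, and it uses a small array so that it runs quickly; it is an illustration, not a statement about how the linked issue was resolved.

```python
import dask.array as da

a = da.random.normal(size=(1000, 10), chunks=(200, 10))
a = a[a.mean(axis=1) > 0].persist()
print(a.chunks)  # the row chunk sizes are nan at this point

# Triggers a computation of the block sizes and fills in the metadata.
a = a.compute_chunk_sizes()
print(a.chunks)
print(a.shape)

# Automatic rechunking works once the chunk sizes are known.
a = a.rechunk("auto")
```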

















answered Mar 6 at 16:14 by MRocklin












• Thanks, I was too slow to follow up. – ogrisel, Mar 7 at 17:43






























It looks like this will soon be solved within dask array itself (https://github.com/dask/dask/issues/3293). Until then, here is the workaround I use, which round-trips through a dask dataframe:

import numpy as np
import dask.array as da
import dask.dataframe as dd

a = da.random.normal(size=(int(1e6), 10))
filtered = a[a.mean(axis=1) > 0]  # the row chunk sizes are nan at this point
# lengths=True computes the length of each partition, so the chunks become known
df = dd.from_dask_array(filtered, columns=np.arange(a.shape[1]))
a = df.to_dask_array(lengths=True).persist()
print(a.chunks)
print(a.shape)

((100068, 100157, 100279, 100446, 99706), (10,))
(500656, 10)
















answered Mar 8 at 18:55 by Rowan_Gaffney


























