Cassandra Data modelling: Timestamp as partition keys


I need to be able to return all users that performed an action during a specified interval. The table definition in Cassandra is just below:



CREATE TABLE t (
    "from" timestamp,   -- "from" and "to" are reserved words in CQL, so they must be quoted
    "to" timestamp,
    user text,
    PRIMARY KEY (("from", "to"), user)
);


I'm trying to implement the following query in Cassandra:



SELECT * FROM t WHERE "from" > :startInterval AND "to" < :toInterval;


However, this query will not work because it is a range query on the partition key, which would force Cassandra to scan every node in the cluster, defeating its purpose as an efficient distributed database.



Is there an efficient way to model this query in Cassandra?



My solution would be to split both timestamps into their corresponding years and months and use those as the partition key. The table would look like this:



CREATE TABLE t_updated (
    yearFrom int,
    monthFrom int,
    yearTo int,
    monthTo int,
    "from" timestamp,
    "to" timestamp,
    user text,
    PRIMARY KEY ((yearFrom, monthFrom, yearTo, monthTo), user)
);


If I wanted the users who performed the action between January 2017 and July 2017, the query would look like the following:



SELECT user FROM t_updated
WHERE yearFrom IN (2017) AND monthFrom IN (1, 2, 3, 4, 5, 6, 7)
  AND yearTo IN (2017) AND monthTo IN (1, 2, 3, 4, 5, 6, 7);


Would there be a better way to model this query in Cassandra? How would you approach this issue?










      cassandra cassandra-3.0






      asked Mar 7 at 14:24









      theAsker

          2 Answers
































          The answer depends on the expected number of entries. A rule of thumb is that a partition should not exceed 100 MB, so if you expect a moderate number of entries, using the year as the partition key would be enough.



          We use the week-first date as a partition key in an IoT scenario, where values get written at most once a minute.
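          The "week-first date" bucketing described above can be sketched in Python. `week_bucket` is an illustrative helper name, not code from the answer — it simply truncates a timestamp to the Monday of its week, so every row written in the same calendar week lands in the same partition:

```python
from datetime import date, datetime, timedelta

def week_bucket(ts: datetime) -> date:
    """Return the first day (Monday) of ts's week, used as the partition key."""
    d = ts.date()
    return d - timedelta(days=d.weekday())  # weekday(): Monday == 0

# All writes within one calendar week share a single partition:
print(week_bucket(datetime(2019, 3, 7, 14, 24)))  # Thursday -> 2019-03-04
```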






          answered Mar 7 at 14:42 by Alex Tbk























          • I think there could be several million entries in the table, maybe even tens of millions. What do you mean by week-first date as the partition key? Thanks.

            – theAsker
            Mar 7 at 14:52











          • We take the date of the first day of the current week, so the data is organised in week partitions.

            – Alex Tbk
            Mar 7 at 14:58











          • How did you query the data when you needed to perform range queries on the partition key? Did you use a query similar to the one in my example, using the IN operator?

            – theAsker
            Mar 7 at 15:06











          • When I query data by range, I calculate the matching partition keys: you iterate over the wanted period and build a list of queries that you send at the end. That has the advantage that the coordinator has to process small batches of data instead of one large batch.

            – Alex Tbk
            Mar 7 at 15:08











          • I understand your point of view. However, this has the drawback that the number of queries is proportional to the length of the input time period.

            – theAsker
            Mar 7 at 15:15
































          First, the partition key only works with the equals operator. It is better to use PRIMARY KEY (bucket, time_stamp) here, where the bucket can be a combination of year and month (or also include day, hour, etc., depending on how big your data set is).



          It is better to execute multiple queries and combine the results on the client side.
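          The bucket fan-out that both answers describe can be sketched as follows. `month_buckets` and the commented-out driver calls are illustrative assumptions, not code from the answer: enumerate the (year, month) buckets covering the interval, issue one single-partition query per bucket, and merge the rows on the client.

```python
from datetime import date

def month_buckets(start: date, end: date) -> list:
    """Return the (year, month) partition keys covering [start, end]."""
    buckets = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        buckets.append((y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return buckets

# One single-partition query per bucket, merged client-side
# (session/table names below assume a hypothetical schema and driver):
# for y, m in month_buckets(date(2017, 1, 1), date(2017, 7, 31)):
#     rows.extend(session.execute(
#         "SELECT user FROM t_bucketed WHERE year = %s AND month = %s", (y, m)))
print(month_buckets(date(2017, 1, 15), date(2017, 7, 2)))  # Jan..Jul 2017
```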






          answered Mar 7 at 17:36 by Mathan






















