Find the number of times a combination occurs in a numpy 2D array



I have a 2D NumPy array, and I want a function that operates on col1 and col2 of the array. If M is the number of unique values in col1 and N is the number of unique values in col2, the output should be a 1D array of size M * N.

For example, suppose col1 has 3 unique values (A1, A2 and A3) and col2 has 2 unique values (X1 and X2). The possible combinations are then (A1, X1), (A1, X2), (A2, X1), (A2, X2), (A3, X1) and (A3, X2). I want to find out how many times each combination occurs together in the same row, i.e. how many rows contain the combination (A1, X1), and so on, and return the counts as a 1D array. This is my code:



    import numpy as np
    #@profile
    def myfunc(arr1,arr2):
        unique_arr1 = np.unique(arr1)
        unique_arr2 = np.unique(arr2)
        pdt = len(unique_arr1)*len(unique_arr2)
        count = np.zeros(pdt).astype(int)

        ## getting the number of possible combinations and storing them in arr1_n and arr2_n
        if ((len(unique_arr2)>0) and (len(unique_arr1)>0)):
            arr1_n = unique_arr1.repeat(len(unique_arr2))
            arr2_n = np.tile(unique_arr2,len(unique_arr1))
            ## Finding the number of times a particular combination has occurred
            for i in np.arange(0,pdt):
                pos1 = np.where(arr1==arr1_n[i])[0]
                pos2 = np.where(arr2==arr2_n[i])[0]
                count[i] = len(np.intersect1d(pos1,pos2))
        return count

    np.random.seed(1)
    myarr = np.random.randint(20,size=(80000,4))
    a = myfunc(myarr[:,1],myarr[:,2])


The following are the profiling results when I run line_profiler on this code.



    Timer unit: 1e-06 s

    Total time: 18.1849 s
    File: testcode3.py
    Function: myfunc at line 2

    Line #      Hits         Time  Per Hit   % Time  Line Contents
         2                                           @profile
         3                                           def myfunc(arr1,arr2):
         4         1      74549.0  74549.0      0.4      unique_arr1 = np.unique(arr1)
         5         1      72970.0  72970.0      0.4      unique_arr2 = np.unique(arr2)
         6         1          9.0      9.0      0.0      pdt = len(unique_arr1)*len(unique_arr2)
         7         1         48.0     48.0      0.0      count = np.zeros(pdt).astype(int)
         8
         9         1          5.0      5.0      0.0      if ((len(unique_arr2)>0) and (len(unique_arr1)>0)):
        10         1         16.0     16.0      0.0          arr1_n = unique_arr1.repeat(len(unique_arr2))
        11         1        105.0    105.0      0.0          arr2_n = np.tile(unique_arr2,len(unique_arr1))
        12       401       5200.0     13.0      0.0          for i in np.arange(0,pdt):
        13       400    6870931.0  17177.3     37.8              pos1 = np.where(arr1==arr1_n[i])[0]
        14       400    6844999.0  17112.5     37.6              pos2 = np.where(arr2==arr2_n[i])[0]
        15       400    4316035.0  10790.1     23.7              count[i] = len(np.intersect1d(pos1,pos2))
        16         1          4.0      4.0      0.0      return count


As you can see, np.where and np.intersect1d take up almost all of the time. Can anyone suggest faster methods to do this?
In the future I will have to work with real data much larger than this, so I need to optimize this code.
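One small change that stays close to the loop above, offered as a sketch rather than a tuned solution: `np.where` plus `np.intersect1d` builds and intersects index arrays only to count them, whereas combining the two boolean masks and counting the `True` entries gives the same number directly. The function name `count_pairs_masked` and the sample data are illustrative assumptions, not part of the original code.

```python
import numpy as np

def count_pairs_masked(arr1, arr2):
    """Count rows for every (unique arr1 value, unique arr2 value) pair."""
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    unique_arr1 = np.unique(arr1)
    unique_arr2 = np.unique(arr2)
    count = np.zeros(len(unique_arr1) * len(unique_arr2), dtype=int)

    i = 0
    for v1 in unique_arr1:
        mask1 = arr1 == v1  # computed once per arr1 value
        for v2 in unique_arr2:
            # Rows where both columns match; no index arrays are built
            count[i] = np.count_nonzero(mask1 & (arr2 == v2))
            i += 1
    return count

# Hypothetical sample: 3 unique values in the first column, 2 in the second
col1 = np.array([100, 200, 100, 100, 300, 200])
col2 = np.array([1, 1, 2, 1, 2, 1])
result = count_pairs_masked(col1, col2)
print(result)
```

This still scans the arrays once per combination, so it only trims the constant factor; the overall cost keeps growing with M * N.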










Tags: arrays, numpy, python-3.6, line-profiler






asked Feb 13 at 6:16 by Bidisha Das






















2 Answers














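To meet Bidisha Das's requirements:

Code:

    import math
    import numpy as np

    def myfunc3(arr1, arr2):
        # Power of 10 larger than every value, so that arr1*order_mag + arr2
        # encodes each pair as one integer (assumes non-negative integers)
        order_mag = 10**(int(math.log10(max(int(np.amax([arr1, arr2])), 1))) + 1)

        # np.multiply also works on plain lists, where arr1*order_mag would repeat the list
        complete_arr = np.multiply(arr1, order_mag) + arr2

        unique_elements, counts_elements = np.unique(complete_arr, return_counts=True)

        unique_arr1 = np.unique(arr1)
        unique_arr2 = np.unique(arr2)

        # Combinations that never occur keep a count of 0
        r = np.zeros((len(unique_arr1), len(unique_arr2)), dtype=int)

        for i in range(len(unique_elements)):
            # Decode the pair and find its position among the unique values
            i1 = np.where(unique_arr1 == unique_elements[i] // order_mag)[0]
            i2 = np.where(unique_arr2 == unique_elements[i] % order_mag)[0]

            r[i1, i2] += counts_elements[i]

        return r.flatten()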


Test code:

    import time
    import numpy as np

    # Assumes myfunc (from the question) and myfunc3 are already defined
    times_f3 = []
    times_f1 = []
    ns = 8*10**np.linspace(3, 6, 10)

    for i in ns:
        np.random.seed(1)
        myarr = np.random.randint(20, size=(int(i), 4))

        start1 = time.time()
        a = myfunc3(myarr[:,1], myarr[:,2])
        end1 = time.time()
        times_f3.append(end1-start1)

        start2 = time.time()
        b = myfunc(myarr[:,1], myarr[:,2])
        end2 = time.time()
        times_f1.append(end2-start2)

        print("N: {:d}, myfunc3 time: {:.3f} ms, myfunc time: {:.3f} ms".format(int(i), (end1-start1)*1000.0, (end2-start2)*1000.0))
        print("Equal?: " + str(np.array_equal(a, b)))


Results:

[Figure: Results]
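As a possible variation on the idea above, the remaining Python loop can be avoided altogether. This is a sketch using a different encoding than the answer's power-of-ten trick: `np.unique(..., return_inverse=True)` gives each row the rank of its value among the sorted unique values, and `np.bincount` with `minlength` counts the combined ranks while leaving absent combinations at 0. The name `count_pairs_bincount` is an illustrative assumption.

```python
import numpy as np

def count_pairs_bincount(arr1, arr2):
    """Loop-free pair counts; combinations that never occur get 0."""
    # The inverse arrays hold, per row, the index of its value in the sorted uniques
    unique_arr1, inv1 = np.unique(arr1, return_inverse=True)
    unique_arr2, inv2 = np.unique(arr2, return_inverse=True)
    n2 = len(unique_arr2)

    # Encode each row as one index into the flattened M x N table
    flat_index = inv1.ravel() * n2 + inv2.ravel()

    return np.bincount(flat_index, minlength=len(unique_arr1) * n2)

# Sample taken from the comment thread on this answer
pair_counts = count_pairs_bincount([100, 200, 100, 100, 300, 200],
                                   [1, 1, 2, 1, 2, 1])
print(pair_counts)  # [2 1 2 0 0 1]
```

Because it works on ranks rather than on the values themselves, this version does not need to know the maximum value and is not restricted to non-negative integers.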






• Did this code work for you? @bidisha
  – Guille Sanchez, Feb 27 at 12:06

• I am trying to run your code in the following way: a = [100,200,100,100,300,200]; b = [1,1,2,1,2,1]; x = myfunc3(a,b); print(x). It outputs [0. 0. 0. 0. 0. 0.], whereas the expected output is [2 1 2 0 0 1].
  – Bidisha Das, Mar 5 at 6:29

• Fixed the error in the code: complete_arr = np.multiply(arr1, order_mag) + arr2 instead of complete_arr = arr1*order_mag + arr2
  – Guille Sanchez, Mar 7 at 15:26
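The fix mentioned in the last comment comes down to how `*` and `+` behave on plain Python lists compared with NumPy arrays. A minimal sketch (the short lists and the factor 2 are illustrative assumptions, chosen only to keep the output small):

```python
import numpy as np

a = [100, 200, 100]
b = [1, 1, 2]
order_mag = 1000

# On Python lists, * repeats the list and + concatenates
list_result = a * 2 + b
print(list_result)   # [100, 200, 100, 100, 200, 100, 1, 1, 2]

# np.multiply converts the list to an array first, so the arithmetic is element-wise
array_result = np.multiply(a, order_mag) + b
print(array_result)  # [100001 200001 100002]
```

With list inputs, the original expression therefore produced a long list of raw values instead of encoded pairs, which would explain why none of the decoded pairs matched and every count stayed at 0.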



















If you know the maximum possible value in your columns, you can use:

    import numpy as np

    def myfunc2(arr1, arr2):
        # arr1 and arr2 are NumPy integer arrays.
        # The *100 depends on your maximum possible value:
        # the multiplier must be larger than every value in arr2
        complete_arr = arr1*100 + arr2
        unique_elements, counts_elements = np.unique(complete_arr, return_counts=True)

        return counts_elements
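To illustrate why the multiplier has to depend on the maximum value, here is a small sketch, assuming non-negative integer columns. The helper names `encode_pairs` and `safe_multiplier` are hypothetical and not part of the answer.

```python
import numpy as np

def encode_pairs(arr1, arr2, multiplier):
    """Combine two non-negative integer columns into one key per row."""
    return np.asarray(arr1) * multiplier + np.asarray(arr2)

def safe_multiplier(arr2):
    """Smallest multiplier that keeps keys unique: one more than max(arr2)."""
    return int(np.max(arr2)) + 1

col1 = np.array([1, 2])
col2 = np.array([15, 5])

too_small = encode_pairs(col1, col2, 10)
print(too_small)  # [25 25]: two different pairs share one key

ok = encode_pairs(col1, col2, safe_multiplier(col2))
print(ok)         # [31 37]
```

Any multiplier larger than the biggest value in the second column works; the keys also have to fit in the array's integer dtype, which matters for very large values.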


Results with 8·10^5 and 8·10^6 rows:

    N: 800000, myfunc2 time: 78.287 ms, myfunc time: 6556.748 ms
    Equal?: True
    N: 8000000, myfunc2 time: 736.020 ms, myfunc time: 100544.354 ms
    Equal?: True

Testing code:

    import time
    import numpy as np

    # Assumes myfunc (from the question) and myfunc2 are already defined
    times_f1 = []
    times_f2 = []
    ns = 8*10**np.linspace(3, 6, 10)

    for i in ns:
        np.random.seed(1)
        myarr = np.random.randint(20, size=(int(i), 4))

        start1 = time.time()
        a = myfunc2(myarr[:,1], myarr[:,2])
        end1 = time.time()
        times_f2.append(end1-start1)

        start2 = time.time()
        b = myfunc(myarr[:,1], myarr[:,2])
        end2 = time.time()
        times_f1.append(end2-start2)

        print("N: {:d}, myfunc2 time: {:.3f} ms, myfunc time: {:.3f} ms".format(int(i), (end1-start1)*1000.0, (end2-start2)*1000.0))
        print("Equal?: " + str(np.array_equal(a, b)))


Empirically, the running time seems to grow roughly linearly, O(n), for both functions:

[Figure: Compute times]
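A rough way to check the scaling from the numbers alone is the slope between two measurements on a log-log scale: a slope near 1 suggests linear growth. The sketch below uses only the two timings quoted in the results above, so it is an estimate from two points and not a proper fit; the helper name `loglog_slope` is an assumption.

```python
import math

# Timings quoted in the results above, in milliseconds
n_small, n_large = 800_000, 8_000_000
myfunc2_ms = (78.287, 736.020)
myfunc_ms = (6556.748, 100544.354)

def loglog_slope(n0, n1, t0, t1):
    """Exponent k in t ~ n**k, estimated from two (n, t) measurements."""
    return math.log(t1 / t0) / math.log(n1 / n0)

slope_myfunc2 = loglog_slope(n_small, n_large, *myfunc2_ms)
slope_myfunc = loglog_slope(n_small, n_large, *myfunc_ms)
print(f"myfunc2: {slope_myfunc2:.2f}, myfunc: {slope_myfunc:.2f}")
```

Both exponents come out close to 1, with the original `myfunc` somewhat above it, which is consistent with the near-linear behaviour described here.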






• When finding the unique values, complete_arr gets sorted, hence the counts get messed up.
  – Bidisha Das, Feb 15 at 5:35

• Also, if a particular combination doesn't occur, I want count[at that position] to be 0; your code doesn't satisfy that.
  – Bidisha Das, Feb 15 at 5:36

• @BidishaDas Did the second answer work for you?
  – Guille Sanchez, Feb 27 at 12:09

• Busy with something else right now. Will let you know as soon as I try it out, @Guille Sanchez.
  – Bidisha Das, Mar 1 at 11:29
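One way the zero-count requirement from these comments could be met while keeping the key-encoding idea is sketched below, assuming non-empty columns of non-negative integers. `np.unique` only reports keys that occur, so the observed counts are placed into a zero-filled array that covers every possible key. The function name `counts_with_zeros` is hypothetical.

```python
import numpy as np

def counts_with_zeros(arr1, arr2):
    """Pair counts over the full unique1 x unique2 grid, 0 where a pair is absent."""
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    unique_arr1 = np.unique(arr1)
    unique_arr2 = np.unique(arr2)

    # Assumption: non-negative integers, so max(arr2) + 1 is a collision-free multiplier
    multiplier = int(arr2.max()) + 1

    observed_keys, observed_counts = np.unique(arr1 * multiplier + arr2,
                                               return_counts=True)

    # Every possible key, in the same order as repeat/tile in the question
    all_keys = (unique_arr1[:, None] * multiplier + unique_arr2[None, :]).ravel()

    counts = np.zeros(all_keys.size, dtype=int)
    # all_keys is sorted, so searchsorted finds the slot of each observed key
    counts[np.searchsorted(all_keys, observed_keys)] = observed_counts
    return counts

a = [100, 200, 100, 100, 300, 200]
b = [1, 1, 2, 1, 2, 1]
full_counts = counts_with_zeros(a, b)
print(full_counts)  # [2 1 2 0 0 1]
```

Under these assumptions the sorted key order coincides with the repeat/tile order used in the question, so the sorting done by `np.unique` does not by itself reorder the counts; the missing zero entries are what shifts them.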











myfunc3 answer: answered Feb 15 at 8:05, edited Mar 7 at 15:25, by Guille Sanchez












          • Did this code work for you? @bidisha

            – Guille Sanchez
            Feb 27 at 12:06












          • I am trying to run your code in the following way: a = [100,200,100,100,300,200] b = [1,1,2,1,2,1] x = myfunc3(a,b) print(x) outputs [0. 0. 0. 0. 0. 0.] whereas the expected output is [2 1 2 0 0 1]

            – Bidisha Das
            Mar 5 at 6:29











          • Fixed the error in the code: complete_arr = np.multiply(arr1, order_mag) + arr2 instead of complete_arr = arr1*order_mag + arr2

            – Guille Sanchez
            Mar 7 at 15:26


















          • Did this code work for you? @bidisha

            – Guille Sanchez
            Feb 27 at 12:06












          • I am trying to run your code in the following way: a = [100,200,100,100,300,200] b = [1,1,2,1,2,1] x = myfunc3(a,b) print(x) outputs [0. 0. 0. 0. 0. 0.] whereas the expected output is [2 1 2 0 0 1]

            – Bidisha Das
            Mar 5 at 6:29











          • Fixed the error in the code: complete_arr = np.multiply(arr1, order_mag) + arr2 instead of complete_arr = arr1*order_mag + arr2

            – Guille Sanchez
            Mar 7 at 15:26

















          Did this code work for you? @bidisha

          – Guille Sanchez
          Feb 27 at 12:06






          Did this code work for you? @bidisha

          – Guille Sanchez
          Feb 27 at 12:06














          I am trying to run your code in the following way: a = [100,200,100,100,300,200] b = [1,1,2,1,2,1] x = myfunc3(a,b) print(x) outputs [0. 0. 0. 0. 0. 0.] whereas the expected output is [2 1 2 0 0 1]

          – Bidisha Das
          Mar 5 at 6:29





          I am trying to run your code in the following way: a = [100,200,100,100,300,200] b = [1,1,2,1,2,1] x = myfunc3(a,b) print(x) outputs [0. 0. 0. 0. 0. 0.] whereas the expected output is [2 1 2 0 0 1]

          – Bidisha Das
          Mar 5 at 6:29













          Fixed the error in the code: complete_arr = np.multiply(arr1, order_mag) + arr2 instead of complete_arr = arr1*order_mag + arr2

          – Guille Sanchez
          Mar 7 at 15:26






          Fixed the error in the code: complete_arr = np.multiply(arr1, order_mag) + arr2 instead of complete_arr = arr1*order_mag + arr2

          – Guille Sanchez
          Mar 7 at 15:26














          0














          Knowing the maximum possible value of your columns you can use:



          def myfunc2(arr1,arr2):
          # The *100 depends on your maximum possible value
          complete_arr = myarr[:,1]*100 + myarr[:,2]
          unique_elements, counts_elements = np.unique(complete_arr, return_counts=True)

          return counts_elements


          Results with 8·10^5 and 8·10^6 rows:



          N: 800000, myfunc2 time: 78.287 ms, myfunc time: 6556.748 ms
          Equal?: True
          N: 8000000, myfunc2 time: 736.020 ms, myfunc time: 100544.354 ms
          Equal?: True


          Testing code:



          import time
          import numpy as np

          # myfunc (the slower function being compared against) and myfunc2
          # must already be defined.
          times_f1 = []
          times_f2 = []
          ns = 8*10**np.linspace(3, 6, 10)  # row counts from 8·10^3 to 8·10^6

          for i in ns:
              np.random.seed(1)
              myarr = np.random.randint(20, size=(int(i), 4))

              # Time the encoding-based version
              start1 = time.time()
              a = myfunc2(myarr[:,1], myarr[:,2])
              end1 = time.time()
              times_f2.append(end1-start1)

              # Time the reference version
              start2 = time.time()
              b = myfunc(myarr[:,1], myarr[:,2])
              end2 = time.time()
              times_f1.append(end2-start2)

              print("N: {:d}, myfunc2 time: {:.3f} ms, myfunc time: {:.3f} ms".format(int(i), (end1-start1)*1000.0, (end2-start2)*1000.0))
              print("Equal?: " + str(np.array_equal(a, b)))
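
A single time.time() measurement per size can be noisy. A sketch of a slightly more robust measurement, assuming a hypothetical helper best_time that keeps the fastest of several runs timed with time.perf_counter (encode_and_count is a stand-in with the same idea as myfunc2):

```python
import time
import numpy as np

def best_time(func, *args, repeats=5):
    """Return the fastest wall-clock time (seconds) of `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def encode_and_count(arr1, arr2):
    # Same idea as myfunc2: encode each pair as one integer, then count
    return np.unique(arr1 * 100 + arr2, return_counts=True)[1]

rng = np.random.default_rng(1)
myarr = rng.integers(20, size=(10_000, 4))   # values in [0, 20)

elapsed = best_time(encode_and_count, myarr[:, 1], myarr[:, 2])
print("best of 5: {:.3f} ms".format(elapsed * 1000.0))
```

Taking the minimum rather than the mean tends to filter out interference from other processes, though the absolute numbers still depend on the machine.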


          The time complexity seems to be O(n) for both cases:



          [Figure: Compute times]































          • When finding the unique values, complete_arr gets sorted, hence the counts get messed up.

            – Bidisha Das
            Feb 15 at 5:35











          • Also, if a particular combination doesn't occur, I want count[at that position] to be 0; your code doesn't satisfy that.

            – Bidisha Das
            Feb 15 at 5:36











          • @BidishaDas Did the second answer work for you?

            – Guille Sanchez
            Feb 27 at 12:09











          • Busy with something else right now. Will let you know as soon as I try it out, @Guille Sanchez.

            – Bidisha Das
            Mar 1 at 11:29
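
The two objections in the comments above (sorted output and missing zero counts) can both be handled with standard NumPy calls. The sketch below is one possible reading of those requirements, not the original poster's code; per_row_counts and count_table are hypothetical names, and it assumes small non-negative integer values.

```python
import numpy as np

def per_row_counts(arr1, arr2, base):
    """For every row, how often its (arr1, arr2) pair occurs overall.

    `base` must be larger than the maximum value in arr2.
    """
    keys = np.asarray(arr1) * base + np.asarray(arr2)
    _, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
    return counts[inverse]            # same order as the input rows

def count_table(arr1, arr2, n1, n2):
    """Dense n1 x n2 table; combinations that never occur stay at 0.

    Requires 0 <= arr1 < n1 and 0 <= arr2 < n2.
    """
    keys = np.asarray(arr1) * n2 + np.asarray(arr2)
    return np.bincount(keys, minlength=n1 * n2).reshape(n1, n2)

a = [1, 2, 1, 1, 3, 2]
b = [1, 1, 2, 1, 2, 1]
print(per_row_counts(a, b, base=10))   # [2 2 1 2 1 2]
print(count_table(a, b, n1=4, n2=3))
```

Here return_inverse maps the sorted counts back to the original row order, and np.bincount with minlength reserves a slot for every possible combination so that absent ones are reported as 0.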























          edited Feb 13 at 12:43

























          answered Feb 13 at 7:56









          Guille Sanchez




























