Find the number of times a combination occurs in a numpy 2D array
I have a 2D numpy array, and I want a function that operates on col1 and col2 of the array. If M is the number of unique values in col1 and N is the number of unique values in col2, the output 1D array will have size M * N. For example, suppose there are 3 unique values in col1 (A1, A2 and A3) and 2 unique values in col2 (X1 and X2). The possible combinations are then (A1, X1), (A1, X2), (A2, X1), (A2, X2), (A3, X1) and (A3, X2). I want to find out how many times each combination occurs together in the same row, i.e. how many rows contain the combination (A1, X1), and so on, and return the counts as a 1D array. This is my code:
import numpy as np

#@profile
def myfunc(arr1, arr2):
    unique_arr1 = np.unique(arr1)
    unique_arr2 = np.unique(arr2)
    pdt = len(unique_arr1)*len(unique_arr2)
    count = np.zeros(pdt).astype(int)
    ## build every possible combination of unique values in arr1_n and arr2_n
    if ((len(unique_arr2)>0) and (len(unique_arr1)>0)):
        arr1_n = unique_arr1.repeat(len(unique_arr2))
        arr2_n = np.tile(unique_arr2,len(unique_arr1))
    ## count the number of times each combination occurs
    for i in np.arange(0,pdt):
        pos1 = np.where(arr1==arr1_n[i])[0]
        pos2 = np.where(arr2==arr2_n[i])[0]
        count[i] = len(np.intersect1d(pos1,pos2))
    return count

np.random.seed(1)
myarr = np.random.randint(20,size=(80000,4))
a = myfunc(myarr[:,1],myarr[:,2])
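A plain-Python sketch can pin down the specification above (this is not the asker's code; count_pairs_reference is a hypothetical name). It enumerates the combinations in the same order that repeat/tile produce in myfunc (col1 values outer, col2 values inner) and reports 0 for combinations that never occur. It is only meant as a slow reference for checking faster versions; the sample input is the one Bidisha Das uses in a comment on the first answer.

```python
from collections import Counter
from itertools import product


def count_pairs_reference(col1, col2):
    """Count rows for every (col1, col2) combination, col1-major order."""
    # How many rows hold each (value1, value2) pair that actually occurs.
    pair_counts = Counter(zip(col1, col2))
    # Walk all M * N combinations of the sorted unique values;
    # Counter returns 0 for pairs that never occur.
    return [pair_counts[(u1, u2)]
            for u1, u2 in product(sorted(set(col1)), sorted(set(col2)))]


a = [100, 200, 100, 100, 300, 200]
b = [1, 1, 2, 1, 2, 1]
print(count_pairs_reference(a, b))  # [2, 1, 2, 0, 0, 1]
```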
Here are the profiling results from running line_profiler on myfunc:
Timer unit: 1e-06 s

Total time: 18.1849 s
File: testcode3.py
Function: myfunc at line 2

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           @profile
     3                                           def myfunc(arr1,arr2):
     4         1      74549.0  74549.0      0.4      unique_arr1 = np.unique(arr1)
     5         1      72970.0  72970.0      0.4      unique_arr2 = np.unique(arr2)
     6         1          9.0      9.0      0.0      pdt = len(unique_arr1)*len(unique_arr2)
     7         1         48.0     48.0      0.0      count = np.zeros(pdt).astype(int)
     8
     9         1          5.0      5.0      0.0      if ((len(unique_arr2)>0) and (len(unique_arr1)>0)):
    10         1         16.0     16.0      0.0          arr1_n = unique_arr1.repeat(len(unique_arr2))
    11         1        105.0    105.0      0.0          arr2_n = np.tile(unique_arr2,len(unique_arr1))
    12       401       5200.0     13.0      0.0      for i in np.arange(0,pdt):
    13       400    6870931.0  17177.3     37.8          pos1 = np.where(arr1==arr1_n[i])[0]
    14       400    6844999.0  17112.5     37.6          pos2 = np.where(arr2==arr2_n[i])[0]
    15       400    4316035.0  10790.1     23.7          count[i] = len(np.intersect1d(pos1,pos2))
    16         1          4.0      4.0      0.0      return count
As you can see, np.where and np.intersect1d account for about 99% of the run time. Can anyone suggest faster methods to do this?
In the future I will have to work with real data much larger than this, so I need to optimize this code.
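Each pass of the loop in myfunc scans all n rows twice with np.where and then intersects the two index arrays, so the work grows roughly with M·N·n. One common vectorized alternative (a sketch, not taken from either answer; count_pairs_bincount is a hypothetical name) assumes both columns are 1D arrays of sortable values and does a single pass instead: np.unique(..., return_inverse=True) maps each value to its index among the sorted unique values, and np.bincount counts the combined index. The minlength argument makes combinations that never occur show up with a count of 0.

```python
import numpy as np


def count_pairs_bincount(col1, col2):
    """Count each (col1, col2) combination in a single vectorized pass.

    Returns a 1D array of length M * N in col1-major order (the order that
    repeat/tile produce in myfunc), with 0 for pairs that never occur.
    """
    # u1/u2: sorted unique values; idx1/idx2: position of each row's value
    # inside u1/u2 (0..M-1 and 0..N-1).
    u1, idx1 = np.unique(col1, return_inverse=True)
    u2, idx2 = np.unique(col2, return_inverse=True)
    # Merge the two positions into one flat index, col1-major.
    flat = idx1 * len(u2) + idx2
    # Histogram of flat indices; minlength keeps zeros for missing pairs.
    return np.bincount(flat, minlength=len(u1) * len(u2))


np.random.seed(1)
myarr = np.random.randint(20, size=(80000, 4))
counts = count_pairs_bincount(myarr[:, 1], myarr[:, 2])
print(counts.shape, counts.sum())  # (400,) 80000
```

The cost is dominated by the two np.unique calls (each a sort, so O(n log n)); the number of combinations only affects the length of the output.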
arrays numpy python-3.6 line-profiler
asked Feb 13 at 6:16 by Bidisha Das
2 Answers
To meet Bidisha Das's requirements (including a count of 0 for combinations that never occur):
Code:
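import math
import numpy as np

def myfunc3(arr1, arr2):
    # Encode each (arr1, arr2) pair as one integer: shift arr1 left by one
    # more decimal digit than the largest value has.
    order_mag = 10**(int(math.log10(np.amax([arr1, arr2]))) + 1)
    # np.multiply (rather than arr1*order_mag) also works when arr1 is a
    # plain Python list, where * would repeat the list instead.
    complete_arr = np.multiply(arr1, order_mag) + arr2
    unique_elements, counts_elements = np.unique(complete_arr, return_counts=True)
    unique_arr1 = np.unique(arr1)
    unique_arr2 = np.unique(arr2)
    # M x N grid of counts; combinations that never occur stay at 0.
    r = np.zeros((len(unique_arr1), len(unique_arr2)), dtype=int)
    for i in range(len(unique_elements)):
        # Decode the pair and find its row/column in the grid.
        i1 = np.where(unique_arr1 == unique_elements[i] // order_mag)[0]
        i2 = np.where(unique_arr2 == unique_elements[i] % order_mag)[0]
        r[i1, i2] += counts_elements[i]
    r = r.flatten()
    return r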
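The Python loop in myfunc3 runs once per distinct pair. A possible variant (a sketch, not the answer author's code; myfunc3_vectorized is a hypothetical name) decodes all pairs at once with // and %, and uses np.searchsorted to find each value's position in the sorted unique arrays. It keeps myfunc3's assumption of non-negative integers whose maximum is non-zero (log10 of the maximum must be defined); very large values could also overflow the int64 encoding.

```python
import math

import numpy as np


def myfunc3_vectorized(arr1, arr2):
    """myfunc3 with its Python decode loop replaced by np.searchsorted.

    Assumes non-negative integers with a non-zero maximum, as myfunc3 does.
    """
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    # Same encoding idea as myfunc3: order_mag exceeds every value in arr2.
    order_mag = 10 ** (int(math.log10(max(arr1.max(), arr2.max()))) + 1)
    codes, counts = np.unique(arr1 * order_mag + arr2, return_counts=True)
    unique_arr1 = np.unique(arr1)
    unique_arr2 = np.unique(arr2)
    # Decode all codes at once and look up their grid row/column.
    i1 = np.searchsorted(unique_arr1, codes // order_mag)
    i2 = np.searchsorted(unique_arr2, codes % order_mag)
    r = np.zeros((len(unique_arr1), len(unique_arr2)), dtype=int)
    # Each code is a distinct (row, column) pair, so assignment suffices.
    r[i1, i2] = counts
    return r.ravel()


a = [100, 200, 100, 100, 300, 200]
b = [1, 1, 2, 1, 2, 1]
print(myfunc3_vectorized(a, b))  # [2 1 2 0 0 1]
```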
Test Code:
import time

times_f3 = []
times_f1 = []
ns = 8*10**np.linspace(3, 6, 10)
for i in ns:
    np.random.seed(1)
    myarr = np.random.randint(20, size=(int(i), 4))
    start1 = time.time()
    a = myfunc3(myarr[:,1], myarr[:,2])
    end1 = time.time()
    times_f3.append(end1-start1)
    start2 = time.time()
    b = myfunc(myarr[:,1], myarr[:,2])
    end2 = time.time()
    times_f1.append(end2-start2)
    print("N: {:d}, myfunc3 time: {:.3f} ms, myfunc time: {:.3f} ms".format(int(i), (end1-start1)*1000.0, (end2-start2)*1000.0))
    print("Equal?: " + str(np.array_equal(a, b)))
Results: [timing plot not preserved]
answered Feb 15 at 8:05 by Guille Sanchez (edited Mar 7 at 15:25)
Did this code work for you? @bidisha
– Guille Sanchez
Feb 27 at 12:06
I am trying to run your code in the following way: a = [100,200,100,100,300,200]; b = [1,1,2,1,2,1]; x = myfunc3(a,b); print(x). It outputs [0. 0. 0. 0. 0. 0.], whereas the expected output is [2 1 2 0 0 1].
– Bidisha Das
Mar 5 at 6:29
Fixed the error in the code: complete_arr = np.multiply(arr1, order_mag) + arr2 instead of complete_arr = arr1*order_mag + arr2
– Guille Sanchez
Mar 7 at 15:26
If you know the maximum possible value in your columns, you can use:
def myfunc2(arr1, arr2):
    # The multiplier 100 must be larger than the maximum possible value in arr2
    complete_arr = arr1*100 + arr2
    unique_elements, counts_elements = np.unique(complete_arr, return_counts=True)
    return counts_elements
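The two points raised in the comments below can be checked on a small example (a sketch; K is a hypothetical multiplier chosen as b.max() + 1, and the inputs are assumed to be non-negative integers). Sorting the codes a*K + b orders the pairs by a and then by b, which is the same order myfunc uses, so sorting by itself should not scramble the counts. However, pairs that never occur do not appear at all, so the result is shorter than M * N whenever some combination is missing.

```python
import numpy as np

a = np.array([100, 200, 100, 100, 300, 200])
b = np.array([1, 1, 2, 1, 2, 1])

# Hypothetical multiplier: any K > b.max() keeps distinct (a, b) pairs
# distinct, assuming non-negative integers.
K = b.max() + 1
codes, counts = np.unique(a * K + b, return_counts=True)

# np.unique sorts the codes, and sorting a*K + b orders pairs by a, then b:
# the same col1-major order as repeat/tile in myfunc.
print(codes // K, codes % K)  # [100 100 200 300] [1 2 1 2]
# Only pairs that occur are counted: (200, 2) and (300, 1) are absent.
print(counts)                 # [2 1 2 1]
```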
Results with 8·10^5 and 8·10^6 rows:
N: 800000, myfunc2 time: 78.287 ms, myfunc time: 6556.748 ms
Equal?: True
N: 8000000, myfunc2 time: 736.020 ms, myfunc time: 100544.354 ms
Equal?: True
Testing code:
import time

times_f1 = []
times_f2 = []
ns = 8*10**np.linspace(3, 6, 10)
for i in ns:
    np.random.seed(1)
    myarr = np.random.randint(20, size=(int(i), 4))
    start1 = time.time()
    a = myfunc2(myarr[:,1], myarr[:,2])
    end1 = time.time()
    times_f2.append(end1-start1)
    start2 = time.time()
    b = myfunc(myarr[:,1], myarr[:,2])
    end2 = time.time()
    times_f1.append(end2-start2)
    print("N: {:d}, myfunc2 time: {:.3f} ms, myfunc time: {:.3f} ms".format(int(i), (end1-start1)*1000.0, (end2-start2)*1000.0))
    print("Equal?: " + str(np.array_equal(a, b)))
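The running time appears to grow linearly with the number of rows for both functions (here M·N is at most 20 × 20 = 400, since the values come from randint(20, ...)):
[timing plot not preserved]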
answered Feb 13 at 7:56 by Guille Sanchez (edited Feb 13 at 12:43)
When finding unique values, complete_arr gets sorted, so the counts get messed up.
– Bidisha Das
Feb 15 at 5:35
Also, if a particular combination doesn't occur, I want count[at that position] to be 0; your code doesn't satisfy that.
– Bidisha Das
Feb 15 at 5:36
@BidishaDas Did the second answer work for you?
– Guille Sanchez
Feb 27 at 12:09
Busy with something else right now. I'll let you know as soon as I try it out, @Guille Sanchez.
– Bidisha Das
Mar 1 at 11:29