-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes 3428 putmask optimization #3749
Closes 3428 putmask optimization #3749
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Just some minor comments.
@drculhane I was in the process of reviewing this and was seeing a lot of the same things as @ajpotts before I realized that the changes made in response to her review comments hadn't been pushed yet. so I'm just dropping this note in case that's not on purpose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this hasn't been updated yet but I wanted to go ahead and post this review cause I found a small perf bug
Dropping here my findings for the performance I'm seeing on my laptop for I copied @drculhane's tests for the different cases into a python script with problem size python code for `putmask_timing.py`import arkouda as ak
ak.connect()
import numpy as np
import time
prob_size = 10**6
# stealing from andy's tests for the different edge cases
# values same size as data
print("Case 1: Values same size")
nda = np.random.randint(0, 10, prob_size)
pda = ak.array(nda)
nda2 = (nda**2)
pda2 = ak.array(nda2)
hold_that_thought = nda.copy()
np.putmask(nda, nda > 5, nda2)
t1 = time.time()
ak.putmask(pda, pda > 5, pda2)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")
# values potentially much shorter than data
print("Case 2: Values much shorter")
nda = hold_that_thought.copy()
pda = ak.array(nda)
npvalues = np.arange(3)
akvalues = ak.array(npvalues)
np.putmask(nda, nda > 5, npvalues)
t1 = time.time()
ak.putmask(pda, pda > 5, akvalues)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")
# values shorter than data, but likely not to fit on one locale in a multi-locale test
print("Case 3: Values shorter but not small enough to fit on a single locale")
nda = hold_that_thought.copy()
pda = ak.array(nda)
npvalues = np.arange(prob_size // 2 + 1)
akvalues = ak.array(npvalues)
np.putmask(nda, nda > 5, npvalues)
t1 = time.time()
ak.putmask(pda, pda > 5, akvalues)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")
# values longer than data
print("Case 4: Values longer")
nda = hold_that_thought.copy()
pda = ak.array(nda)
npvalues = np.arange(prob_size + 1000)
akvalues = ak.array(npvalues)
np.putmask(nda, nda > 5, npvalues)
t1 = time.time()
ak.putmask(pda, pda > 5, akvalues)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n") Timings on master:
Timings using this PR:
so I'm seeing better performance across the board, but the significant gains come from case 2, where we localize the values to each locale when it is small enough. I'm glad this is the case considering this was the biggest reason I suggested the optimization in the first place (my comment on original PR) Great job @drculhane!!!! 🎉 |
4688a30
to
de49188
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the logic looks good to me and a great speedup!! 🚀
Adds the distributed/aggregated code for putmask
closes #3428