Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes 3428 putmask optimization #3749

Merged

Conversation

drculhane
Copy link
Contributor

@drculhane drculhane commented Sep 5, 2024

Adds the distributed/aggregated code for putmask

closes #3428

Copy link
Contributor

@ajpotts ajpotts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just some minor comments.

arkouda/numeric.py Outdated Show resolved Hide resolved
arkouda/numeric.py Outdated Show resolved Hide resolved
arkouda/numeric.py Outdated Show resolved Hide resolved
tests/numeric_test.py Outdated Show resolved Hide resolved
arkouda/numeric.py Outdated Show resolved Hide resolved
src/EfuncMsg.chpl Show resolved Hide resolved
src/EfuncMsg.chpl Outdated Show resolved Hide resolved
@stress-tess
Copy link
Member

@drculhane I was in the process of reviewing this and was seeing a lot of the same things as @ajpotts before I realized that the changes made in response to her review comments hadn't been pushed yet. so I'm just dropping this note in case that's not on purpose

Copy link
Member

@stress-tess stress-tess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this hasn't been updated yet but I wanted to go ahead and post this review cause I found a small perf bug

src/EfuncMsg.chpl Show resolved Hide resolved
src/EfuncMsg.chpl Outdated Show resolved Hide resolved
@stress-tess
Copy link
Member

Dropping here my findings for the performance I'm seeing on my laptop for putmask before and after this PR. This isn't as accurate as the communication numbers since this is only simulated multi-locale, but I don't know of a great way to do comm diagnostics on the original since it was implemented on the python side (i.e. it calls out to lots of different chpl functions)

I copied @drculhane's tests for the different cases into a python script with problem size 10**6

python code for `putmask_timing.py`
import arkouda as ak
ak.connect()
import numpy as np
import time
prob_size = 10**6

# stealing from andy's tests for the different edge cases

# values same size as data
print("Case 1: Values same size")
nda = np.random.randint(0, 10, prob_size)
pda = ak.array(nda)
nda2 = (nda**2)
pda2 = ak.array(nda2)
hold_that_thought = nda.copy()
np.putmask(nda, nda > 5, nda2)

t1 = time.time()
ak.putmask(pda, pda > 5, pda2)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")

#  values potentially much shorter than data
print("Case 2: Values much shorter")
nda = hold_that_thought.copy()
pda = ak.array(nda)
npvalues = np.arange(3)
akvalues = ak.array(npvalues)
np.putmask(nda, nda > 5, npvalues)

t1 = time.time()
ak.putmask(pda, pda > 5, akvalues)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")

# values shorter than data, but likely not to fit on one locale in a multi-locale test
print("Case 3: Values shorter but not small enough to fit on a single locale")
nda = hold_that_thought.copy()
pda = ak.array(nda)
npvalues = np.arange(prob_size // 2 + 1)
akvalues = ak.array(npvalues)
np.putmask(nda, nda > 5, npvalues)

t1 = time.time()
ak.putmask(pda, pda > 5, akvalues)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")

# values longer than data
print("Case 4: Values longer")
nda = hold_that_thought.copy()
pda = ak.array(nda)
npvalues = np.arange(prob_size + 1000)
akvalues = ak.array(npvalues)
np.putmask(nda, nda > 5, npvalues)

t1 = time.time()
ak.putmask(pda, pda > 5, akvalues)
t2 = time.time()
print(f"took {t2-t1} seconds")
print(f"matches numpy: {np.allclose(nda, pda.to_ndarray())}\n")

Timings on master:

Case 1: Values same size
took 0.09863877296447754 seconds
matches numpy: True

Case 2: Values much shorter
took 188.12161302566528 seconds
matches numpy: True

Case 3: Values shorter but not small enough to fit on a single locale
took 0.24857282638549805 seconds
matches numpy: True

Case 4: Values longer
took 0.13264894485473633 seconds
matches numpy: True

Timings using this PR:

Case 1: Values same size
took 0.03622913360595703 seconds
matches numpy: True

Case 2: Values much shorter
took 0.060797929763793945 seconds
matches numpy: True

Case 3: Values shorter but not small enough to fit on a single locale
took 0.19672393798828125 seconds
matches numpy: True

Case 4: Values longer
took 0.058113813400268555 seconds
matches numpy: True

so I'm seeing better performance across the board, but the significant gains come from case 2, where we localize the values to each locale when it is small enough. I'm glad this is the case considering this was the biggest reason I suggested the optimization in the first place (my comment on original PR)

Great job @drculhane!!!! 🎉

@drculhane drculhane force-pushed the Closes-3428-putmask-optimization branch from 4688a30 to de49188 Compare September 18, 2024 17:59
Copy link
Contributor

@jaketrookman jaketrookman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Member

@stress-tess stress-tess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the logic looks good to me and a great speedup!! 🚀

@stress-tess stress-tess added this pull request to the merge queue Sep 18, 2024
Merged via the queue into Bears-R-Us:master with commit 07dc332 Sep 18, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize putmask
4 participants