wsmoses
Member

@wsmoses wsmoses commented Sep 3, 2025

No description provided.

@wsmoses
Member Author

wsmoses commented Sep 4, 2025

ah dangit accidentally added copilot as a reviewer

@giordano giordano review requested due to automatic review settings September 4, 2025 14:24
@giordano
Member

giordano commented Sep 4, 2025

The large grid full GB-25 run had the message

W0000 00:00:1756998431.440298    5182 hlo_rematerialization.cc:3183] Can't reduce memory use below 10.64GiB (11428802214 bytes) by rematerialization; only reduced to 71.27GiB (76523901244 bytes), down from 73.51GiB (78931543988 bytes) originally

while in https://github.com/EnzymeAD/Enzyme-JAX/actions/runs/17466827030/job/49604725985?pr=1363#step:19:749 we had

2025-08-02 23:46:25.965629: W external/xla/xla/hlo/transforms/simplifiers/hlo_rematerialization.cc:3423] Can't reduce memory use below 10.64GiB (11429599125 bytes) by rematerialization; only reduced to 67.69GiB (72679996812 bytes), down from 71.05GiB (76287081352 bytes) originally

which is an increase in the total memory after rematerialization (71.27GiB vs. 67.69GiB), but I'm not sure this is a 1:1 comparison, since some things changed in the meantime.

@wsmoses
Member Author

wsmoses commented Sep 4, 2025

Hm can we set up a one to one comparison just to confirm?

@wsmoses
Member Author

wsmoses commented Sep 4, 2025

Also just double checking (I did locally earlier too), we remain good on all-X?

@giordano
Member

giordano commented Sep 4, 2025

> Hm can we set up a one to one comparison just to confirm?

#1365

> Also just double checking (I did locally earlier too), we remain good on all-X?

I think so, I didn't see them in this dump: https://github.com/EnzymeAD/Enzyme-JAX/actions/runs/17466827030/artifacts/3928490250

@giordano
Member

giordano commented Sep 4, 2025

This is what we currently get on main when trying to use a larger grid: https://github.com/EnzymeAD/Enzyme-JAX/actions/runs/17470175868/job/49619910051?pr=1365#step:19:749

W0000 00:00:1757008837.018719   44140 hlo_rematerialization.cc:3183] Can't reduce memory use below 10.64GiB (11428802214 bytes) by rematerialization; only reduced to 57.29GiB (61511625930 bytes), down from 74.39GiB (79874237692 bytes) originally

So this PR seems to cause more memory usage overall?
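
For anyone wanting to redo this comparison on other runs, here is a rough sketch of how the two warnings quoted in this thread could be compared mechanically. The regex is only inferred from the log lines pasted here, and the helper name `parse_remat_warning` is made up for illustration; it is not part of XLA or Enzyme-JAX.

```python
import re

# Pattern inferred from the hlo_rematerialization warnings quoted in this
# thread (an assumption, not an official log format). It captures the three
# byte counts: the target limit, the size after rematerialization, and the
# original size.
REMAT_RE = re.compile(
    r"Can't reduce memory use below .*?\((\d+) bytes\) by rematerialization; "
    r"only reduced to .*?\((\d+) bytes\), down from .*?\((\d+) bytes\) originally"
)

GIB = 2**30


def parse_remat_warning(line):
    """Hypothetical helper: return the byte counts from one warning line, or None."""
    m = REMAT_RE.search(line)
    if m is None:
        return None
    limit, reduced, original = (int(g) for g in m.groups())
    return {"limit": limit, "reduced": reduced, "original": original}


# The two large-grid warnings quoted above: this PR, then main.
pr_line = (
    "W0000 00:00:1756998431.440298    5182 hlo_rematerialization.cc:3183] "
    "Can't reduce memory use below 10.64GiB (11428802214 bytes) by rematerialization; "
    "only reduced to 71.27GiB (76523901244 bytes), "
    "down from 73.51GiB (78931543988 bytes) originally"
)
main_line = (
    "W0000 00:00:1757008837.018719   44140 hlo_rematerialization.cc:3183] "
    "Can't reduce memory use below 10.64GiB (11428802214 bytes) by rematerialization; "
    "only reduced to 57.29GiB (61511625930 bytes), "
    "down from 74.39GiB (79874237692 bytes) originally"
)

pr = parse_remat_warning(pr_line)
main = parse_remat_warning(main_line)

# Difference in memory after rematerialization, PR minus main.
delta = pr["reduced"] - main["reduced"]
print(f"PR:   {pr['reduced'] / GIB:.2f} GiB after rematerialization")
print(f"main: {main['reduced'] / GIB:.2f} GiB after rematerialization")
print(f"PR uses {delta / GIB:.2f} GiB more than main")
```

Both runs report the same 11428802214-byte limit, so the post-rematerialization sizes are the comparable quantity here; whether anything else differs between the two CI runs is not something this sketch can tell.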
