Sharded minimal gradient #151
Conversation
…mplement loss functions
Minimal gradient on the sharded version: merge the minimal gradient branch into the sharded pipeline branch
| "# NBVAL_SKIP\n", | ||
| "import os\n", | ||
| "#os.environ['SPS_HOME'] = '/mnt/storage/annalena_data/sps_fsps'\n", | ||
| "#os.environ['SPS_HOME'] = '/home/annalena/sps_fsps'\n", |
As a rule, don't hardcode paths that point into your home directory or that contain your username. Use pathlib.Path.home(), for instance.
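For example, a minimal sketch of that pattern (the "sps_fsps" directory name below is just an illustration, not the actual data layout):

```python
import os
from pathlib import Path

# Build the path relative to whatever home directory the notebook runs in,
# instead of hardcoding a user-specific absolute path.
# "sps_fsps" is an assumed directory name, purely for illustration.
os.environ["SPS_HOME"] = str(Path.home() / "sps_fsps")
```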
| "\n", | ||
| "def loss_only_wrt_age_metallicity(age, metallicity, base_data, target):\n", | ||
| " \n", | ||
| " base_data.stars.age = age*20\n", |
This is a mutating operation and as such will not play well with jax.grad, i.e., the result is undefined. This might be the cause of the broken gradient on the GPU. However, I was not able to verify this, because the loss gives NaN after a few optimization rounds every time. Is there a quick fix for this?
As a general rule, I don't think mutating the parameters inside the loss function is the best way to do this. Longer term, we should maybe think about restructuring the pipeline so that parameters and input data are conceptually separated, which would make gradients much easier to write. It wasn't obvious up front that it would turn out this way.
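A minimal sketch of the separation meant here, assuming a hypothetical pure pipeline call `evaluate_spectrum` (not an existing function in the code base): parameters are passed in explicitly, nothing is mutated, and `jax.grad` differentiates with respect to the parameter dict only.

```python
import jax
import jax.numpy as jnp

def evaluate_spectrum(age, metallicity, base_data):
    # Hypothetical stand-in for the sharded pipeline call; the real pipeline
    # would compute a spectrum from base_data without modifying it in place.
    return base_data * age + metallicity

def loss_wrt_age_metallicity(params, base_data, target):
    # params holds the quantities we differentiate with respect to;
    # base_data and target are treated as constants.
    age = params["age"] * 20  # same rescaling as in the notebook
    metallicity = params["metallicity"]
    prediction = evaluate_spectrum(age, metallicity, base_data)
    return jnp.mean((prediction - target) ** 2)

grad_fn = jax.grad(loss_wrt_age_metallicity, argnums=0)
params = {"age": jnp.array(0.5), "metallicity": jnp.array(0.02)}
grads = grad_fn(params, jnp.ones(8), jnp.full(8, 12.0))
```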
This PR is also far too big. Can we maybe restructure it into multiple smaller ones? And do we need all the Jupyter notebooks?


A minimal version of the gradient is also working on the sharded pipeline.
Here we calculate the gradient with respect to age and metallicity for two identical particles.
The next step for the gradient is to include the distribution function and calculate a gradient with respect to the parameters of the distribution function.
The only remaining problem: if I run the notebook gradient_age_metallicity_adamoptimizer.ipynb on 2 CPUs (e.g. my local laptop), the results look good. If I run the exact same notebook on 2 GPUs, the results are very different and look wrong. I have not yet found out what is going wrong.
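For context, a hedged sketch of the kind of Adam loop such a notebook could run; the loss below is a purely illustrative placeholder rather than the actual pipeline call, and optax is assumed as the optimizer library:

```python
import jax
import jax.numpy as jnp
import optax

def loss(params, base_data, target):
    # Illustrative quadratic loss; the real notebook would run the sharded
    # pipeline on base_data and compare the predicted spectrum to the target.
    prediction = base_data * params["age"] * 20 + params["metallicity"]
    return jnp.mean((prediction - target) ** 2)

params = {"age": jnp.array(0.5), "metallicity": jnp.array(0.02)}
base_data = jnp.ones(8)
target = jnp.full(8, 12.0)

optimizer = optax.adam(learning_rate=1e-2)
opt_state = optimizer.init(params)

@jax.jit
def step(params, opt_state):
    loss_value, grads = jax.value_and_grad(loss)(params, base_data, target)
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss_value

for _ in range(100):
    params, opt_state, loss_value = step(params, opt_state)
```

Comparing the intermediate gradients of such a loop between the CPU and the GPU run might help localize where the two setups start to diverge.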