
Conversation

@sophie-xhonneux
Contributor

Description

[DRAFT] PR introducing the student-teacher latent losses for SSL. This PR relies on both the abstract loss calculator (#1178) and the abstract target/aux class (#1179).

The idea is to get early feedback and surface issues by making the code more concrete.

Issue Number

Closes #1043

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

Implemented Identity class

TODO: implement EMATeacher
The big open question on the EMA teacher side is how to allow for flexible teacher and student architectures that can differ from each other.

We updated some APIs of the abstract base class to allow the ema_model forward pass. This is subject to change given the loss calculator, which is IMHO the second big question mark.
This is easier to read, and as batch handling gets more complicated in SSL it will be a useful abstraction.
It runs so far. Next steps:
 - Route all the config options
 - Start writing the loss functions to understand the state requirements
This involves creating stateful classes for each of the losses, and the EMATeacher being able to run additional neural network heads for these losses.
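For the simple case where the teacher mirrors the student's parameter structure, an EMA teacher update could look like the minimal sketch below (class name, default momentum, and API are assumptions rather than this PR's design; the open question above is exactly how to relax the requirement that the two architectures match):

import copy

import torch


class EMATeacher(torch.nn.Module):
    """Hypothetical sketch: holds an exponential moving average of the student's weights."""

    def __init__(self, student: torch.nn.Module, momentum: float = 0.996):
        super().__init__()
        self.momentum = momentum
        self.teacher = copy.deepcopy(student)
        for p in self.teacher.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, student: torch.nn.Module):
        # teacher <- m * teacher + (1 - m) * student, parameter by parameter
        for p_t, p_s in zip(self.teacher.parameters(), student.parameters()):
            p_t.mul_(self.momentum).add_(p_s.detach(), alpha=1.0 - self.momentum)

    @torch.no_grad()
    def forward(self, *args, **kwargs):
        return self.teacher(*args, **kwargs)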
@github-actions bot added the initiative (Large piece of work covering multiple sprints) and model (Related to model training or definition, not generic infra) labels on Nov 5, 2025
TODO: create the various teacher head modules and run them.
TODO: merge the abstract loss calculator and create the SSL one
Collaborator

@clessig left a comment


Didn't look through the actual computations line by line, since it seems this is copy-pasted from the reference code?

@@ -0,0 +1,304 @@
# (C) Copyright 2025 WeatherGenerator contributors.
Collaborator

This file should go to . They need to be torch.nn.Module subclasses because these are NNs, even if they are not necessarily trained themselves. I think ssl_target_processing.py (since you probably still don't like ssl_target_predictors.py).
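For illustration, the Identity case mentioned in the commits could be as small as the following torch.nn.Module sketch (class and file names are assumptions):

import torch


class IdentityTargetProcessing(torch.nn.Module):
    """Hypothetical sketch: passes teacher latents through unchanged, so the SSL
    loss is computed directly on the raw latent targets."""

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        return latents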

import torch.nn.functional as F


def lossfunc(t, s, temp):
Collaborator

The name is not very descriptive :) Maybe latent_logit_loss.py? JEPA uses MAE (and one could conceivably replace it with MSE), which are already implemented in loss.py. Ideally we could reuse what is there.
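For context, a latent logit loss with this kind of signature typically reduces to a cross-entropy between the teacher distribution and the temperature-scaled student log-probabilities, roughly as in the sketch below (the function name and the assumption that the teacher input is already softmaxed and detached are mine, not taken from the reference code):

import torch
import torch.nn.functional as F


def latent_logit_loss(teacher_probs: torch.Tensor, student_logits: torch.Tensor, temp: float) -> torch.Tensor:
    # Cross-entropy between the teacher distribution and the
    # temperature-scaled student log-probabilities, averaged over the batch.
    return -(teacher_probs.detach() * F.log_softmax(student_logits / temp, dim=-1)).sum(dim=-1).mean()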

Q *= B # the columns must sum to 1 so that Q is an assignment
return Q.t()

# def forward(self, student_patch_tokens, teacher_patch_tokens, student_masks_flat):
Collaborator

Can we remove the stale code? What does it implement?

Contributor Author

The stale code is there for reference because it needs to go to the loss calculator later.

I will do all the clean-up once we are much closer to actually merging :)
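For readers wondering what the commented-out forward does: in DINO/iBOT-style reference code, a forward with this signature is usually a masked patch-token cross-entropy, roughly as sketched here (illustrative only; the function name, temperature default, and exact normalisation are assumptions):

import torch
import torch.nn.functional as F


def masked_patch_loss(student_patch_tokens, teacher_patch_probs, student_masks_flat, student_temp=0.1):
    # Per-token cross-entropy between teacher probabilities and student log-probabilities.
    loss = torch.sum(teacher_patch_probs * F.log_softmax(student_patch_tokens / student_temp, dim=-1), dim=-1)
    # Keep only masked positions, average per sample (guarding against empty masks), then over the batch.
    loss = torch.sum(loss * student_masks_flat.float(), dim=-1) / student_masks_flat.sum(dim=-1).clamp(min=1.0)
    return -loss.mean()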


    def __init__(
        self,
        patch_out_dim,
Collaborator

Would it be better to take a dict as an argument, given that we may want to implement *TargetProcessing variants that require different args?
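A possible shape for that, sketched under the assumption of a single config dict (class name and keys are hypothetical):

import torch


class DINOTargetProcessing(torch.nn.Module):
    """Hypothetical sketch of the dict-based constructor pattern: every *TargetProcessing
    variant accepts one config dict, so they share a constructor signature even when
    their hyperparameters differ."""

    def __init__(self, cfg: dict):
        super().__init__()
        patch_out_dim = cfg["patch_out_dim"]
        self.center_momentum = cfg.get("center_momentum", 0.9)
        self.register_buffer("center", torch.zeros(1, patch_out_dim))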

After much consideration I decided to add the latent prediction heads to the Model, because they also need to benefit from the exponential moving average of the weights, and this gets unnecessarily cumbersome if they live outside the Model.

TODO: make JEPA differ between student and teacher
TODO: use this new structure in EMATeacher
To prevent crazy nesting of model output values, we created a ModelOutput dataclass (akin to how it is done in Hugging Face), and we run all the latent_prediction heads there.
This will need adapting based on the abstract loss calculator.
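For illustration, a flat output container in the Hugging Face style could look like this sketch (field names are assumptions, not the PR's final API):

from dataclasses import dataclass, field

import torch


@dataclass
class ModelOutput:
    # Hypothetical fields: the regular forecast output plus the per-head SSL latents.
    prediction: torch.Tensor | None = None
    student_latents: dict[str, torch.Tensor] = field(default_factory=dict)
    teacher_latents: dict[str, torch.Tensor] = field(default_factory=dict)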

Currently awaiting the streams data branch to check the data piping and configuration.