
Using the "RMSProp" optimizer when training the YoloV11 model #1013

alon-12345 opened this issue Feb 7, 2025 · 15 comments
Labels
detect Object Detection issues, PR's question Further information is requested

Comments

@alon-12345

Search before asking

Question

I would like to use the “RMSprop” optimizer when training the YOLOv11 model, but I do not understand how to set the parameters “rho” (the decay rate for RMSProp) and “epsilon” (a constant to prevent division by zero) myself.
When using the Ultralytics API for training and specifying parameters as:

model.train(
    data="data.yaml",
    device=0,
    epochs=400,
    optimizer="RMSprop",
    batch=16,
    lr0=0.001,
    lrf=0.01,
    weight_decay=0.0001,
    momentum=0.9,
    rho=0.9,
    epsilon=1e-8,
    workers=8,
    single_cls=True,
    degrees=45,
    perspective=0.001,
    erasing=0.0,
    mosaic=0,
    multi_scale=True,
    patience=30,
)

I get an error message saying that “rho” and “epsilon” are not valid parameters for training the YOLOv11 model.

Additional

No response

@alon-12345 alon-12345 added the question Further information is requested label Feb 7, 2025
@UltralyticsAssistant UltralyticsAssistant added the detect Object Detection issues, PR's label Feb 7, 2025
@UltralyticsAssistant
Member

👋 Hello @alon-12345, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

It looks like you're encountering an issue while configuring the RMSProp optimizer parameters rho and epsilon for training the YoloV11 model 🔧. If this is a 🐛 Bug Report, kindly provide a Minimum Reproducible Example (MRE), including the line or code snippet where the error occurs, the exact error message, and any relevant configuration (model version, libraries, and environment details). You can refer to our Minimum Reproducible Example guide for tips on how to provide clear and helpful information.

If this is a ❓ Question, additional details like library versions, platforms used, and full code snippets where applicable will help us provide a better response. Additionally, please note that some parameters may not be supported in the current implementation of the YoloV11 model and its training API.

This is an automated response 🤖, but don't worry—an Ultralytics engineer will review your issue and provide further assistance as soon as possible. Thank you for your patience 🚀!

@pderrenger
Member

pderrenger commented Feb 7, 2025

@alon-12345 thanks for your question about customizing RMSProp parameters in YOLO11 training! 🚀 While the Ultralytics YOLO training interface provides many optimizer options, some advanced parameters like rho and epsilon for RMSProp aren't directly exposed through the high-level API at this time.

For those wanting full control over optimizer parameters, you can implement a custom trainer by subclassing the default trainer. Here's a quick example:

from ultralytics import YOLO
from ultralytics.engine.trainer import BaseTrainer
import torch.optim as optim

class CustomTrainer(BaseTrainer):
    def build_optimizer(self, model, name="RMSprop", lr=0.001, momentum=0.9, decay=1e-5, **kwargs):
        g = self.model.split_params()
        return optim.RMSprop(
            g[2],
            lr=lr,
            momentum=momentum,
            weight_decay=decay,
            eps=1e-8,  # your epsilon value
            alpha=0.9,  # the rho parameter is called alpha in PyTorch
        )

# Then use your custom trainer:
model = YOLO("yolo11n.yaml")
model.train(trainer=CustomTrainer, data="coco8.yaml", epochs=100, optimizer="RMSprop")

The current implementation uses PyTorch's RMSprop defaults (alpha=0.99, eps=1e-8), but you can modify these in the custom optimizer. For most use cases, we find the default parameters work well, but advanced users might want to tune these for specific datasets.
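To make the naming mapping concrete, here is a minimal sketch (using a toy `nn.Linear` model as a stand-in for the YOLO network, which is an assumption for illustration) showing that PyTorch exposes RMSProp's "rho" as `alpha` and "epsilon" as `eps`, and what the defaults are:

```python
import torch.nn as nn
import torch.optim as optim

# Toy model stands in for the YOLO network; any nn.Module behaves the same here
model = nn.Linear(4, 2)

# PyTorch names RMSProp's "rho" parameter "alpha" and "epsilon" "eps"
opt = optim.RMSprop(model.parameters(), lr=0.001)

group = opt.param_groups[0]
print(group["alpha"], group["eps"])  # defaults: 0.99 1e-08
```

You can pass `alpha=` and `eps=` explicitly to `optim.RMSprop` to override these defaults, which is exactly what the custom trainer above does.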

If you need these parameters added to the main API, feel free to open a feature request on our GitHub repo! We always appreciate community input to make YOLO more flexible. 💡

For more details on optimizer configurations, check out our Training Configuration docs.

@YustasDev

Dear Paula,

I used the code you specified to train the model on Google Colab, but I get the error "NotImplementedError: This task trainer doesn't support loading cfg files".

(screenshot attached)

What could be the problem?

@alon-12345
Author

g = self.model.split_params()

In addition, the model has no attribute/method 'split_params'

@Y-T-G

Y-T-G commented Feb 11, 2025

Ultralytics doesn't support setting those parameters. You would need to modify the source code.

@Laughing-q
Member

@alon-12345 @YustasDev Hey guys! You could modify the trainer a bit to use the RMSProp optimizer:

from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
import torch.optim as optim


class CustomTrainer(DetectionTrainer):
    def build_optimizer(self, model, name="RMSprop", lr=0.001, momentum=0.9, decay=1e-5, **kwargs):
        return optim.RMSprop(
            model.parameters(),
            lr=lr,
            momentum=momentum,
            weight_decay=decay,
            eps=1e-8,  # your epsilon value
            alpha=0.9,
        )  # rho parameter is called alpha in PyTorch


# Then use your custom trainer:
model = YOLO("yolo11n.yaml")
model.train(trainer=CustomTrainer, data="coco8.yaml", epochs=100)

@alon-12345
Author

@Laughing-q,
thank you for taking part in the discussion!
With the approach you suggested, training actually starts, but if you look at the arguments it started with, you can see "optimizer=auto".
If you add the argument optimizer="RMSprop", like this:
model.train(trainer=CustomTrainer, data="data.yaml", epochs=300, optimizer="RMSprop")

then training also starts, and the arguments show "optimizer=RMSprop".

But if you look at the rest of the arguments, you will see that training started with the default settings, not with those defined in the CustomTrainer.
Is it somehow possible to pass hyperparameters for RMSProp after all?

(screenshot attached)

@Y-T-G

Y-T-G commented Feb 11, 2025

The arguments shown don't have an effect here; they just show the defaults. The optimizer is being overridden in the custom code.

@Laughing-q
Member

@alon-12345 Hi, the idea here is to use the new build_optimizer we defined in CustomTrainer and pass the trainer to our model.train() method. If you want to change some arguments yourself, you can modify the values directly in the build_optimizer method. For example, if we want lr=0.1 and momentum=0.9:

    def build_optimizer(self, model, name="RMSprop", lr=0.001, momentum=0.9, decay=1e-5, **kwargs):
        return optim.RMSprop(
            model.parameters(),
            lr=0.1,
            momentum=0.9,
            weight_decay=decay,
            eps=1e-8,  # your epsilon value
            alpha=0.9,
        )  

Then directly launch the training with CustomTrainer; there is no need to pass lr or momentum again:

model.train(trainer=CustomTrainer, data="coco8.yaml", epochs=100)

@alon-12345
Author

Colleagues, thank you so much for your advice!
You state that the arguments in the "args.yaml" file reflect only the default settings. As far as I understand (correct me if I'm wrong), "args.yaml" records only those arguments that we explicitly passed to the "train()" method, plus default values for the arguments we did not specify. And if we use a custom "trainer", its parameters won't end up in the "args.yaml" file, will they?
So even though the "trainer" uses, for example, "momentum" with the value 0.9, "args.yaml" will show the default value 0.937, since we did not pass it directly to the "train()" method, right?

@pderrenger
Member

Great observation! You're absolutely correct. The args.yaml file captures only the explicit arguments passed to model.train() plus any defaults for unspecified parameters. When using a custom trainer, parameters modified internally in the trainer class (like optimizer hyperparameters) won't be reflected in args.yaml, since they're not passed through the train() method's argument interface.

To address this while maintaining full control over optimizer parameters, here's a professional solution:

from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
import torch.optim as optim

class CustomTrainer(DetectionTrainer):
    def build_optimizer(self, model, name="RMSprop", lr=0.001, momentum=0.9, decay=1e-5, **kwargs):
        # Your custom parameters here
        return optim.RMSprop(
            model.parameters(),
            lr=0.1,          # custom learning rate
            momentum=0.9,    # your modified momentum
            eps=1e-8,
            alpha=0.9,
        )

# Train with explicit logging of critical parameters
model = YOLO("yolo11n.yaml")
results = model.train(
    trainer=CustomTrainer,
    data="coco8.yaml",
    epochs=100,
    optimizer="RMSprop",
    momentum=0.9,  # Still pass here for args.yaml logging
    lr0=0.1        # Matches custom optimizer lr
)

While the momentum in args.yaml will show the default 0.937, you can verify your actual training parameters through:

  1. The training console logs (which show the instantiated optimizer config)
  2. Directly accessing the trainer after training:
print(f"Actual momentum used: {model.trainer.optimizer.param_groups[0]['momentum']}")
print(f"Actual alpha (rho) used: {model.trainer.optimizer.param_groups[0]['alpha']}")

For permanent logging, you could add this to your custom trainer:

def build_optimizer(...):
    # ... custom optimizer creation ...
    self.args.momentum = 0.9  # Force-update args namespace
    return optimizer

This architecture pattern ensures compatibility with Ultralytics' configuration system while allowing deep customization. If you'd like to see first-class support for these parameters in future releases, feel free to open a Feature Request!

Keep up the great experimentation – this level of parameter tuning is exactly how cutting-edge results are achieved. 🔬

@alon-12345
Author

@pderrenger
I am grateful to you for the detailed answer!
One thing remains unclear...
When training the model using optimizers: SGD, Adam, AdamW, NAdam, RAdam - training occurs normally, losses decrease, metrics (precision/recall) increase.
When using the same training parameters, but with the "RMSProp" optimizer, all losses take the value "None", and the metrics have a value equal to 0.
What could be the reason for this?

(screenshots attached)

@pderrenger
Member

@alon-12345 You're very welcome! I'm glad you found the previous responses helpful. It's excellent that you're diving deep into optimizer behavior. The issue you're encountering with RMSProp, where losses are None and metrics are 0, is unusual and points to a potential problem in how the loss is being computed or propagated when RMSProp is used in this specific configuration.

Here's a breakdown of potential causes and how to debug them, along with a refined code example:

Potential Causes and Debugging Steps

  1. Gradient Issues:

    • Vanishing/Exploding Gradients: RMSProp is sensitive to learning rate. With some datasets, the learning rate, even a default one, might cause extremely large or small updates.
    • Debugging:
      • Lower Learning Rate: Significantly reduce lr0 (e.g., to 1e-5 or even lower) within your CustomTrainer. This is the most likely culprit.
      • Gradient Clipping: Although not currently directly supported via the train() API, gradient clipping can help in some extreme situations. This would require more significant code modifications.
  2. Loss Function Compatibility:

    • NaN/Inf Values: There's a small chance (though less likely with standard YOLO loss functions) that some combination of your data and the RMSProp updates is leading to NaN (Not a Number) or infinite values in the loss calculation. This can happen if divisions by zero or other numerical instabilities occur.
    • Debugging:
      • Add the method below to your CustomTrainer to check for inf or nan values.
        def on_train_epoch_end(self, *args, **kwargs):
            """Check for inf / nan values in training; raise an error if found."""
            if not torch.isfinite(self.loss).all():
                raise FloatingPointError('Loss is inf or nan, please check model, data and all hyperparameters')
            if not torch.isfinite(self.tloss).all():
                raise FloatingPointError('Train loss is inf or nan, please check model, data and all hyperparameters')
  3. Data Loading/Preprocessing:

    • Incorrect Data Format: While less likely if other optimizers work, it's possible (though rare) that some subtle data issue only manifests with RMSProp's specific update rule.
    • Debugging:
      • Double-check data.yaml: Ensure your dataset paths, class names, and image/label formats are absolutely correct. Use a very small subset of your data (e.g., 2 images) to rule out widespread data issues.
      • Visualize Data: Use Ultralytics' built-in dataset visualization tools to inspect your loaded data within the training loop to see if anything looks amiss.
  4. Model Initialization/Freezing:

    • Incorrect Layer Freezing: If you are freezing layers, ensure that some layers are still trainable. If all layers are frozen, the model won't learn.
    • Debugging:
      • Check Frozen Layers: Print model.modules() and inspect which layers have requires_grad=False.
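As a quick sanity check for point 4, you can count trainable versus frozen parameters directly. A minimal sketch on a toy two-layer module (an assumption standing in for the detection network; the same pattern applies to a loaded YOLO model's underlying nn.Module):

```python
import torch.nn as nn

# Toy two-layer model standing in for a detection network (illustrative assumption)
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))

# Freeze the first layer to simulate partial layer freezing
for p in model[0].parameters():
    p.requires_grad = False

# Count parameters by trainability; if trainable == 0, nothing will learn
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable={trainable}, frozen={frozen}")
```

If the trainable count is zero, every layer is frozen and no optimizer, RMSProp or otherwise, can reduce the loss.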

Refined Code Example (with integrated debugging checks):

from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
import torch.optim as optim
import torch

class CustomTrainer(DetectionTrainer):

    def build_optimizer(self, model, name="RMSprop", lr=0.001, momentum=0.9, decay=1e-5, **kwargs):
        # VERY IMPORTANT: Try a much lower learning rate first.
        return optim.RMSprop(
            model.parameters(),
            lr=1e-5,  # start very small
            momentum=momentum,
            weight_decay=decay,
            eps=1e-8,
            alpha=0.9,
        )

    def on_train_epoch_end(self, *args, **kwargs):
        """Check for inf / nan values in training; raise an error if found."""
        if not torch.isfinite(self.loss).all():
            raise FloatingPointError('Loss is inf or nan, please check model, data and all hyperparameters')
        if not torch.isfinite(self.tloss).all():
            raise FloatingPointError('Train loss is inf or nan, please check model, data and all hyperparameters')

# Train with the custom trainer and EXPLICITLY set parameters
model = YOLO("yolo11n.yaml")  # or your model
results = model.train(
    trainer=CustomTrainer,
    data="coco8.yaml",  # or your data.yaml
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,
    # optimizer="RMSprop",  # No need to specify here, handled by CustomTrainer
    lr0=1e-5,  # MUST match the lr in CustomTrainer.  For logging in args.yaml.
    momentum=0.9, # For logging purposes
)

# After training, check the optimizer's actual learning rate
print(f"Actual learning rate used: {model.trainer.optimizer.param_groups[0]['lr']}")

Key Changes and Why:

  • Lower Learning Rate (Crucial): The lr=1e-5 in build_optimizer is the most important change. Start very low.
  • on_train_epoch_end: This method will check for inf or nan values.
  • Explicit lr0 and momentum in model.train: While the CustomTrainer sets the optimizer, passing these to model.train ensures they are correctly recorded in args.yaml for reproducibility and logging. The lr0 value must match what's in CustomTrainer.
  • Post-Training Check: The print statement after training verifies the actual learning rate used by the optimizer.

Troubleshooting Steps (in order):

  1. Run the Refined Code: Use the code above exactly as provided, with the very low learning rate.
  2. Check for Errors: See if the FloatingPointError is raised. If so, it confirms numerical instability.
  3. Gradually Increase lr: If the low learning rate works (losses are no longer None), slowly increase lr (e.g., 1e-5, 5e-5, 1e-4, etc.) in build_optimizer and lr0 in model.train, re-running each time, until you find a value that works well or until you encounter the None loss again.
  4. Inspect args.yaml: Check the saved args.yaml in your runs directory to confirm that the parameters you set are being recorded.

If you still have issues after these steps, please provide the following, and I'll assist further:

  • The complete output of the training run (including any error messages).
  • Your data.yaml file (or a minimal version that reproduces the problem).
  • Confirmation that you've tried the exact code provided above.

By systematically working through these steps, we can pinpoint the root cause of the issue and get your RMSProp training working correctly.

@alon-12345
Author

@pderrenger
Thank you so much for the clarification, I wasn't even hoping for such a detailed answer.

@pderrenger
Member

@alon-12345 You're very welcome! I'm thrilled to hear the detailed explanation was helpful. That's what we strive for – empowering users like you to understand and push the boundaries of what's possible with YOLO. 😊 We appreciate you taking the time to engage so thoroughly with the debugging process. It helps us improve the framework for everyone. Don't hesitate to ask if you have any more questions as you continue your work!
