# 🚀 Composer v0.10.0

Composer v0.10.0 is out! This release adds support for CometML experiment tracking, automatic selection of the evaluation batch size, API enhancements for evaluation, logging, and metrics, and a preview of our new streaming datasets repository!

```bash
pip install --upgrade mosaicml==0.10.0
```
## New Features

### ☄️ Comet Experiment Tracking (#1490)

We've added support for the popular Comet experiment tracker! To enable it, simply create the logger and pass it to the `Trainer` at initialization:

```python
from composer import Trainer
from composer.loggers import CometMLLogger

cometml_logger = CometMLLogger()

trainer = Trainer(
    ...,
    loggers=[cometml_logger],
)
```

Please see our Logging and CometMLLogger docs pages for details on usage.
### 🪄 Automatic Evaluation Batch Size Selection (#1417)

Composer now supports `eval_batch_size='auto'`, which chooses the right evaluation batch size to avoid CUDA OOMs! In conjunction with `grad_accum='auto'`, you can now run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to hand-pick batch sizes to avoid CUDA OOMs.
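The idea behind `'auto'` selection can be illustrated with a simple retry loop that halves the batch size whenever evaluation runs out of memory. This is a minimal sketch in plain Python, not Composer's actual implementation; `run_eval` is a hypothetical callable standing in for one evaluation pass:

```python
def find_eval_batch_size(run_eval, initial_batch_size):
    """Halve the batch size until evaluation fits in memory.

    ``run_eval`` is a hypothetical callable that raises RuntimeError
    (as PyTorch does on a CUDA OOM) when the batch is too large.
    """
    batch_size = initial_batch_size
    while batch_size >= 1:
        try:
            run_eval(batch_size)
            return batch_size
        except RuntimeError:  # e.g. "CUDA out of memory"
            batch_size //= 2
    raise RuntimeError('Evaluation does not fit even with batch size 1')

# Toy model of hardware that can only fit 16 samples at once:
def run_eval(batch_size):
    if batch_size > 16:
        raise RuntimeError('CUDA out of memory')

print(find_eval_batch_size(run_eval, 128))  # 16
```

Starting from 128, the sketch retries at 64, 32, and finally 16, which fits.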
### 🎯 Evaluation API Changes (#1479)

The evaluation API has been updated to be consistent with the Trainer API. If the `eval_dataloader` was provided to the Trainer during initialization, `eval` can be invoked without any additional arguments:

```python
trainer = Trainer(
    eval_dataloader=...,
)
trainer.eval()
```

Alternatively, the `eval_dataloader` can be passed directly to the `eval()` method:

```python
trainer = Trainer(
    ...,
)
trainer.eval(
    eval_dataloader=...,
)
```

The `eval_dataloader` can be a PyTorch dataloader or, for multiple metrics, a list of `Evaluator` objects.
### 🪵 Simplified Logging (#1416)

We've significantly simplified our internal logging interface:

- Removed the use of `LogLevel` throughout the logging, which was a mostly unused feature. Filtering logs is now the responsibility of the logger.
- For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: `log_metrics`, `log_hyperparameters`, and `log_artifacts`. Previous calls to `data_fit`, `data_epoch`, etc. have been removed.
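As an illustration of the new shape of the interface, a destination that records metrics and hyperparameters might look like the following. This is a minimal sketch in plain Python; `DictLogger` is a hypothetical class for demonstration, not Composer's actual `LoggerDestination` base class, and `log_artifacts` is omitted for brevity:

```python
class DictLogger:
    """Toy logger destination implementing the simplified interface."""

    def __init__(self):
        self.metrics = {}
        self.hparams = {}

    def log_metrics(self, metrics):
        # e.g. {'loss/train': 0.42} at the current step
        self.metrics.update(metrics)

    def log_hyperparameters(self, hparams):
        self.hparams.update(hparams)

logger = DictLogger()
logger.log_hyperparameters({'lr': 1e-3})
logger.log_metrics({'loss/train': 0.42})
```

Because these method names mirror those of external trackers, forwarding calls to CometML or Weights & Biases becomes a thin pass-through.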
### 🎯 validate → eval_forward (#1411, #1419)

Previously, `ComposerModel` implemented the `validate(batch: Any) -> Tuple[Any, Any]` method, which returned an `(input, target)` tuple, and the Trainer handled updating the metrics. In v0.10, we return control of updating the metrics to the user.

Models now instead implement `def eval_forward(batch: Any)`, which returns the outputs of evaluation, and `def update_metric(batch, outputs, metric)`, which updates the metric.

An example implementation for classification can be found in our `ComposerClassifier` base class:

```python
def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
    _, targets = batch
    metric.update(outputs, targets)

def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
    return outputs if outputs is not None else self.forward(batch)
```
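To see how the two hooks cooperate, here is a toy evaluation loop in plain Python. `ToyModel` and the hand-rolled `Accuracy` metric are illustrative stand-ins (the real Trainer uses torchmetrics `Metric` objects), but the call pattern mirrors the new API: `eval_forward` produces outputs, then `update_metric` feeds them to the metric:

```python
class Accuracy:
    """Minimal stand-in for a torchmetrics-style metric."""
    def __init__(self):
        self.correct = 0
        self.total = 0
    def update(self, outputs, targets):
        self.correct += sum(o == t for o, t in zip(outputs, targets))
        self.total += len(targets)
    def compute(self):
        return self.correct / self.total

class ToyModel:
    def eval_forward(self, batch):
        inputs, _ = batch
        return [x > 0 for x in inputs]  # "predict" the sign of the input
    def update_metric(self, batch, outputs, metric):
        _, targets = batch
        metric.update(outputs, targets)

model, metric = ToyModel(), Accuracy()
for batch in [([1, -2], [True, False]), ([3, -4], [True, True])]:
    outputs = model.eval_forward(batch)
    model.update_metric(batch, outputs, metric)
print(metric.compute())  # 0.75
```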
### 🕵️‍♀️ Evaluator Changes

The `Evaluator` class now stores evaluation metric names instead of metric instances. For example:

```python
glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metric_names=['BinaryF1Score', 'Accuracy'],
)
```

These metric names are matched against the metrics returned by the `ComposerModel`. The metric instances are now stored as deep copies in the `State` class, as `state.train_metrics` or `state.eval_metrics`.
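The matching and deep copying can be sketched as follows. This is a hypothetical helper for illustration, not the actual `State` implementation; the point is that copying prevents evaluators from sharing mutable metric state with the model or each other:

```python
import copy

def select_eval_metrics(model_metrics, metric_names):
    """Match requested names against a model's metrics and deep-copy
    the matches so evaluators never share metric state."""
    return {name: copy.deepcopy(model_metrics[name]) for name in metric_names}

class Counter:
    """Toy stand-in for a stateful metric object."""
    def __init__(self):
        self.n = 0

model_metrics = {'BinaryF1Score': Counter(), 'Accuracy': Counter()}
eval_metrics = select_eval_metrics(model_metrics, ['Accuracy'])

eval_metrics['Accuracy'].n += 1  # does not touch the model's copy
print(model_metrics['Accuracy'].n)  # 0
```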
### 🚧 Streaming Datasets Repository Preview

We're in the process of splitting out streaming datasets into their own repository! Streaming datasets are a high-performance drop-in replacement for Torch `IterableDataset` objects that let you stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.
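Conceptually, a streaming dataset is an iterable that fetches shards on demand rather than downloading the whole dataset up front. A toy sketch of that idea in plain Python generators; `stream_samples` and the fake object store are illustrative only, not the Streaming library's API:

```python
def stream_samples(shard_urls, download):
    """Yield samples shard by shard, fetching each shard only when needed.

    ``download`` is a hypothetical callable mapping a shard URL to a
    list of samples (in a real system, a fetch from object storage).
    """
    for url in shard_urls:
        for sample in download(url):
            yield sample

# Toy "object store" with two shards:
fake_store = {'s3://bucket/shard0': [0, 1], 's3://bucket/shard1': [2, 3]}
samples = list(stream_samples(list(fake_store), fake_store.__getitem__))
print(samples)  # [0, 1, 2, 3]
```

Because samples are produced lazily, training can begin as soon as the first shard arrives instead of waiting for the full download.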
### ❌ YAHP Deprecation

We are deprecating support for YAHP, our hyperparameter configuration tool. Support will be removed in the next minor release of Composer. We recommend that users migrate to OmegaConf or Hydra.
## Bug Fixes
- Documentation fixes (#1408, #1422, #1425, #1413, #1432, #1403, #1426, #1396, #1446, #1466, #1443)
- Upgrade WandB version (#1440)
- fix import (#1442)
- fix wrong extra deps group (#1449)
- wandb bug fix (#1488)
- Reset train metrics every batch (#1496)
- fix auto grad accum (#1515)
- Fix compression file remote download exception handling (#1526)
- Add Pandoc to Docker images, bump version to 2.19.2 (#1550)
## What's Changed
- current metrics docs by @A-Jacobson in #1402
- merge nlp+hf notebooks by @A-Jacobson in #1406
- Add break epoch exception by @mvpatel2000 in #1415
- Upgrade to torch 1.12.1 by @abhi-mosaic in #1409
- Metrics refactor pt1 by @ishanashastri in #1411
- Use state algos by @mvpatel2000 in #1412
- Add default ignore index by @moinnadeem in #1421
- Update default hparams for ResNet model card by @abhi-mosaic in #1423
- update colout link in custom speedup notebook by @A-Jacobson in #1408
- Clean up prose in key files by @dblalock in #1422
- Relax codeowners by @bandish-shah in #1424
- Fix typo by @Landanjs in #1425
- Fix pre-commit checks failing on fresh checkout of dev by @dblalock in #1414
- Have docs use preferred import paths, not longest import paths by @dblalock in #1413
- Fix missing indent by @Landanjs in #1432
- eval_batch_size=auto by @mvpatel2000 in #1417
- Simplify helper for conflicting files by @hanlint in #1427
- add install from dev instructions by @A-Jacobson in #1403
- Style/tone consistency update for tutorial notebooks by @alextrott16 in #1426
- Dynamic quantization + minor improvements in inference APIs by @dskhudia in #1433
- Upgrade WandB version by @moinnadeem in #1440
- Log multiple losses by @Landanjs in #1375
- Fix attribute by @mvpatel2000 in #1442
- Expand evaluation doc by @alextrott16 in #1396
- Metrics Refactor Part 2 by @ishanashastri in #1419
- Create dependabot.yml by @mvpatel2000 in #1448
- Methods overview fix by @growlix in #1446
- Bump custom-inherit from 2.3.2 to 2.4.0 by @dependabot in #1451
- Bump junitparser from 2.4.3 to 2.8.0 by @dependabot in #1453
- Update moto[s3] requirement from <3.2,>=3.1.12 to >=4.0.1,<5 by @dependabot in #1450
- Update monai requirement from <0.9,>=0.8.0 to >=0.9.0,<0.10 by @dependabot in #1452
- Update torch-optimizer requirement from <0.2,>=0.1.0 to >=0.3.0,<0.4 by @dependabot in #1454
- Bump cryptography from 37.0.2 to 37.0.4 by @dependabot in #1457
- Bump sphinxext-opengraph from 0.6.1 to 0.6.3 by @dependabot in #1458
- Bump coverage[toml] from 6.3.2 to 6.4.4 by @dependabot in #1460
- Bump nbsphinx from 0.8.8 to 0.8.9 by @dependabot in #1459
- Fix incorrect deps group in `streaming` requirement by @hanlint in #1449
- Logger Destination Refactor by @eracah in #1416
- Bump sphinx-markdown-tables from 0.0.15 to 0.0.17 by @dependabot in #1463
- Bump traitlets from 5.1.1 to 5.3.0 by @dependabot in #1462
- Bump vit-pytorch from 0.27 to 0.35.8 by @dependabot in #1465
- Bump furo from 2022.3.4 to 2022.6.21 by @dependabot in #1467
- Bump ipykernel from 6.9.2 to 6.15.1 by @dependabot in #1470
- Bump pytest from 7.1.0 to 7.1.2 by @dependabot in #1469
- Bump sphinxcontrib-katex from 0.8.6 to 0.9.0 by @dependabot in #1476
- Bump tabulate from 0.8.9 to 0.8.10 by @dependabot in #1478
- Bump yamllint from 1.26.3 to 1.27.1 by @dependabot in #1481
- Bump ipykernel from 6.15.1 to 6.15.2 by @dependabot in #1482
- Refactor CheckpointSaver by @hanlint in #1428
- Clean up docs Makefile by @eracah in #1466
- Model surgery info -> debug by @mvpatel2000 in #1485
- Docker image with Flash Attention by @abhi-mosaic in #1471
- Fix WandBLogger bug with inaccurate step count by @eracah in #1488
- Update Eval API by @hanlint in #1479
- Random Names with Fixed Seed by @mvpatel2000 in #1487
- ResNet50 on ImageNet training script example by @Landanjs in #1434
- Remove hparams from `test_precision` and `test_state` by @hanlint in #1486
- Clean up `save_checkpoint` by @hanlint in #1484
- Remove hparams from test_ddp by @hanlint in #1489
- update model token embeddings according to tokenizer len by @ananyahjha93 in #1493
- BERT classifier metrics depend on num_labels by @alextrott16 in #1495
- Reset train metrics every batch by @abhi-mosaic in #1496
- Algolia doc search by @nqn in #1443
- Squelch Engine debug logs by @hanlint in #1497
- Remove TODO by @mvpatel2000 in #1499
- Remove hparams from checkpoint tests by @hanlint in #1491
- [Docs] Training ResNet-50 on AWS tutorial by @bandish-shah in #1444
- Refactor hparams in tests by @hanlint in #1498
- Bump pytest from 7.1.2 to 7.1.3 by @dependabot in #1500
- Improved comments and improved test code by @karan6181 in #1502
- Refactor GLUE fine-tune queuing to improve efficiency and add task-specific seed sweeps by @alextrott16 in #1363
- Raise ValueError for Profiler + Auto Grad Accum by @mvpatel2000 in #1504
- add yahp deprecation warnings by @hanlint in #1505
- Move logic from `initialize_object` to object store class by @hanlint in #1508
- Fix run name comment by @mvpatel2000 in #1509
- Add CometML Support by @eracah in #1490
- Raise ValueError if missing a surgery algorithm by @mvpatel2000 in #1506
- remove datasets from gitignore by @hanlint in #1513
- fix auto grad accum by @mvpatel2000 in #1515
- Use eval context by @mvpatel2000 in #1516
- Update tensorflow-io requirement from <0.27,>=0.26.0 to >=0.26.0,<0.28 by @dependabot in #1522
- Bump cryptography from 37.0.4 to 38.0.1 by @dependabot in #1521
- Fix SAM loss by @mvpatel2000 in #1518
- Fixed remote path in streaming dataloader facesynthetics jupyter notebook by @karan6181 in #1519
- Rework auto grad accum checks by @mvpatel2000 in #1517
- [xs] remove libcloudhparams from `test_filehelpers.py` by @hanlint in #1514
- Add v2 datasets behind a version flag by @knighton in #1507
- Fix compression file remote download exception handling. by @knighton in #1526
## New Contributors
- @ananyahjha93 made their first contribution in #1493
Full Changelog: v0.9.0...v0.10.0