# 🚀 Composer v0.10.0

Composer v0.10.0 is out! This release adds support for CometML experiment tracking, automatic selection of the evaluation batch size, API enhancements for evaluation, logging, and metrics, and a preview of our new streaming datasets repository!

```bash
pip install --upgrade mosaicml==0.10.0
```
## New Features

### ☄️ Comet Experiment Tracking (#1490)

We've added support for the popular Comet experiment tracker! To enable it, simply create the logger and pass it to the `Trainer` at initialization:

```python
from composer import Trainer
from composer.loggers import CometMLLogger

cometml_logger = CometMLLogger()

trainer = Trainer(
    ...,
    loggers=[cometml_logger],
)
```

Please see our Logging and CometMLLogger docs pages for details on usage.
### 🪄 Automatic Evaluation Batch Size Selection (#1417)

Composer now supports `eval_batch_size='auto'`, which chooses the right evaluation batch size to avoid CUDA OOMs! In conjunction with `grad_accum='auto'`, you can now run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to hand-pick batch sizes to avoid CUDA OOMs.
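The idea behind `'auto'` selection can be illustrated with a simple retry loop that halves the batch size whenever evaluation runs out of memory. This is a minimal sketch in plain Python, not Composer's actual implementation; `run_eval` is a hypothetical callable standing in for one evaluation pass:

```python
def find_eval_batch_size(run_eval, initial_batch_size):
    """Halve the batch size until evaluation fits in memory.

    ``run_eval`` is a hypothetical callable that raises RuntimeError
    (as PyTorch does on a CUDA OOM) when the batch is too large.
    """
    batch_size = initial_batch_size
    while batch_size >= 1:
        try:
            run_eval(batch_size)
            return batch_size
        except RuntimeError:  # e.g. "CUDA out of memory"
            batch_size //= 2
    raise RuntimeError('Evaluation does not fit even with batch size 1')

# Toy model of hardware that can only fit 16 samples at once:
def run_eval(batch_size):
    if batch_size > 16:
        raise RuntimeError('CUDA out of memory')

print(find_eval_batch_size(run_eval, 128))  # 16
```

Starting from 128, the sketch retries at 64, 32, and finally 16, which fits.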
### 🎯 Evaluation API Changes (#1479)

The evaluation API has been updated to be consistent with the Trainer API. If the `eval_dataloader` was provided to the Trainer during initialization, `eval` can be invoked without any additional arguments:

```python
trainer = Trainer(
    eval_dataloader=...,
)
trainer.eval()
```

Alternatively, the `eval_dataloader` can be passed directly to the `eval()` method:

```python
trainer = Trainer(
    ...,
)
trainer.eval(
    eval_dataloader=...,
)
```

The `eval_dataloader` can be a PyTorch dataloader or, for multiple metrics, a list of `Evaluator` objects.
### 🪵 Simplified Logging (#1416)

We've significantly simplified our internal logging interface:

- Removed the use of `LogLevel` throughout the logging, which was a mostly unused feature. Filtering logs is now the responsibility of the logger.
- For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: `log_metrics`, `log_hyperparameters`, and `log_artifacts`. Previous calls to `data_fit`, `data_epoch`, etc. have been removed.
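As an illustration of the new shape of the interface, a destination that records metrics and hyperparameters might look like the following. This is a minimal sketch in plain Python; `DictLogger` is a hypothetical class for demonstration, not Composer's actual `LoggerDestination` base class, and `log_artifacts` is omitted for brevity:

```python
class DictLogger:
    """Toy logger destination implementing the simplified interface."""

    def __init__(self):
        self.metrics = {}
        self.hparams = {}

    def log_metrics(self, metrics):
        # e.g. {'loss/train': 0.42} at the current step
        self.metrics.update(metrics)

    def log_hyperparameters(self, hparams):
        self.hparams.update(hparams)

logger = DictLogger()
logger.log_hyperparameters({'lr': 1e-3})
logger.log_metrics({'loss/train': 0.42})
```

Because these method names mirror those of external trackers, forwarding calls to CometML or Weights & Biases becomes a thin pass-through.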
### 🎯 validate → eval_forward (#1411, #1419)

Previously, `ComposerModel` implemented the `validate(batch: Any) -> Tuple[Any, Any]` method, which returned an `(input, target)` tuple, and the Trainer handled updating the metrics. In v0.10, we return control of updating the metrics to the user.

Models now instead implement `def eval_forward(batch: Any)`, which returns the outputs of evaluation, and `def update_metric(batch, outputs, metric)`, which updates the metric.

An example implementation for classification can be found in our `ComposerClassifier` base class:

```python
def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
    _, targets = batch
    metric.update(outputs, targets)

def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
    return outputs if outputs is not None else self.forward(batch)
```
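To see how the two hooks cooperate, here is a toy evaluation loop in plain Python. `ToyModel` and the hand-rolled `Accuracy` metric are illustrative stand-ins (the real Trainer uses torchmetrics `Metric` objects), but the call pattern mirrors the new API: `eval_forward` produces outputs, then `update_metric` feeds them to the metric:

```python
class Accuracy:
    """Minimal stand-in for a torchmetrics-style metric."""
    def __init__(self):
        self.correct = 0
        self.total = 0
    def update(self, outputs, targets):
        self.correct += sum(o == t for o, t in zip(outputs, targets))
        self.total += len(targets)
    def compute(self):
        return self.correct / self.total

class ToyModel:
    def eval_forward(self, batch):
        inputs, _ = batch
        return [x > 0 for x in inputs]  # "predict" the sign of the input
    def update_metric(self, batch, outputs, metric):
        _, targets = batch
        metric.update(outputs, targets)

model, metric = ToyModel(), Accuracy()
for batch in [([1, -2], [True, False]), ([3, -4], [True, True])]:
    outputs = model.eval_forward(batch)
    model.update_metric(batch, outputs, metric)
print(metric.compute())  # 0.75
```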
### 🕵️‍♀️ Evaluator Changes

The `Evaluator` class now stores evaluation metric names instead of metric instances. For example:

```python
glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metric_names=['BinaryF1Score', 'Accuracy'],
)
```

These metric names are matched against the metrics returned by the `ComposerModel`. The metric instances are now stored as deep copies in the `State` class, as `state.train_metrics` or `state.eval_metrics`.
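The matching and deep copying can be sketched as follows. This is a hypothetical helper for illustration, not the actual `State` implementation; the point is that copying prevents evaluators from sharing mutable metric state with the model or each other:

```python
import copy

def select_eval_metrics(model_metrics, metric_names):
    """Match requested names against a model's metrics and deep-copy
    the matches so evaluators never share metric state."""
    return {name: copy.deepcopy(model_metrics[name]) for name in metric_names}

class Counter:
    """Toy stand-in for a stateful metric object."""
    def __init__(self):
        self.n = 0

model_metrics = {'BinaryF1Score': Counter(), 'Accuracy': Counter()}
eval_metrics = select_eval_metrics(model_metrics, ['Accuracy'])

eval_metrics['Accuracy'].n += 1  # does not touch the model's copy
print(model_metrics['Accuracy'].n)  # 0
```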
### 🚧 Streaming Datasets Repository Preview

We're in the process of splitting out streaming datasets into their own repository! Streaming datasets are a high-performance drop-in replacement for Torch `IterableDataset` objects that let you stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.
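Conceptually, a streaming dataset is an iterable that fetches shards on demand rather than downloading the whole dataset up front. A toy sketch of that idea in plain Python generators; `stream_samples` and the fake object store are illustrative only, not the Streaming library's API:

```python
def stream_samples(shard_urls, download):
    """Yield samples shard by shard, fetching each shard only when needed.

    ``download`` is a hypothetical callable mapping a shard URL to a
    list of samples (in a real system, a fetch from object storage).
    """
    for url in shard_urls:
        for sample in download(url):
            yield sample

# Toy "object store" with two shards:
fake_store = {'s3://bucket/shard0': [0, 1], 's3://bucket/shard1': [2, 3]}
samples = list(stream_samples(list(fake_store), fake_store.__getitem__))
print(samples)  # [0, 1, 2, 3]
```

Because samples are produced lazily, training can begin as soon as the first shard arrives instead of waiting for the full download.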
### ❌ YAHP Deprecation

We are deprecating support for YAHP, our hyperparameter configuration tool. Support will be removed in the next minor release of Composer. We recommend that users migrate to OmegaConf or Hydra.
## Bug Fixes
- Documentation fixes (#1408, #1422, #1425, #1413, #1432, #1403, #1426, #1396, #1446, #1466, #1443)
- Upgrade WandB version (#1440)
- fix import (#1442)
- fix wrong extra deps group (#1449)
- wandb bug fix (#1488)
- Reset train metrics every batch (#1496)
- fix auto grad accum (#1515)
- Fix compression file remote download exception handling (#1526)
- Add Pandoc to Docker images, bump version to 2.19.2 (#1550)
## What's Changed
- current metrics docs by @A-Jacobson in #1402
- merge nlp+hf notebooks by @A-Jacobson in #1406
- Add break epoch exception by @mvpatel2000 in #1415
- Upgrade to torch 1.12.1 by @abhi-mosaic in #1409
- Metrics refactor pt1 by @ishanashastri in #1411
- Use state algos by @mvpatel2000 in #1412
- Add default ignore index by @moinnadeem in #1421
- Update default hparams for ResNet model card by @abhi-mosaic in #1423
- update colout link in custom speedup notebook by @A-Jacobson in #1408
- Clean up prose in key files by @dblalock in #1422
- Relax codeowners by @bandish-shah in #1424
- Fix typo by @Landanjs in #1425
- Fix pre-commit checks failing on fresh checkout of dev by @dblalock in #1414
- Have docs use preferred import paths, not longest import paths by @dblalock in #1413
- Fix missing indent by @Landanjs in #1432
- eval_batch_size=auto by @mvpatel2000 in #1417
- Simplify helper for conflicting files by @hanlint in #1427
- add install from dev instructions by @A-Jacobson in #1403
- Style/tone consistency update for tutorial notebooks by @alextrott16 in #1426
- Dynamic quantization + minor improvements in inference APIs by @dskhudia in #1433
- Upgrade WandB version by @moinnadeem in #1440
- Log multiple losses by @Landanjs in #1375
- Fix attribute by @mvpatel2000 in #1442
- Expand evaluation doc by @alextrott16 in #1396
- Metrics Refactor Part 2 by @ishanashastri in #1419
- Create dependabot.yml by @mvpatel2000 in #1448
- Methods overview fix by @growlix in #1446
- Bump custom-inherit from 2.3.2 to 2.4.0 by @dependabot in #1451
- Bump junitparser from 2.4.3 to 2.8.0 by @dependabot in #1453
- Update moto[s3] requirement from <3.2,>=3.1.12 to >=4.0.1,<5 by @dependabot in #1450
- Update monai requirement from <0.9,>=0.8.0 to >=0.9.0,<0.10 by @dependabot in #1452
- Update torch-optimizer requirement from <0.2,>=0.1.0 to >=0.3.0,<0.4 by @dependabot in #1454
- Bump cryptography from 37.0.2 to 37.0.4 by @dependabot in #1457
- Bump sphinxext-opengraph from 0.6.1 to 0.6.3 by @dependabot in #1458
- Bump coverage[toml] from 6.3.2 to 6.4.4 by @dependabot in #1460
- Bump nbsphinx from 0.8.8 to 0.8.9 by @dependabot in #1459
- Fix incorrect deps group in `streaming` requirement by @hanlint in #1449
- Logger Destination Refactor by @eracah in #1416
- Bump sphinx-markdown-tables from 0.0.15 to 0.0.17 by @dependabot in #1463
- Bump traitlets from 5.1.1 to 5.3.0 by @dependabot in #1462
- Bump vit-pytorch from 0.27 to 0.35.8 by @dependabot in #1465
- Bump furo from 2022.3.4 to 2022.6.21 by @dependabot in #1467
- Bump ipykernel from 6.9.2 to 6.15.1 by @dependabot in #1470
- Bump pytest from 7.1.0 to 7.1.2 by @dependabot in #1469
- Bump sphinxcontrib-katex from 0.8.6 to 0.9.0 by @dependabot in #1476
- Bump tabulate from 0.8.9 to 0.8.10 by @dependabot in #1478
- Bump yamllint from 1.26.3 to 1.27.1 by @dependabot in #1481
- Bump ipykernel from 6.15.1 to 6.15.2 by @dependabot in #1482
- Refactor CheckpointSaver by @hanlint in #1428
- Clean up docs Makefile by @eracah in #1466
- Model surgery info -> debug by @mvpatel2000 in #1485
- Docker image with Flash Attention by @abhi-mosaic in #1471
- Fix WandBLogger bug with inaccurate step count by @eracah in #1488
- Update Eval API by @hanlint in #1479
- Random Names with Fixed Seed by @mvpatel2000 in #1487
- ResNet50 on ImageNet training script example by @Landanjs in #1434
- Remove hparams from `test_precision` and `test_state` by @hanlint in #1486
- Clean up `save_checkpoint` by @hanlint in #1484
- Remove hparams from test_ddp by @hanlint in #1489
- update model token embeddings according to tokenizer len by @ananyahjha93 in #1493
- BERT classifier metrics depend on num_labels by @alextrott16 in #1495
- Reset train metrics every batch by @abhi-mosaic in #1496
- Algolia doc search by @nqn in #1443
- Squelch Engine debug logs by @hanlint in #1497
- Remove TODO by @mvpatel2000 in #1499
- Remove hparams from checkpoint tests by @hanlint in #1491
- [Docs] Training ResNet-50 on AWS tutorial by @bandish-shah in #1444
- Refactor hparams in tests by @hanlint in #1498
- Bump pytest from 7.1.2 to 7.1.3 by @dependabot in #1500
- Improved comments and improved test code by @karan6181 in #1502
- Refactor GLUE fine-tune queuing to improve efficiency and add task-specific seed sweeps by @alextrott16 in #1363
- Raise ValueError for Profiler + Auto Grad Accum by @mvpatel2000 in #1504
- add yahp deprecation warnings by @hanlint in #1505
- Move logic from `initialize_object` to object store class by @hanlint in #1508
- Fix run name comment by @mvpatel2000 in #1509
- Add CometML Support by @eracah in #1490
- Raise ValueError if missing a surgery algorithm by @mvpatel2000 in #1506
- remove datasets from gitignore by @hanlint in #1513
- fix auto grad accum by @mvpatel2000 in #1515
- Use eval context by @mvpatel2000 in #1516
- Update tensorflow-io requirement from <0.27,>=0.26.0 to >=0.26.0,<0.28 by @dependabot in #1522
- Bump cryptography from 37.0.4 to 38.0.1 by @dependabot in #1521
- Fix SAM loss by @mvpatel2000 in #1518
- Fixed remote path in streaming dataloader facesynthetics jupyter notebook by @karan6181 in #1519
- Rework auto grad accum checks by @mvpatel2000 in #1517
- [xs] remove libcloudhparams from `test_filehelpers.py` by @hanlint in #1514
- Add v2 datasets behind a version flag by @knighton in #1507
- Fix compression file remote download exception handling. by @knighton in #1526
## New Contributors
- @ananyahjha93 made their first contribution in #1493
Full Changelog: v0.9.0...v0.10.0