Releases: mosaicml/composer

v0.13.1

07 Mar 03:11

🚀 Composer v0.13.1

Introducing the composer PyPI package!

Composer v0.13.1 is released!

Composer can now also be installed using the new composer PyPI package via pip:

pip install composer==0.13.1

The legacy package name still works via pip:

pip install mosaicml==0.13.1

Note: The mosaicml==0.13.0 PyPI package was yanked due to minor packaging issues discovered after release. The package was re-released as Composer v0.13.1, so these release notes cover both v0.13.0 and v0.13.1.

New Features

  1. 🤙 New and Updated Callbacks

    • New HealthChecker Callback (#2002)

      The callback will log a warning if the GPUs on a given node appear to be in poor health (low utilization). The callback can also be configured to send a Slack message!

      from composer import Trainer
      from composer.callbacks import HealthChecker
      
      # Warn if GPU utilization difference drops below 10%
      health_checker = HealthChecker(
          threshold=10,
      )
      
      # Construct Trainer
      trainer = Trainer(
          ...,
          callbacks=health_checker,
      )
      
      # Train!
      trainer.fit()
    • Updated MemoryMonitor to use gigabyte (GB) units (#1940)

    • New RuntimeEstimator Callback (#1991)

      Estimate the remaining runtime of your job! Approximates the time remaining by observing the throughput and comparing to the number of batches remaining.

      from composer import Trainer
      from composer.callbacks import RuntimeEstimator
      
      # Construct trainer with RuntimeEstimator callback
      trainer = Trainer(
          ...,
          callbacks=RuntimeEstimator(),
      )
      
      # Train!
      trainer.fit()
    • Updated SpeedMonitor throughput metrics (#1987)

      Expands throughput metrics to track relative to several different time units and per device:

      • throughput/batches_per_sec and throughput/device/batches_per_sec
      • throughput/tokens_per_sec and throughput/device/tokens_per_sec
      • throughput/flops_per_sec and throughput/device/flops_per_sec
      • throughput/device/samples_per_sec

      Also adds a throughput/device/mfu metric that reports per-device MFU. Simply enable the SpeedMonitor callback as usual to log these new metrics! Please see the SpeedMonitor documentation for more information.
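
As a rough illustration of the arithmetic behind the runtime estimate and the per-device MFU metric described above, here is a minimal sketch in plain Python. The function names and the 312 TFLOP/s peak figure are assumptions made for the example and are not Composer's API; MFU is taken in its usual sense of achieved FLOPs per second divided by the hardware peak.

```python
def estimate_remaining_seconds(batches_done: int, elapsed_seconds: float, total_batches: int) -> float:
    """Extrapolate the time remaining from the throughput observed so far."""
    if batches_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need at least one completed batch to estimate throughput")
    batches_per_sec = batches_done / elapsed_seconds
    return (total_batches - batches_done) / batches_per_sec


def model_flops_utilization(device_flops_per_sec: float, peak_flops_per_sec: float) -> float:
    """Fraction of the device's peak FLOPs that the training job actually achieves."""
    if peak_flops_per_sec <= 0:
        raise ValueError("peak_flops_per_sec must be positive")
    return device_flops_per_sec / peak_flops_per_sec


# 200 of 1000 batches finished in 100 seconds -> 2 batches/sec
remaining = estimate_remaining_seconds(batches_done=200, elapsed_seconds=100.0, total_batches=1000)

# Hypothetical device: 156 TFLOP/s achieved against an assumed 312 TFLOP/s peak
mfu = model_flops_utilization(device_flops_per_sec=1.56e14, peak_flops_per_sec=3.12e14)

print(f"remaining: {remaining:.0f}s, MFU: {mfu:.0%}")
```

The callbacks compute these values for you; the sketch is only meant to make the logged numbers easier to interpret.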

  2. ⣿ FSDP Sharded Checkpoints (#1902)

    Users can now specify the state_dict_type in the fsdp_config dictionary to enable sharded checkpoints. For example:

    from composer import Trainer
    
    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'state_dict_type': 'local',
    }
    
    trainer = Trainer(
        ...,
        fsdp_config=fsdp_config,
        save_folder='checkpoints',
        save_filename='ba{batch}_rank{rank}.pt',
        save_interval='10ba',
    )

    Please see the PyTorch FSDP docs and Composer's Distributed Training notes for more information.
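
The {rank} placeholder in save_filename matters for sharded checkpoints because every rank writes its own shard. The sketch below mimics the expansion with Python's str.format. This is a stand-in for illustration only: Composer performs its own formatting internally, and the helper name is hypothetical.

```python
from typing import List

SAVE_FILENAME = "ba{batch}_rank{rank}.pt"  # pattern taken from the example above


def sharded_checkpoint_names(batch: int, world_size: int) -> List[str]:
    """Return one checkpoint filename per rank for a given batch count."""
    return [SAVE_FILENAME.format(batch=batch, rank=rank) for rank in range(world_size)]


names = sharded_checkpoint_names(batch=10, world_size=4)
print(names)
```

Without {rank} in the pattern, all ranks would resolve to the same filename and overwrite one another's shards.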

  3. 🤗 HuggingFace Improvements

    • Update HuggingFaceModel class to support encoder-decoder batches without decoder_input_ids (#1950)
    • Allow evaluation metrics to be passed to HuggingFaceModel directly (#1971)
    • Add a utility function to load a Composer checkpoint of a HuggingFaceModel and write out the expected config.json and pytorch_model.bin in the HuggingFace pretrained folder (#1974)
  4. 🛟 Nvidia H100 Alpha Support - Added amp_fp8 data type

    In preparation for H100's arrival, we've added the amp_fp8 precision type. Currently setting amp_fp8 specifies a new precision context using transformer_engine.pytorch.fp8_autocast. For more details, please see Nvidia's new Transformer Engine and the specific fp8 recipe we utilize.

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        precision='amp_fp8',
    )

API changes

  • The torchmetrics package has been upgraded to 0.11.x.

    The torchmetrics.Accuracy metric now requires a task argument which can take on a value of binary, multiclass or multilabel. Please see Torchmetrics Accuracy docs for details.

    Additionally, since specifying task='multiclass' requires the additional num_classes field, we've updated ComposerClassifier to accept a num_classes argument. Please see PRs #2017 and #2025 for additional details.

  • Surgery algorithms used in functional form return a value of None (#1543)

Deprecations

  • Deprecate HFCrossEntropy and Perplexity (#1857)
  • Remove Jenkins CI (#1943, #1954)
  • Change Deprecation Warnings to Warnings for specifying ProgressBarLogger and ConsoleLogger to loggers (#1846)

Bug Fixes

  • Fixed an issue introduced in 0.12.1 where HuggingFaceModel crashes if config.return_dict = False (#1948)
  • Refactor EMA to improve memory efficiency (#1941)
  • Make wandb checkpoint logging compatible with wandb model registry (#1973)
  • Fix ICL race conditions (#1978)
  • Update epoch metric name to trainer/epoch (#1986)
  • reset scaler (#1999)
  • Bug/sync optimization logger across ranks (#1970)
  • Update Docker images to resolve vulnerability scan issues (#2007)
  • Fix eval duplicate logging issue (#2018)
  • extend test and patch bug (#2028)
  • Protect for missing slack_sdk import (#2031)

Known Issues

  • Docker Image Security Vulnerability
    • CVE-2022-45907: The mosaicml/pytorch:1.12.1*, mosaicml/pytorch:1.11.0*, mosaicml/pytorch_vision:1.12.1* and mosaicml/pytorch_vision:1.11.0* images are impacted and currently supported for legacy use cases. We recommend users upgrade to images with PyTorch >1.13. The affected images will be removed in the next Composer release.


v0.13.0

07 Mar 03:10
3618c63

This release has been yanked due to a minor packaging issue. Please skip directly to Composer v0.13.1.

Full Changelog: v0.12.1...v0.13.0

v0.12.1

05 Feb 09:19

🚀 Composer v0.12.1

Composer v0.12.1 is released! Install via pip:

pip install --upgrade mosaicml==0.12.1

New Features

  1. 📚 In-Context Learning (#1876)

    With Composer and MosaicML Cloud you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. Please see our "Blazingly Fast LLM Evaluation for In-Context Learning" blog post for more details!

  2. 💾 Added support for Coreweave Object Storage (#1915)

    Coreweave object store is compatible with boto3. Uploading objects to it is almost exactly like writing to S3, except that an endpoint_url must be set via the S3_ENDPOINT_URL environment variable. For example:

    import os
    os.environ['S3_ENDPOINT_URL'] = 'https://object.las1.coreweave.com'
    
    from composer.trainer import Trainer
    
    # Save checkpoints every epoch to s3://my_bucket/checkpoints
    trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration='10ep',
        save_folder='s3://my_bucket/checkpoints',
        save_interval='1ep',
        save_overwrite=True,
        save_filename='ep{epoch}.pt',
        save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
    )
    
    trainer.fit()

    Please see our checkpointing documentation for more details.

  3. 🪵 Automatic logging of Trainer hparams (#1855)

    Hyperparameter arguments passed to the Trainer are now automatically logged. Simply set the Trainer argument auto_log_hparams=True.

Bug Fixes

  • Update Docker images to use ‘posix_prefix’ paths (#1854)
  • Disable new notebook in CI (#1875)
  • [Fix] Enable logging of metrics from Callbacks to ConsoleLogging (#1884)
  • Ensure loggers run init event before callbacks in Engine (#1890)
  • Raise an error in FSDP meta tensor initialization if there's no initialization functions, fix associated flaky FSDP test (#1905)
  • Add primitive list support (#1906)
  • Add logic for shifting labels before computing metrics (#1913)
  • Fixes mis specified dependency (#1919)
  • pin setuptools in build requirements (#1926)
  • Pin pip<23 in Docker images (#1936)
  • Fix bug in trainer.eval and add test cases for test_console_logger (#1937)


v0.12.0

23 Dec 00:13

🚀 Composer v0.12.0

Composer v0.12.0 is released! Install via pip:

pip install mosaicml==0.12.0

New Features

  1. 🪵 Logging and ObjectStore Enhancements

    There are multiple improvements to our logging and object store support in this release.

    • Image visualization using our CometMLLogger (#1710)

      We've added support for using our ImageVisualizer callback with CometML to log images and segmentation masks to CometML.

      from composer.callbacks import ImageVisualizer
      from composer.loggers import CometMLLogger
      from composer.trainer import Trainer
      
      trainer = Trainer(
          ...,
          callbacks=[ImageVisualizer()],
          loggers=[CometMLLogger()],
      )
    • Added direct support for Oracle Cloud Infrastructure (OCI) as an ObjectStore (#1774) and support for Google Cloud Storage (GCS) via URI (#1833)

      To use, you can simply set your save_folder or load_path to a URI beginning with oci:// or gs://, to save and load with OCI and GCS respectively.

      from composer.trainer import Trainer
      
      # Checkpoint saving to Google Cloud Storage.
      trainer = Trainer(
          model=model,
          save_folder="gs://my-bucket/{run_name}/checkpoints",
          run_name='my-run',
          save_interval="1ep",
          save_filename="ep{epoch}.pt",
          save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
          ...
      )
      
      trainer.fit()
    • Added basic support for logging with MLFlow (#1795)

      We've added basic support for using MLFlow to log experiment metrics.

      from composer.loggers import MLFlowLogger
      from composer.trainer import Trainer
      
      mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
                                   run_name=mlflow_run_name,
                                   tracking_uri=mlflow_uri)
      trainer = Trainer(..., loggers=[mlflow_logger])
    • Simplified console and progress bar logging (#1694)

      To turn off the progress bar, set progress_bar=False. To turn on logging directly to the console, set log_to_console=True. To control the frequency of logging to console, set console_log_interval (e.g. to 1ep or 1ba).

    • get_file supports URIs (#1750)

      Our get_file utility now supports URIs directly (s3://, oci://, and gs://) for downloading files.
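
To make the URI convention concrete, the sketch below splits an object-store URI into scheme, bucket, and object key using only the standard library. It illustrates the URI shape and is not how get_file is implemented; the helper name and the bucket/key terminology are assumptions.

```python
from typing import Tuple
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"s3", "oci", "gs"}  # the schemes named in the note above


def split_object_store_uri(uri: str) -> Tuple[str, str, str]:
    """Split 'scheme://bucket/path/to/object' into (scheme, bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme {parsed.scheme!r} in {uri!r}")
    # netloc holds the bucket; the path carries a leading slash we do not want in the key
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


parts = split_object_store_uri("gs://my-bucket/my-run/checkpoints/ep1.pt")
print(parts)
```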

  2. 🏃‍♀️ Support for Mid-Epoch Resumption with the latest release of Streaming

    We've added support in Composer for the latest release of our Streaming library. This includes awesome new features like instant mid-epoch resumption and deterministic shuffling, regardless of the number of nodes. See the Streaming release notes for more!

  3. 🚨 New algorithm - GyroDropout!

    Thanks to @jelite for adding a new algorithm, GyroDropout, to Composer! Please see the method card for more details.

  4. 🤗 HuggingFace + Composer improvements

    We've added a new utility to load a 🤗 HuggingFace model and tokenizer out of a Composer checkpoint (#1754), making the pretraining -> finetuning workflow even easier in Composer. Check out the docs for more details, and our example notebook for a full tutorial (#1775)!

  5. 🎓 GradMonitor -> OptimizerMonitor

    Renames our GradMonitor callback to OptimizerMonitor, and adds the ability to track optimizer specific metrics. Check out the docs for more details, and add to your code just like any other callback!

    from composer.callbacks import OptimizerMonitor
    from composer.trainer import Trainer
    
    trainer = Trainer(
        ..., 
        callbacks=[OptimizerMonitor(log_optimizer_metrics=log_optimizer_metrics)]
    )
  6. 🐳 New PyTorch and CUDA versions

    We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:

    • mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
    • mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04

    The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.

API changes

  1. Replace grad_accum with device_train_microbatch_size (#1749, #1776)

    We're deprecating the grad_accum Trainer argument in favor of the more intuitive device_train_microbatch_size. Instead of thinking about how to divide your specified minibatch into microbatches, simply specify the size of your microbatch. For example, let's say you want to split your minibatch of 2048 into two microbatches of 1024:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        device_train_microbatch_size=1024,
    )

    If you want Composer to tune the microbatch for you automatically, enable automatic microbatching as follows:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        device_train_microbatch_size='auto',
    )

    The grad_accum argument is still supported but will be deprecated in the next Composer release.
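
For readers migrating from grad_accum, the number of microbatches now follows from the two sizes rather than being set directly. A minimal sketch, assuming the batch size in question is the per-device share and that a batch that does not divide evenly is rounded up to one extra, smaller microbatch (both are assumptions of this example):

```python
import math


def num_microbatches(device_batch_size: int, device_train_microbatch_size: int) -> int:
    """Number of forward/backward passes needed to process one per-device batch."""
    if device_train_microbatch_size <= 0:
        raise ValueError("device_train_microbatch_size must be positive")
    return math.ceil(device_batch_size / device_train_microbatch_size)


# The example from the text: a batch of 2048 split into microbatches of 1024
print(num_microbatches(2048, 1024))
```

In the old terms, this count is the value that grad_accum used to carry.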

  2. Renamed precisions (#1761)

    We've renamed precision attributes for clarity. The following values have been removed: ['amp', 'fp16', 'bf16'].

    We have added the following values, prefixed with 'amp' to clarify when an Automatic Mixed Precision type is being used: ['amp_fp16', 'amp_bf16'].

    The fp32 precision value remains unchanged.
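
If you keep precision strings in your own configuration files, a small guard can catch the removed names early. The sketch below uses only the values listed above; the function is a hypothetical helper for user code, not part of Composer.

```python
REMOVED_PRECISIONS = {"amp", "fp16", "bf16"}              # removed in this release
SUPPORTED_PRECISIONS = {"amp_fp16", "amp_bf16", "fp32"}   # values named in this release


def check_precision(name: str) -> str:
    """Return the name unchanged if it is supported, otherwise raise a helpful error."""
    if name in REMOVED_PRECISIONS:
        raise ValueError(
            f"precision {name!r} was removed; use one of {sorted(SUPPORTED_PRECISIONS)}"
        )
    if name not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unknown precision {name!r}")
    return name


print(check_precision("amp_bf16"))
```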

Deprecations

  1. Removed support for YAHP (#1512)
  2. Removed COCO and SSD datasets (#1717)
  3. Fully removed Streaming v1 support, please see the mosaicml/streaming project for our next-gen streaming datasets (#1787)
  4. Deprecated FusedLayerNorm algorithm (#1789)
  5. Fully removed grad_clip_norm training argument, please use the GradientClipping algorithm instead (#1768)
  6. Removed data_fit, data_epoch, and data_batch from Logger (#1826)

Bug Fixes

  • Fix FSDP checkpoint strategy (#1734)
  • Fix gradient clipping with FSDP (#1740)
  • Adds more supported FSDP config flags (sync_module_states, forward_prefetch, limit_all_gathers) (#1794)
  • Allow FULL precision with FSDP (#1796)
  • Fix eval_microbatch modification on EVAL_BEFORE_FORWARD event (#1739)
  • Fix algorithm API backwards compatibility in checkpoints (#1741)
  • Fixes a bad None check preventing setting device_id to 0 (#1767)
  • Unregister engine to make cleaning up memory easier (#1769)
  • Fix issue if metric_names is not a list (#1798)
  • Match implementation for list and tensor batch splitting (#1804)
  • Fixes infinite eval issue (#1815)


v0.11.1

16 Nov 21:41

🚀 Composer v0.11.1

Composer v0.11.1 is released! Install via pip:

pip install --upgrade mosaicml==0.11.1

Bug Fixes

  • Fixes for Notebooks (#1659)
  • Documentation updates and fixes (#1685, #1696, #1702, #1709)
  • Addressed warnings and speed improvements for Torchmetrics (#1674)
  • Fixes to Gated Linear Units method (#1575, #1689)
  • Set NCCL_ASYNC_ERROR_HANDLING ENV variable in Composer launcher to enable distributed timeout (#1695)
  • Fix epoch count when eval is called before fit (#1697)
  • Constrain PyTorch package versions to avoid unintended upgrades (#1688)
  • Fix Optimizer state sharding issue with FSDP (#1732)
  • Raise ValueError if an evaluation dataloader of infinite length is specified

Full Changelog: v0.11.0...v0.11.1

v0.11.0

25 Oct 00:36

🚀 Composer v0.11.0

Composer v0.11.0 is released! Install via pip:

pip install --upgrade mosaicml==0.11.0

New Features

  1. 🧰 FSDP Beta Support

    Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.

    Here's how easy it is to use FSDP with Composer:

    import torch.nn as nn
    from composer import Trainer
    from composer.models import ComposerModel
    
    class Block(nn.Module):
        ...
    
    # Your custom model
    class Model(nn.Module):
        def __init__(self, n_layers):
            super().__init__()
            self.blocks = nn.ModuleList([
                Block(...) for _ in range(n_layers)
            ])
            self.head = nn.Linear(...)
        def forward(self, inputs):
            ...
    
        # FSDP Wrap Function
        def fsdp_wrap_fn(self, module):
            return isinstance(module, Block)
    
        # Activation Checkpointing Function
        def activation_checkpointing_fn(self, module):
            return isinstance(module, Block)
    
    # ComposerModel wrapper, used by the Trainer
    # to compute loss, metrics, etc.
    class MyComposerModel(ComposerModel):
    
        def __init__(self, n_layers):
            super().__init__()
            self.model = Model(n_layers)
            ...
    
        def forward(self, batch):
            ...
    
        def eval_forward(self, batch, outputs=None):
            ...
    
        def loss(self, outputs, batch):
            ...
    
    # Pass your ComposerModel and fsdp_config into the Trainer
    composer_model = MyComposerModel(n_layers=3)
    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'min_params': 1e8,
        'cpu_offload': False, # Not supported yet
        'mixed_precision': 'DEFAULT',
        'backward_prefetch': 'BACKWARD_POST',
        'activation_checkpointing': False,
        'activation_cpu_offload': False,
        'verbose': True
    }
    
    trainer = Trainer(
        model=composer_model,
        fsdp_config=fsdp_config,
        ...
    )
    
    trainer.fit()

    For more information, please see our FSDP docs.

  2. 🚰 Streaming v0.1

    We've spun off Streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset, enabling users to stream training data from cloud-based object stores. Streaming ships with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.).

    To get started, install the Streaming PyPI package:

    pip install mosaicml-streaming

    You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:

    import torch
    from streaming import Dataset
    
    dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))

    For more information, please check out the Streaming docs.

  3. ✔👉 Simplified Checkpointing Interface

    With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.

    To save checkpoints to S3, all you need to do is:

    • Specify with save_folder your full URI to your save directory destination (e.g. 's3://my-bucket/{run_name}/checkpoints')
    • Optionally, set save_filename to the pattern you want for your checkpoint file names
    from composer.trainer import Trainer
    
    # Checkpoint saving to S3.
    trainer = Trainer(
        model=model,
        save_folder="s3://my-bucket/{run_name}/checkpoints",
        run_name='my-run',
        save_interval="1ep",
        save_filename="ep{epoch}.pt",
        save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
        ...
    )
    
    trainer.fit()

    Likewise, to load checkpoints from S3, all you have to do is:

    • Set load_path to the full URI to your desired checkpoint file (e.g. 's3://my-bucket/my-run/checkpoints/ep13.pt')
    from composer.trainer import Trainer
    
    # Checkpoint loading from S3.
    new_trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration="10ep",
        load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
    )
    
    new_trainer.fit()

    For more information, please see our Checkpointing guide.

  4. 𐄳 Improved Distributed Experience

    We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

    For example, let's say we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only rank 0 downloads the dataset while the other ranks wait:

    import datetime
    from composer.trainer.devices import DeviceGPU
    from composer.utils import dist
    
    dist.initialize(DeviceGPU(), datetime.timedelta(seconds=30)) # Initialize distributed module
    
    if dist.get_local_rank() == 0: # Download dataset on rank zero
        dataset = download_my_dataset()
    dist.barrier() # All ranks wait until dataset is downloaded
    
    # Create and train your model!

    For more information, please check out our Distributed API docs.

Bug Fixes

  • fix loss and eval_forward for HF models (#1597)
  • add more robust casting to int for fsdp min_params (#1608)
  • Deepspeed Docs Typo (#1605)
  • Fix mmdet typo (#1618)
  • Blurpool idempotent (#1625)
  • When model is not on meta device, initialization should occur on compute device not CPU (#1623)
  • Auto resumption (#1615)
  • Adjust speed monitor (#1645)
  • Hot fix console logging (#1643)
  • Lazy Logging + pretty print dict for hparams (#1653)
  • Fix many failing notebook tests (#1646)


v0.10.1

06 Oct 00:17

🚀 Composer v0.10.1

Composer v0.10.1 is released! Install via pip:

pip install --upgrade mosaicml==0.10.1

New Features

  1. 𐄷 Weight Standardization

    Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This could slightly improve performance at the expense of 5% lower throughput. This has been used in several papers to train with smaller batch sizes, with normalization layers besides batch norm, and for transfer learning.

    Using Weight Standardization with the Composer Trainer:

    import composer
     
    # Apply Weight Standardization (when training is initialized)
    weight_std = composer.algorithms.WeightStandardization()
    
    # Train with Weight Standardization
    trainer = composer.trainer.Trainer(
        ...
        algorithms=[weight_std]
    )
    trainer.fit()

    Using Weight Standardization with the Composer functional interface:

    import composer
    from torchvision.models import resnet50
     
    my_model = resnet50()
     
    # Apply weight standardization to model
    my_model = composer.functional.weight_standardization(my_model)

    Please see the Weight Standardization Method Card for more details.
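
The reparametrization described above can be sketched for a single convolutional filter in plain Python: subtract the mean of the filter's fan-in weights and divide by their standard deviation. Flattening the filter into a list, using the population standard deviation, and the small eps term are assumptions of this example; Composer's implementation operates on tensors and may differ in those details.

```python
from statistics import mean, pstdev
from typing import List


def standardize_filter(fan_in_weights: List[float], eps: float = 1e-10) -> List[float]:
    """Rescale one filter's fan-in weights to zero mean and unit standard deviation."""
    mu = mean(fan_in_weights)
    sigma = pstdev(fan_in_weights, mu=mu)
    # eps guards against division by zero when all weights are identical
    return [(w - mu) / (sigma + eps) for w in fan_in_weights]


# One hypothetical filter, flattened over (in_channels, kernel_h, kernel_w)
raw_filter = [0.2, -0.1, 0.4, 0.5]
standardized = standardize_filter(raw_filter)

print(mean(standardized), pstdev(standardized))
```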

Bug Fixes

  • Fix for checkpoints not being saved automatically at the end of a run (#1552)
  • Fix Onnx export for Composer HuggingFaceModels (#1557)
  • Fix for MIoU metric producing NaN's (#1558)
  • CometML logger documentation updates and fixes (#1567, #1570, #1571)
  • WandB image visualizer fix (#1591)

Full Changelog: v0.10.0...v0.10.1

v0.10.0

22 Sep 06:25

🚀 Composer v0.10.0

Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!

pip install --upgrade mosaicml==0.10.0

New Features

  1. ☄️ Comet Experiment Tracking (#1490)

    We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the Trainer object at initialization:

    from composer import Trainer
    from composer.loggers import CometMLLogger
    
    cometml_logger = CometMLLogger()
    
    trainer = Trainer(
        ...
        loggers=[cometml_logger],
    )

    Please see our Logging and CometMLLogger docs pages for details on usage.

  2. 🪄 Automatic Evaluation Batch Size Selection (#1417)

    Composer now supports eval_batch_size='auto', which will choose the right evaluation batch size to avoid CUDA OOMs! Now, in conjunction with grad_accum='auto', you can run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to pick and choose the right batch sizes to avoid CUDA OOMs.

  3. 🎯 Evaluation API Changes (#1479)

    The Evaluation API has been updated to be consistent with the Trainer API. If the eval_dataloader was provided to the Trainer during initialization, eval can be invoked without needing to provide anything additional:

    trainer = Trainer(
        eval_dataloader=...
    )
    trainer.eval()

    Alternatively, the eval_dataloader can be passed directly to the eval() method:

    trainer = Trainer(
        ...
    )
    trainer.eval(
        eval_dataloader=...
    )

    The eval_dataloader can be a PyTorch DataLoader or, for multiple metrics, a list of Evaluator objects.

  4. 🪵 Simplified Logging (#1416)

    We've significantly simplified our internal logging interface:

    • Removed the use of LogLevel throughout the logging, which was a mostly unused feature. Filtering logs is now the responsibility of the logger.
    • For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: log_metrics, log_hyperparameters, and log_artifacts. Previous calls to data_fit, data_epoch, etc. have been removed.
  5. 🎯 validate --> eval_forward (#1411, #1419)

    Previously, ComposerModel implemented the validate(batch: Any) -> Tuple[Any, Any] method which returns an (input, target) tuple, and the Trainer handles updating the metrics. In v0.10, we return the metrics updating control to the user.

    Now, models instead implement def eval_forward(batch: Any) which returns the outputs of evaluation, and also def update_metric(batch, outputs, metric) which updates the metric.

    An example implementation for classification can be found in our ComposerClassifier base class:

        def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
            _, targets = batch
            metric.update(outputs, targets)
    
        def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
            return outputs if outputs is not None else self.forward(batch)
  6. 🕵️‍♀️ Evaluator changes

    The Evaluator class now stores evaluation metric names instead of metric instances. For example:

    glue_mrpc_task = Evaluator(
        label='glue_mrpc',
        dataloader=mrpc_dataloader,
        metric_names=['BinaryF1Score', 'Accuracy']
    )

    These metric names are matched against the metrics returned by the ComposerModel. The metric instances are now stored as deep copies in the State class as state.train_metrics or state.eval_metrics.
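
The name-matching step can be pictured as a lookup of the requested names in a name-to-metric mapping supplied by the model. The sketch below assumes exact string matches against dictionary keys and uses placeholder objects in place of real metric instances; it shows the idea only and is not Composer's matching logic.

```python
from typing import Any, Dict, List


def select_metrics(model_metrics: Dict[str, Any], metric_names: List[str]) -> Dict[str, Any]:
    """Pick the metrics an evaluator asked for, failing loudly on unknown names."""
    missing = [name for name in metric_names if name not in model_metrics]
    if missing:
        raise ValueError(f"model does not provide metrics: {missing}")
    return {name: model_metrics[name] for name in metric_names}


# Placeholder objects stand in for metric instances returned by the model
model_metrics = {"BinaryF1Score": object(), "Accuracy": object(), "Perplexity": object()}
selected = select_metrics(model_metrics, ["BinaryF1Score", "Accuracy"])

print(list(selected))
```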

  7. 🚧 Streaming Datasets Repository Preview

    We're in the process of splitting out streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset objects and enables you to stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.

  8. YAHP deprecation

    We are deprecating support for yahp, our hyperparameter configuration tool. Support for this will be removed in the following minor version release of Composer. We recommend users migrate to OmegaConf or Hydra.

Bug Fixes


v0.9.0

16 Aug 06:11

🚀 Composer v0.9.0

Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌 !

pip install --upgrade mosaicml==0.9.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.9.0

New Features

  1. 📦 Export for inference APIs

    Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as torchscript and ONNX.

    For example, here’s how to export a model in torchscript format:

    from composer.utils import export_for_inference
    
    # Invoking export with a trained model
    export_for_inference(model=model, 
                         save_format='torchscript', 
                         save_path=model_save_path)

    Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:

    from composer import Trainer
    from composer.callbacks import ExportForInferenceCallback
    
    # Initializing Trainer with the export callback
    callback = ExportForInferenceCallback(save_format='onnx',
                                          save_path=model_save_path)
    trainer = Trainer(model=model,
                      callbacks=callback,
                      train_dataloader=dataloader,
                      max_duration='10ep')
    
    # Model will be exported at the end of training
    trainer.fit()

    Please see our Exporting for Inference notebook for more information.

  2. 📈 ALiBi support for BERT training

    You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.

    ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.

    Example of using ALiBi as an algorithm with the Composer Trainer:

    import composer
    
    # Create an instance of a BERT masked language model
    model = composer.models.create_bert_mlm()
    
    # Apply ALiBi (when training is initialized)
    alibi = composer.algorithms.Alibi(max_sequence_length=1024)
    
    # Train with ALiBi
    trainer = composer.trainer.Trainer(
        model=model,
        train_dataloader=train_dataloader,
        algorithms=[alibi]
    )
    trainer.fit()

    Example using the Composer Functional API:

    import composer
    import composer.functional as cf
    
    # Create an instance of a BERT masked language model
    model = composer.models.create_bert_mlm()
    
    # Apply ALiBi and expand the model's maximum sequence length to 1024
    cf.apply_alibi(model=model, max_sequence_length=1024)

    ALiBi can now also be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.

  3. 🧐 Entry point for GLUE tasks pre-training and fine-tuning

    You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.

    Example of launching the entrypoint:

    # This runs pre-training followed by fine-tuning.
    # --training_scheme can take either pretrain, finetune, or all depending on the task!
    python run_glue_trainer.py -f glue_example.yaml --training_scheme all

    Please see our GLUE entrypoint notebook for more information.

  4. 🤖 TPU support (in beta)

    You can now use Composer to train your models on TPUs! Support is now available in Beta, and currently only supports single-core TPU training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.

    To use TPUs with Composer, simply specify a tpu device:

    # Set device to `tpu`
    trainer = composer.trainer.Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration=train_epochs,
        device='tpu')
    
    # Run fit
    trainer.fit()

    Please see our Training with TPUs notebook for more information.

  5. 🍎 Apple Silicon support (beta)

    Leverage Apple Silicon chips to train your models with Composer by providing the device='mps' argument:

    trainer = Trainer(
        ...,
        device='mps'
    )

    We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.

    For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.

  6. 🚧 Contrib repository

    Got a new method idea, or published a paper and want those methods to be easily accessible? We’ve created the mcontrib repository, with a lightweight process to contribute new algorithms. We’re happy to work directly with you to benchmark these methods and eventually “promote” them to Composer for use by end customers.

    Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.

Additional API Changes

  1. 🔢 Passes Module

    The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own passes module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
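
To illustrate why ordering passes are useful, the sketch below models a pass as a plain function that takes a list of algorithm names and returns it reordered. This is a simplification made for the example: Composer's actual pass signature and registration call are described in #1377 and are not reproduced here, and the algorithm names are used only as strings.

```python
from typing import Callable, List

Pass = Callable[[List[str]], List[str]]


def run_first(name: str) -> Pass:
    """Build a pass that moves the named algorithm to the front, keeping the rest in order."""
    def _pass(algorithms: List[str]) -> List[str]:
        # sorted() is stable, so algorithms other than `name` keep their relative order
        return sorted(algorithms, key=lambda algorithm: algorithm != name)
    return _pass


def apply_passes(algorithms: List[str], passes: List[Pass]) -> List[str]:
    """Apply each pass in turn to produce the final execution order."""
    for ordering_pass in passes:
        algorithms = ordering_pass(algorithms)
    return algorithms


order = apply_passes(
    ["BlurPool", "SelectiveBackprop", "LabelSmoothing"],
    [run_first("SelectiveBackprop")],
)
print(order)
```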

  2. 🗄️ Default Checkpoint Extension

    The CheckpointSaver now defaults to using the *.pt extension for checkpoint filenames. Please see #1370 for more information.

  3. 👁️ Models Refactor

    Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to factory functions. For example, ComposerResNet -> composer_resnet.

    # before
    from composer.models import ComposerResNet
    model = ComposerResNet(...)
    
    # after
    from composer.models import composer_resnet
    model = composer_resnet(...)

    The same refactor has been done for NLP as well, e.g. BERTModel -> create_bert_mlm and create_bert_classification.

    See #1227 (vision) and #1130 (NLP) for more details.

  4. ➕ Misc API Changes

    • BreakEpochException has been removed.
    • state.is_model_deepspeed has been moved to composer.utils.is_model_deepspeed.
    • Helper function monitored_barrier has been added to composer distributed.

Bug Fixes

  • Add informative error for infer batch size issues (#1401)
  • Fix ImagenetDatasetHparams bug (#1392), resolves #1111
  • Fix hparams error condition checking (#1394)
  • Fix AMP resumption with grad scaler (#1376)
  • Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
  • Fix default precision (#1369)
  • Fix the profiler on multi-node training (#1358), resolves #1270
  • Retry SFTP on Size Mismatch (#1300)
  • Fix scheduler edge cases (#1350), resolves #1077
  • Fix a race condition in the object store logger (#1328)
  • Fix WandB load from checkpoint (#1326)
  • Fix Notebook Progress Bars (#1313)


v0.8.2

27 Jul 23:36

🚀 Composer v0.8.2

Composer v0.8.2 is released! Install via pip:

pip install --upgrade mosaicml==0.8.2

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.2

🐛 Bug Fixes

  1. Fixed Notebook Progress Bars in Colab

    Fixes a bug introduced by #1264 which causes Composer running in Colab notebooks to error out with:
    UnsupportedOperation: fileno.

    Closes #1312. Fixed in PR #1314.

Changelog

v0.8.1...v0.8.2