Releases: allenai/allennlp
v2.0.0rc1
This is the first (and hopefully only) release candidate for AllenNLP 2.0. Please note that this is a release candidate, and the APIs are still subject to change until the final 2.0 release. We'll provide a detailed writeup with the final 2.0 release, including a migration guide. In the meantime, here are the headline features of AllenNLP 2.0:
- Support for models that combine language and vision features
- Transformer Toolkit, a suite of classes and components that make it easy to experiment with transformer architectures
- A framework for multitask training
- Revamped data loading, for improved performance and flexibility
What's new
Added 🎉
- Added a `TensorCache` class for caching tensors on disk.
- Added abstraction and concrete implementation for image loading.
- Added abstraction and concrete implementation for `GridEmbedder`.
- Added abstraction and demo implementation for an image augmentation module.
- Added abstraction and concrete implementation for region detectors.
- A new high-performance default `DataLoader`: `MultiProcessDataLoading`.
- A `MultiTaskModel` and abstractions to use with it, including `Backbone` and `Head`. The `MultiTaskModel` first runs its inputs through the `Backbone`, then passes the result (and whatever other relevant inputs it got) to each `Head` that's in use.
- A `MultiTaskDataLoader`, with a corresponding `MultiTaskDatasetReader`, and a couple of new configuration objects: `MultiTaskEpochSampler` (for deciding what proportion to sample from each dataset at every epoch) and a `MultiTaskScheduler` (for ordering the instances within an epoch).
- Transformer toolkit to plug and play with modular components of transformer architectures.
- Added a command to count the number of instances we're going to be training with.
- Added a `FileLock` class to `common.file_utils`. This is just like the `FileLock` from the `filelock` library, except that it adds an optional flag `read_only_ok: bool`, which when set to `True` changes the behavior so that a warning will be emitted instead of an exception when lacking write permissions on an existing file lock. This makes it possible to use the `FileLock` class on a read-only file system (a usage sketch follows this list).
- Added a new learning rate scheduler: `CombinedLearningRateScheduler`. This can be used to combine different LR schedulers, using one after the other.
- Added an official CUDA 10.1 Docker image.
- Moved the `ModelCard` and `TaskCard` abstractions into the main repository.
- Added a util function `allennlp.nn.util.dist_reduce(...)` for handling distributed reductions. This is especially useful when implementing a distributed `Metric`.
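A minimal usage sketch for the read-only-friendly lock, assuming `FileLock` is importable from `allennlp.common.file_utils` and otherwise behaves like the `filelock` API (the cache path below is a placeholder):

```python
# Sketch only: FileLock with the read_only_ok flag described above.
# The resource path is a placeholder.
from allennlp.common.file_utils import FileLock

resource = "/read-only-mount/allennlp-cache/some-resource"

# On a read-only file system this emits a warning instead of raising
# when the lock file can't be created.
with FileLock(resource + ".lock", read_only_ok=True):
    with open(resource) as f:
        data = f.read()
```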
Changed ⚠️
- `DatasetReader`s are now always lazy. This means there is no `lazy` parameter in the base class, and the `_read()` method should always be a generator (a minimal reader sketch follows this list).
- The `DataLoader` now decides whether to load instances lazily or not. With the `PyTorchDataLoader` this is controlled with the `lazy` parameter, but with the `MultiProcessDataLoading` this is controlled by the `max_instances_in_memory` setting.
- `ArrayField` is now called `TensorField`, and implemented in terms of torch tensors, not numpy.
- Improved `nn.util.move_to_device` function by avoiding an unnecessary recursive check for tensors and adding a `non_blocking` optional argument, which is the same argument as in `torch.Tensor.to()`.
- If you are trying to create a heterogeneous batch, you now get a better error message.
- Readers using the new vision features now explicitly log how they are featurizing images.
- `master_addr` and `master_port` renamed to `primary_addr` and `primary_port`, respectively.
- `is_master` parameter for training callbacks renamed to `is_primary`.
- `master` branch renamed to `main`.
- Torch version bumped to 1.7.1 in Docker images.
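Since readers no longer take a `lazy` flag, `_read()` simply yields instances and the `DataLoader` decides how many to keep in memory. A minimal sketch (the reader name and data format here are made up for illustration):

```python
# Sketch only: a DatasetReader whose _read() is a generator, as required
# by the always-lazy reader API. The reader name and data format are
# hypothetical.
from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import WhitespaceTokenizer


@DatasetReader.register("one-sentence-per-line")
class OneSentencePerLineReader(DatasetReader):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.tokenizer = WhitespaceTokenizer()
        self.token_indexers = {"tokens": SingleIdTokenIndexer()}

    def _read(self, file_path: str):
        # Yield one Instance at a time; the DataLoader controls how many
        # are held in memory (e.g. via max_instances_in_memory).
        with open(file_path) as data_file:
            for line in data_file:
                yield self.text_to_instance(line.strip())

    def text_to_instance(self, text: str) -> Instance:
        tokens = self.tokenizer.tokenize(text)
        return Instance({"tokens": TextField(tokens, self.token_indexers)})
```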
Removed 👋
- Removed `nn.util.has_tensor`.
Fixed ✅
- The `build-vocab` command no longer crashes when the resulting vocab file is in the current working directory.
- Fixed typo with `LabelField` string representation: removed trailing apostrophe.
- `Vocabulary.from_files` and `cached_path` will issue a warning, instead of failing, when a lock on an existing resource can't be acquired because the file system is read-only.
- `TrackEpochCallback` is now an `EpochCallback`.
Commits
9a4a424 Moves vision models to allennlp-models (#4918)
412896b fix merge conflicts
ed322eb A helper for distributed reductions (#4920)
9ab2bf0 add CUDA 10.1 Docker image (#4921)
d82287e Update transformers requirement from <4.1,>=4.0 to >=4.0,<4.2 (#4872)
5497394 Multitask example (#4898)
0f00d4d resolve _read type (#4916)
5229da8 Toolkit decoder (#4914)
4183a49 Update mkdocs-material requirement from <6.2.0,>=5.5.0 to >=5.5.0,<6.3.0 (#4880)
d7c9eab improve worker error handling in MultiProcessDataLoader (#4912)
94dd9cc rename 'master' -> 'primary' for distributed training (#4910)
c9585af fix imports in file_utils
03c7ffb Merge branch 'main' into vision
effcc4e improve data loading docs (#4909)
2f54570 remove PyTorchDataLoader, add SimpleDataLoader for testing (#4907)
31ec6a5 MultiProcessDataLoader takes PathLike data_path (#4908)
5e3757b rename 'multi_process_*' -> 'multiprocess' for consistency (#4906)
df36636 Data loading cuda device (#4879)
aedd3be Toolkit: Cleaning up TransformerEmbeddings (#4900)
54e85ee disable codecov annotations (#4902)
2623c4b Making TrackEpochCallback an EpochCallback (#4893)
1d21c75 issue warning instead of failing when lock can't be acquired on a resource that exists in a read-only file system (#4867)
ec197c3 Create pull_request_template.md (#4891)
15d32da Make GQA work (#4884)
fbab0bd import MultiTaskDataLoader to data_loaders/init.py (#4885)
d1cc146 Merge branch 'main' into vision
abacc01 Adding f1 score (#4890)
9cf41b2 fix navbar link
9635af8 rename 'master' -> 'main' (#4887)
d0a07fb docs: fix simple typo, multplication -> multiplication (#4883)
d1f032d Moving modelcard and taskcard abstractions to main repo (#4881)
f62b819 Make images easier to find for Visual Entailment (#4878)
1fff7ca Update docker torch version (#4873)
7a7c7ea Only cache, no featurizing (#4870)
d2aea97 Fix typo in str (#4874)
1c72a30 Merge branch 'master' into vision
6a8d425 add CombinedLearningRateScheduler (#4871)
85d38ff doc fixes
c4e3f77 Switch to torchvision for vision components 👀, simplify and improve MultiProcessDataLoader (#4821)
3da8e62 Merge branch 'master' into vision
a3732d0 Fix cache volume (#4869)
832901e Turn superfluous warning to info when extending the vocab in the embedding matrix (#4854)
147fefe Merge branch 'master' into vision
87e3536 Make tests work again (#4865)
d16a5c7 Merge remote-tracking branch 'origin/master' into vision
457e56e Merge branch 'master' into vision
c8521d8 Toolkit: Adding documentation and small changes for BiModalAttention (#4859)
ddbc740 gqa reader fixes during vilbert training (#4851)
50e50df Generalizing transformer layers (#4776)
52fdd75 adding multilabel option (#4843)
7887119 Other VQA datasets (#4834)
e729e9a Added GQA reader (#4832)
52e9dd9 Visual entailment model code (#4822)
01f3a2d Merge remote-tracking branch 'origin/master' into vision
3be6c97 SNLI_VE dataset reader (#4799)
b659e66 VQAv2 (#4639)
c787230 Merge remote-tracking branch 'origin/master' into vision
db2d1d3 Merge branch 'master' into vision
6bf1924 Merge branch 'master' into vision
167bcaa remove vision push trigger
7591465 Merge remote-tracking branch 'origin/master' into vision
22d4633 improve independence of vision components (#4793)
98018cc fix merge conflicts
c780315 fix merge conflicts
5d22ce6 Merge remote-tracking branch 'origin/master' into vision
602399c update with master
ffafaf6 Multitask data loading and scheduling (#4625)
7c47c3a Merge branch 'master' into vision
12c8d1b Generalizing self attention (#4756)
63f61f0 Merge remote-tracking branch 'origin/master' into vision
b48347b Merge remote-tracking branch 'origin/master' into vision
81892db fix failing tests
98edd25 update torch requirement
8da3508 update with master
cc53afe separating TransformerPooler as a new module (#4730)
4ccfa88 Transformer toolkit: BiModalEncoder now has separate num_attention_heads for both modalities (#4728)
91631ef Transformer toolkit (#4577)
677a9ce Merge remote-tracking branch 'origin/master' into vision
2985236 This should have been part of the previously merged PR
c5d264a Detectron NLVR2 (#4481)
e39a5f6 Merge remote-tracking branch 'origin/master' into vision
f1e46fd Add MultiTaskModel (#4601)
fa22f73 Merge remote-tracking branch 'origin/master' into vision
41872ae Merge remote-tracking branch 'origin/master' into vision
f886fd0 Merge remote-tracking branch 'origin/master' into vision
191b641 make existing readers work with multi-process loading (#4597)
d7124d4 fix len calculation for new data loader (#4618)
8746361 Merge branch 'master' into vision
319794a remove duplicate padding calculations in collate fn (#4617)
de9165e rename 'node_rank' to 'global_rank' in dataset reader 'DistributedInfo' (#4608)
3d11419 Formatting updates for new version of black (#4607)
cde06e6 Changelog
1b08fd6 ensure models check runs on right branch
44c8791 ensure vision CI runs on each commit (#4582)
95e8253 Merge branch 'master' into vision
e74a736 new data loading (#4497)
6f82005 Merge remote-tracking branch 'origin/master' into vision
a7d45de Initializing a VilBERT model from a pre-trained transformer (#4495)
3833f7a Merge branch 'master' into vision
71d7cb4 Merge branch 'master' into vision
3137961 Merge remote-tracking branch 'origin/master' into vision
6cc508d Merge branch 'master' into vision
f87df83 Merge remote-tracking branch 'origin/master' into vision
0bbe84b An initial VilBERT model for NLVR...
v1.3.0
What's new
Added 🎉
- Added links to source code in docs.
- Added `get_embedding_layer` and `get_text_field_embedder` to the `Predictor` class, to specify embedding layers for non-AllenNLP models.
- Added Gaussian Error Linear Unit (GELU) as an Activation.
Changed ⚠️
- Renamed module `allennlp.data.tokenizers.token` to `allennlp.data.tokenizers.token_class` to avoid this bug.
- `transformers` dependency updated to version 4.0.1.
Fixed ✅
- Fixed a lot of instances where tensors were first created and then sent to a device with `.to(device)`. Instead, these tensors are now created directly on the target device.
- Fixed issue with `GradientDescentTrainer` when constructed with `validation_data_loader=None` and `learning_rate_scheduler!=None`.
- Fixed a bug when removing all handlers in root logger.
- `ShardedDatasetReader` now inherits parameters from `base_reader` when required.
- Fixed an issue in `FromParams` where parameters in the `params` object used to construct a class were not passed to the constructor if the value of the parameter was equal to the default value. This caused bugs in some edge cases where a subclass that takes `**kwargs` needs to inspect `kwargs` before passing them to its superclass.
- Improved the band-aid solution for segmentation faults and the "ImportError: dlopen: cannot load any more object with static TLS" by adding a `transformers` import.
- Added safety checks for extracting tar files.
Commits
d408f41 log import errors for default plugins (#4866)
f2a5331 Adds a safety check for tar files (#4858)
84a36a0 Update transformers requirement from <3.6,>=3.4 to >=4.0,<4.1 (#4831)
fdad31a Add ability to specify the embedding layer if the model does not use TextFieldEmbedder (#4836)
41c5224 Improve the band-aid solution for seg faults and the static TLS error (#4846)
63b6d16 fix FromParams bug (#4841)
6c3238e rename token.py -> token_class.py (#4842)
cec9209 Several micro optimizations (#4833)
48a4865 Add GELU activation (#4828)
3e62365 Bugfix for attribute inheritance in ShardedDatasetReader (#4830)
458c4c2 fix the way handlers are removed from the root logger (#4829)
5b30658 Fix bug in GradientDescentTrainer when validation data is absent (#4811)
f353c6c add link to source code in docs (#4807)
0a83271 No Docker auth on PRs (#4802)
ad8e8a0 no ssh setup on PRs (#4801)
v1.2.2
What's new
Added 🎉
- Added Docker builds for other torch-supported versions of CUDA.
- Adds `allennlp-semparse` as an official, default plugin.
Fixed ✅
- `GumbelSampler` now sorts the beams by their true log prob.
Commits
023d9bc Prepare for release v1.2.2
7b0826c push commit images for both CUDA versions
3cad5b4 fix AUC test (#4795)
efde092 upgrade ssh-agent action (#4797)
ec37dd4 Docker builds for other CUDA versions, improve CI (#4796)
0d8873c doc link quickfix
e4cc95c improve plugin section in README (#4789)
d99f7f8 ensure Gumbel sorts beams by true log prob (#4786)
9fe8d90 Makes the transformer cache work with custom kwargs (#4781)
1e7492d Update transformers requirement from <3.5,>=3.4 to >=3.4,<3.6 (#4784)
f27ef38 Fixes pretrained embeddings for transformers that don't have end tokens (#4732)
v1.2.1
What's new
Added 🎉
- Added an optional `seed` parameter to `ModelTestCase.set_up_model` which sets the random seed for `random`, `numpy`, and `torch`.
- Added support for a global plugins file at `~/.allennlp/plugins`.
- Added more documentation about plugins.
- Added sampler class and parameter in beam search for non-deterministic search, with several implementations, including `MultinomialSampler`, `TopKSampler`, `TopPSampler`, and `GumbelMaxSampler`. Utilizing `GumbelMaxSampler` will give Stochastic Beam Search (a sketch follows this list).
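A sketch of plugging a sampler into beam search, assuming the samplers live in `allennlp.nn.beam_search` and that `BeamSearch` accepts a `sampler` argument (the token index below is a placeholder):

```python
# Sketch only: non-deterministic beam search with a top-k sampler.
# Assumes BeamSearch takes a `sampler` argument; the end index is a placeholder.
from allennlp.nn.beam_search import BeamSearch, TopKSampler

beam_search = BeamSearch(
    end_index=2,                # index of the end-of-sequence token
    max_steps=50,
    beam_size=5,
    sampler=TopKSampler(k=10),  # sample among the 10 most likely next tokens
)
```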
Changed ⚠️
- Pass batch metrics to `BatchCallback`.
Fixed ✅
- Fixed a bug where forward hooks were not cleaned up with saliency interpreters if there was an exception.
- Fixed the computation of saliency maps in the Interpret code when using mismatched indexing. Previously, we would compute gradients from the top of the transformer, after aggregation from wordpieces to tokens, which gives results that are not very informative. Now, we compute gradients with respect to the embedding layer, and aggregate wordpieces to tokens separately.
- Fixed the heuristics for finding embedding layers in the case of RoBERTa. An update in the `transformers` library broke our old heuristic.
- Fixed typo with registered name of ROUGE metric. Previously was `rogue`, fixed to `rouge`.
- Fixed default masks that were erroneously created on the CPU even when a GPU is available.
Commits
04247fa support global plugins file, improve plugins docs (#4779)
9f7cc24 Add sampling strategies to beam search (#4768)
f6fe8c6 pin urllib3 in dev reqs for responses (#4780)
764bbe2 Pass batch metrics to BatchCallback (#4764)
dc3a4f6 clean up forward hooks on exception (#4778)
fcc3a70 Fix: typo in metric, rogue -> rouge (#4777)
b89320c Set the device for an auto-created mask (#4774)
92a844a RoBERTa embeddings are no longer a type of BERT embeddings (#4771)
23f0a8a Ensure cnn_encoder respects masking (#4746)
b4f1a7a add seed option to ModelTestCase.set_up_model (#4769)
b7cec51 Made Interpret code handle mismatched cases better (#4733)
9759b15 allow TextFieldEmbedder to have EmptyEmbedder that may not be in input (#4761)
v1.2.0
What's new
Changed ⚠️
- Enforced stricter typing requirements around the use of `Optional[T]` types.
- Changed the behavior of `Lazy` types in `from_params` methods. Previously, if you defined a `Lazy` parameter like `foo: Lazy[Foo] = None` in a custom `from_params` classmethod, then `foo` would actually never be `None`. This behavior is now different. If no params were given for `foo`, it will be `None`. You can also now set default values for `foo` like `foo: Lazy[Foo] = Lazy(Foo)`. Or, if you want a default value but also want to allow for `None` values, you can write it like this: `foo: Optional[Lazy[Foo]] = Lazy(Foo)` (see the sketch after this list).
- Added support for PyTorch version 1.7.
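A minimal sketch of the new `Lazy` defaults (the `Widget` and `Gadget` classes are hypothetical, not part of AllenNLP):

```python
# Sketch only: illustrates the Lazy default-value behavior described above.
from typing import Optional

from allennlp.common.from_params import FromParams
from allennlp.common.lazy import Lazy


class Widget(FromParams):
    def __init__(self, size: int = 1) -> None:
        self.size = size


class Gadget(FromParams):
    # If the config gives no params for `widget`, it arrives as None;
    # declaring the default as Lazy(Widget) supplies a fallback constructor.
    def __init__(self, widget: Optional[Lazy[Widget]] = Lazy(Widget)) -> None:
        self.widget = widget.construct(size=3) if widget is not None else None
```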
Fixed ✅
- Made it possible to instantiate `TrainerCallback` from config files.
- Fixed the remaining broken internal links in the API docs.
- Fixed a bug where Hotflip would crash with a model that had multiple TokenIndexers and the input used rare vocabulary items.
- Fixed a bug where `BeamSearch` would fail if `max_steps` was equal to 1.
Commits
7f85c74 fix docker build (#4762)
cc9ac0f ensure dataclasses not installed in CI (#4754)
812ac57 Fix hotflip bug where vocab items were not re-encoded correctly (#4759)
aeb6d36 revert samplers and fix bug when max_steps=1 (#4760)
baca754 Make returning token type id default in transformers intra word tokenization. (#4758)
5d6670c Update torch requirement from <1.7.0,>=1.6.0 to >=1.6.0,<1.8.0 (#4753)
0ad228d a few small doc fixes (#4752)
71a98c2 stricter typing for Optional[T] types, improve handling of Lazy params (#4743)
27edfbf Add end+trainer callbacks to Trainer.from_partial_objects (#4751)
b792c83 Fix device mismatch bug for categorical accuracy metric in distributed training (#4744)
v1.2.0rc1
What's new
Added 🎉
- Added a warning when `batches_per_epoch` for the validation data loader is inherited from the train data loader.
- Added a `build-vocab` subcommand that can be used to build a vocabulary from a training config file.
- Added `tokenizer_kwargs` argument to `PretrainedTransformerMismatchedIndexer`.
- Added `tokenizer_kwargs` and `transformer_kwargs` arguments to `PretrainedTransformerMismatchedEmbedder`.
- Added official support for Python 3.8.
- Added a script, `scripts/release_notes.py`, which automatically prepares markdown release notes from the CHANGELOG and commit history.
- Added a flag `--predictions-output-file` to the `evaluate` command, which tells AllenNLP to write the predictions from the given dataset to the file as JSON lines.
- Added the ability to ignore certain missing keys when loading a model from an archive. This is done by adding a class-level variable called `authorized_missing_keys` to any PyTorch module that a `Model` uses. If defined, `authorized_missing_keys` should be a list of regex string patterns (a sketch follows this list).
- Added `FBetaMultiLabelMeasure`, a multi-label Fbeta metric. This is a subclass of the existing `FBetaMeasure`.
- Added ability to pass additional keyword arguments to `cached_transformers.get()`, which will be passed on to `AutoModel.from_pretrained()`.
- Added an `overrides` argument to `Predictor.from_path()`.
- Added a `cached-path` command.
- Added a function `inspect_cache` to `common.file_utils` that prints useful information about the cache. This can also be used from the `cached-path` command with `allennlp cached-path --inspect`.
- Added a function `remove_cache_entries` to `common.file_utils` that removes any cache entries matching the given glob patterns. This can be used from the `cached-path` command with `allennlp cached-path --remove some-files-*`.
- Added logging for the main process when running in distributed mode.
- Added a `TrainerCallback` object to support state sharing between batch and epoch-level training callbacks.
- Added support for .tar.gz in `PretrainedModelInitializer`.
- Added classes: `nn/samplers/samplers.py` with `MultinomialSampler`, `TopKSampler`, and `TopPSampler` for sampling indices from log probabilities.
- Made `BeamSearch` registrable.
- Added `top_k_sampling` and `type_p_sampling` `BeamSearch` implementations.
- Pass `serialization_dir` to `Model` and `DatasetReader`.
- Added an optional `include_in_archive` parameter to the top-level of configuration files. When specified, `include_in_archive` should be a list of paths relative to the serialization directory which will be bundled up with the final archived model from a training run.
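A minimal sketch of the `authorized_missing_keys` mechanism (the module and regex pattern below are hypothetical examples):

```python
# Sketch only: a PyTorch module used inside a Model. State-dict keys
# matching these regexes may be missing from the archive without
# causing an error at load time.
import torch


class ClassifierHead(torch.nn.Module):
    authorized_missing_keys = [r"^projection\..*"]

    def __init__(self, hidden_dim: int, num_labels: int) -> None:
        super().__init__()
        self.projection = torch.nn.Linear(hidden_dim, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.projection(x)
```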
Changed ⚠️
- Subcommands that don't require plugins will no longer cause plugins to be loaded or have an `--include-package` flag.
- Allow overrides to be JSON string or `dict`.
- `transformers` dependency updated to version 3.1.0.
- When `cached_path` is called on a local archive with `extract_archive=True`, the archive is now extracted into a unique subdirectory of the cache root instead of a subdirectory of the archive's directory. The extraction directory is also unique to the modification time of the archive, so if the file changes, subsequent calls to `cached_path` will know to re-extract the archive.
- Removed the `truncation_strategy` parameter to `PretrainedTransformerTokenizer`. The way we're calling the tokenizer, the truncation strategy has no effect anyway.
- Don't use initializers when loading a model, as it is not needed.
- Distributed training will now automatically search for a local open port if the `master_port` parameter is not provided.
- In training, save model weights before evaluation.
- `allennlp.common.util.peak_memory_mb` renamed to `peak_cpu_memory`, and `allennlp.common.util.gpu_memory_mb` renamed to `peak_gpu_memory`, and they both now return the results in bytes as integers. Also, the `peak_gpu_memory` function now utilizes PyTorch functions to find the memory usage instead of shelling out to the `nvidia-smi` command. This is more efficient and also more accurate because it only takes into account the tensor allocations of the current PyTorch process.
- Make sure weights are first loaded to the CPU when using `PretrainedModelInitializer`, preventing wasted GPU memory.
- Load dataset readers in `load_archive`.
- Updated `AllenNlpTestCase` docstring to remove reference to `unittest.TestCase`.
Removed 👋
- Removed `common.util.is_master` function.
Fixed ✅
- Fixed a bug where the reported `batch_loss` metric was incorrect when training with gradient accumulation.
- Class decorators now displayed in API docs.
- Fixed up the documentation for the `allennlp.nn.beam_search` module.
- Ignore `*args` when constructing classes with `FromParams`.
- Ensured some consistency in the types of the values that metrics return.
- Fix a PyTorch warning by explicitly providing the `as_tuple` argument (leaving it as its default value of `False`) to `Tensor.nonzero()`.
- Remove temporary directory when extracting model archive in `load_archive` at end of function rather than via `atexit`.
- Fixed a bug where using `cached_path()` offline could return a cached resource's lock file instead of the cache file.
- Fixed a bug where `cached_path()` would fail if passed a `cache_dir` with the user home shortcut `~/`.
- Fixed a bug in our doc building script where markdown links did not render properly if the "href" part of the link (the part inside the `()`) was on a new line.
- Changed how gradients are zeroed out with an optimization. See this video from NVIDIA at around the 9 minute mark.
- Fixed a bug where parameters to a `FromParams` class that are dictionaries wouldn't get logged when an instance is instantiated with `from_params`.
- Fixed a bug in distributed training where the vocab would be saved from every worker, when it should have been saved by only the local master process.
- Fixed a bug in the calculation of rouge metrics during distributed training where the total sequence count was not being aggregated across GPUs.
- Fixed `allennlp.nn.util.add_sentence_boundary_token_ids()` to use the `device` parameter of the input tensor.
- Be sure to close the TensorBoard writer even when training doesn't finish.
- Fixed the docstring for `PyTorchSeq2VecWrapper`.
Commits
01644ca Pass serialization_dir to Model, DatasetReader, and support include_in_archive (#4713)
1f29f35 Update transformers requirement from <3.4,>=3.1 to >=3.1,<3.5 (#4741)
6bb9ce9 warn about batches_per_epoch with validation loader (#4735)
00bb6c5 Be sure to close the TensorBoard writer (#4731)
3f23938 Update mkdocs-material requirement from <6.1.0,>=5.5.0 to >=5.5.0,<6.2.0 (#4738)
10c11ce Fix typo in PretrainedTransformerMismatchedEmbedder docstring (#4737)
0e64b4d fix docstring for PyTorchSeq2VecWrapper (#4734)
006bab4 Don't use PretrainedModelInitializer when loading a model (#4711)
ce14bdc Allow usage of .tar.gz with PretrainedModelInitializer (#4709)
c14a056 avoid defaulting to CPU device in add_sentence_boundary_token_ids() (#4727)
24519fd fix typehint on checkpointer method (#4726)
d3c69f7 Bump mypy from 0.782 to 0.790 (#4723)
cccad29 Updated AllenNlpTestCase docstring (#4722)
3a85e35 add reasonable timeout to gpu checks job (#4719)
1ff0658 Added logging for the main process when running in distributed mode (#4710)
b099b69 Add top_k and top_p sampling to BeamSearch (#4695)
bc6f15a Fixes rouge metric calculation corrected for distributed training (#4717)
ae7cf85 automatically find local open port in distributed training (#4696)
321d4f4 TrainerCallback with batch/epoch/end hooks (#4708)
001e1f7 new way of setting env variables in GH Actions (#4700)
c14ea40 Save checkpoint before running evaluation (#4704)
40bb47a Load weights to cpu with PretrainedModelInitializer (#4712)
327188b improve memory helper functions (#4699)
90f0037 fix reported batch_loss (#4706)
39ddb52 CLI improvements (#4692)
edcb6d3 Fix a bug in saving vocab during distributed training (#4705)
3506e3f ensure parameters that are actual dictionaries get logged (#4697)
eb7f256 Add StackOverflow link to README (#4694)
17c3b84 Fix small typo (#4686)
e0b2e26 display class decorators in API docs (#4685)
b9a9284 Update transformers requirement from <3.3,>=3.1 to >=3.1,<3.4 (#4684)
d9bdaa9 add build-vocab command (#4655)
ce604f1 Update mkdocs-material requirement from <5.6.0,>=5.5.0 to >=5.5.0,<6.1.0 (#4679)
c3b5ed7 zero grad optimization (#4673)
9dabf3f Add missing tokenizer/transformer kwargs (#4682)
9ac6c76 Allow overrides to be JSON string or dict (#4680)
55cfb47 The truncation setting doesn't do anything anymore (#4672)
990c9c1 clarify conda Python version in README.md
97db538 official support for Python 3.8 🐍 (#4671)
1e381bb Clean up the documentation for beam search (#4664)
11def8e Update bug_report.md
97fe88d Cached path command (#4652)
c9f376b Update transformers requirement from <3.2,>=3.1 to >=3.1,<3.3 (#4663)
e5e3d02 tick version for nightly releases
b833f90 fix multi-line links in docs (#4660)
d7c06fe Expose from_pretrained keyword arguments (#4651)
175c76b fix confusing distributed logging info (#4654)
fbd2ccc fix numbering in RELEASE_GUIDE
2d5f24b improve how cached_path extracts archives (#4645)
824f97d smooth out release process (#4648)
c7b7c00 Feature/prevent temp directory retention (#4643)
de5d68b Fix tensor.nonzero() function overload warning (#4644)
e8e89d5 add flag for saving predictions to 'evaluate' command (#4637)
e4fd5a0 Multi-label F-beta metric (#4562)
f0e7a78 Create Dependabot config file (#4635)
0e33b0b Return consistent types from metrics (#4632)
2df364f Update transformers requirement from <3.1,>=3.0 to >=3.0,<3.2 (#4621)
6d480aa Im...
v1.1.0
Highlights
Version 1.1 was mainly focused on bug fixes, but there are a few important new features such as gradient checkpointing with pretrained transformer embedders and official support for automatic mixed precision (AMP) training through the new `torch.cuda.amp` module.
Details
Added
- `Predictor.capture_model_internals()` now accepts a regex specifying which modules to capture.
- Added the option to specify `requires_grad: false` within an optimizer's parameter groups.
- Added the `file-friendly-logging` flag back to the `train` command. Also added this flag to the `predict`, `evaluate`, and `find-learning-rate` commands.
- Added an `EpochCallback` to track current epoch as a model class member.
- Added the option to enable or disable gradient checkpointing for transformer token embedders via boolean parameter `gradient_checkpointing`.
- Added a method to `ModelTestCase` for running basic model tests when you aren't using config files.
- Added some convenience methods for reading files.
- `cached_path()` can now automatically extract and read files inside of archives.
- Added the ability to pass an archive file instead of a local directory to `Vocab.from_files`.
- Added the ability to pass an archive file instead of a glob to `ShardedDatasetReader`.
- Added a new `"linear_with_warmup"` learning rate scheduler.
- Added a check in `ShardedDatasetReader` that ensures the base reader doesn't implement manual distributed sharding itself.
- Added an option to `PretrainedTransformerEmbedder` and `PretrainedTransformerMismatchedEmbedder` to use a scalar mix of all hidden layers from the transformer model instead of just the last layer. To utilize this, just set `last_layer_only` to `False` (a sketch follows this list).
- Training metrics now include `batch_loss` and `batch_reg_loss` in addition to aggregate loss across number of batches.
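A minimal sketch of enabling the scalar mix (together with gradient checkpointing from the earlier item); the model name is illustrative and the exact keyword arguments should be checked against the embedder's docstring:

```python
# Sketch only: transformer embedder using a scalar mix of all hidden
# layers plus gradient checkpointing, per the options described above.
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

embedder = PretrainedTransformerEmbedder(
    model_name="bert-base-uncased",
    last_layer_only=False,        # scalar mix over all hidden layers
    gradient_checkpointing=True,  # trade compute for memory
)
```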
Changed
- Upgraded PyTorch requirement to 1.6.
- Beam search now supports multi-layer decoders.
- Replaced the NVIDIA Apex AMP module with torch's native AMP module. The default trainer (`GradientDescentTrainer`) now takes a `use_amp: bool` parameter instead of the old `opt_level: str` parameter.
- Not specifying a `cuda_device` now automatically determines whether to use a GPU or not.
- Discovered plugins are logged so you can see what was loaded.
- `allennlp.data.DataLoader` is now an abstract registrable class. The default implementation remains the same, but was renamed to `allennlp.data.PyTorchDataLoader`.
- `BertPooler` can now unwrap and re-wrap extra dimensions if necessary.
Removed
- Removed the `opt_level` parameter to `Model.load` and `load_archive`. In order to use AMP with a loaded model now, just run the model's forward pass within torch's `autocast` context (a sketch follows).
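A minimal sketch of running a loaded model under torch's native AMP (the archive path and batch are placeholders):

```python
# Sketch only: forward pass of a loaded model inside torch's autocast
# context. The archive path and batch contents are placeholders.
import torch
from allennlp.models.archival import load_archive

archive = load_archive("/path/to/model.tar.gz", cuda_device=0)
model = archive.model.eval()

batch = {}  # tensors produced by the model's data loader go here

with torch.no_grad(), torch.cuda.amp.autocast():
    outputs = model(**batch)
```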
Fixed
- Fixed handling of some edge cases when constructing classes with `FromParams` where the class accepts `**kwargs`.
- Fixed division by zero error when there are zero-length spans in the input to a `PretrainedTransformerMismatchedIndexer`.
- Improved robustness of `cached_path` when extracting archives so that the cache won't be corrupted if a failure occurs during extraction.
- Fixed a bug with the `average` and `evalb_bracketing_score` metrics in distributed training.
- Fixed a bug in distributed metrics that caused nan values due to repeated addition of an accumulated variable.
- Fixed how truncation was handled with `PretrainedTransformerTokenizer`. Previously, if `max_length` was set to `None`, the tokenizer would still do truncation if the transformer model had a default max length in its config. Also, when `max_length` was set to a non-`None` value, several warnings would appear for certain transformer models around the use of the `truncation` parameter.
- Fixed evaluation of all metrics when using distributed training.
- Added a `py.typed` marker. Fixed type annotations in `allennlp.training.util`.
- Fixed problem with automatically detecting whether tokenization is necessary. This affected primarily the Roberta SST model.
- Improved help text for using the `--overrides` command line flag.
- Removed unnecessary warning about deadlocks in `DataLoader`.
- Fixed testing models that only return a loss when they are in training mode.
- Fixed a bug in `FromParams` that caused silent failure in case of the parameter type being `Optional[Union[...]]`.
- Fixed a bug where the program crashes if `evaluation_data_loader` is an `AllennlpLazyDataset`.
- Reduced the amount of log messages produced by `allennlp.common.file_utils`.
- Fixed a bug where `PretrainedTransformerEmbedder` parameters appeared to be trainable in the log output even when `train_parameters` was set to `False`.
- Fixed a bug with the sharded dataset reader where it would only read a fraction of the instances in distributed training.
- Fixed checking equality of `ArrayField`s.
- Fixed a bug where `NamespaceSwappingField` did not work correctly with `.empty_field()`.
- Put more sensible defaults on the `huggingface_adamw` optimizer.
- Simplified logging so that all logging output always goes to one file.
- Fixed interaction with the Python command line debugger.
- Log the grad norm properly even when we're not clipping it.
- Fixed a bug where `PretrainedModelInitializer` fails to initialize a model with a 0-dim tensor.
- Fixed a bug with the layer unfreezing schedule of the `SlantedTriangular` learning rate scheduler.
- Fixed a regression with logging in the distributed setting. Only the main worker should write log output to the terminal.
- Pinned the version of boto3 for package managers (e.g. poetry).
- Fixed issue #4330 by updating the `tokenizers` dependency.
- Fixed a bug in `TextClassificationPredictor` so that it passes tokenized inputs to the `DatasetReader` in case it does not have a tokenizer.
- `reg_loss` is now only returned for models that have some regularization penalty configured.
- Fixed a bug that prevented `cached_path` from downloading assets from GitHub releases.
- Fixed a bug that erroneously increased last label's false positive count in calculating fbeta metrics.
- `Tqdm` output now looks much better when the output is being piped or redirected.
- Small improvements to how the API documentation is rendered.
- Only show validation progress bar from main process in distributed training.
Commits
dcc9cdc Prepare for release v1.1.0
aa750be fix Average metric (#4624)
e1aa57c improve robustness of cached_path when extracting archives (#4622)
711afaa Fix division by zero when there are zero-length spans in MismatchedEmbedder. (#4615)
be97943 Improve handling of **kwargs in FromParams (#4616)
187b24e add more tutorial links to README (#4613)
e840a58 s/logging/logger/ (#4609)
dbc3c3f Added batched versions of scatter and fill to util.py (#4598)
2c54cf8 reformat for new version of black (#4605)
2dd335e batched_span_select now guarantees element order in each span (#4511)
62f554f specify module names by a regex in predictor.capture_model_internals() (#4585)
f464aa3 Bump markdown-include from 0.5.1 to 0.6.0 (#4586)
d01cdff Update RELEASE_PROCESS.md to include allennlp-models (#4587)
3aedac9 Prepare for release v1.1.0rc4
87a61ad Bug fix in distributed metrics (#4570)
71a9a90 upgrade actions to cache@v2 (#4573)
bd9ee6a Give better usage info for overrides parameter (#4575)
0a456a7 Fix boolean and categorical accuracy for distributed (#4568)
8511274 add actions workflow for closing stale issues (#4561)
de41306 Static type checking fixes (#4545)
5a07009 Fix RoBERTa SST (#4548)
351941f Only pin mkdocs-material to minor version, ignore specific patch version (#4556)
0ac13a4 fix CHANGELOG
3b86f58 Prepare for release v1.1.0rc3
44d2847 Metrics in distributed setting (#4525)
1d61965 Bump mkdocs-material from 5.5.3 to 5.5.5 (#4547)
5b97780 tick version for nightly releases
b32608e add gradient checkpointing for transformer token embedders (#4544)
f639336 Fix logger being created twice (#4538)
660fdaf Fix handling of max length with transformer tokenizers (#4534)
15e288f EpochCallBack for tracking epoch (#4540)
9209bc9 Bump mkdocs-material from 5.5.0 to 5.5.3 (#4533)
bfecdc3 Ensure len(self.evaluation_data_loader) is not called (#4531)
5bc3b73 Fix typo in warning in file_utils (#4527)
e80d768 pin torch >= 1.6
73220d7 Prepare for release v1.1.0rc2
9415350 Update torch requirement from <1.6.0,>=1.5.0 to >=1.5.0,<1.7.0 (#4519)
146bd9e Remove link to self-attention modules. (#4512)
2401282 add back file-friendly-logging flag (#4509)
54e5c83 closes #4494 (#4508)
fa39d49 ensure call methods are rendered in docs (#4522)
e53d185 Bug fix for case when param type is Optional[Union...] (#4510)
14f63b7 Make sure we have a bool tensor where we expect one (#4505)
18a4eb3 add a requires_grad option to param groups (#4502)
6c848df Bump mkdocs-material from 5.4.0 to 5.5.0 (#4507)
d73f8a9 More BART changes (#4500)
1cab3bf Update beam_search.py (#4462)
478bf46 remove deadlock warning in DataLoader (#4487)
714334a Fix reported loss: Bug fix in batch_loss (#4485)
db20b1f use longer tqdm intervals when output being redirected (#4488)
53eeec1 tick version for nightly releases
d693cf1 PathLike (#4479)
2f87832 only show validation progress bar from main process (#4476)
9144918 Fix reported loss (#4477)
5c97083 fix release link in CHANGELOG and formatting in README
4eb9795 Prepare for release v1.1.0rc1
f195440 update 'Models' links in README (#4475)
9c801a3 add CHANGELOG to API docs, point to license on GitHub, improve API doc formatting (#4472)
69d2f03 Clean up Tqdm bars when output is being piped or redirected (#4470)
7b188c9 fixed bug that erronously increased last label's false positive count (#4473)
64db027 Skip ETag check if OSError (#4469)
b9d011e More BART ...
v1.1.0rc4
Changes since v1.1.0rc3
Added
- Added a workflow to GitHub Actions that will automatically close unassigned stale issues and
ping the assignees of assigned stale issues.
Fixed
- Fixed a bug in distributed metrics that caused nan values due to repeated addition of an accumulated variable.
Commits
87a61ad Bug fix in distributed metrics (#4570)
71a9a90 upgrade actions to cache@v2 (#4573)
bd9ee6a Give better usage info for overrides parameter (#4575)
0a456a7 Fix boolean and categorical accuracy for distributed (#4568)
8511274 add actions workflow for closing stale issues (#4561)
de41306 Static type checking fixes (#4545)
5a07009 Fix RoBERTa SST (#4548)
351941f Only pin mkdocs-material to minor version, ignore specific patch version (#4556)
v1.1.0rc3
Changes since v1.1.0rc2
Fixed
- Fixed how truncation was handled with `PretrainedTransformerTokenizer`. Previously, if `max_length` was set to `None`, the tokenizer would still do truncation if the transformer model had a default max length in its config. Also, when `max_length` was set to a non-`None` value, several warnings would appear for certain transformer models around the use of the `truncation` parameter.
- Fixed evaluation of all metrics when using distributed training.
Commits
0ac13a4 fix CHANGELOG
3b86f58 Prepare for release v1.1.0rc3
44d2847 Metrics in distributed setting (#4525)
1d61965 Bump mkdocs-material from 5.5.3 to 5.5.5 (#4547)
5b97780 tick version for nightly releases
b32608e add gradient checkpointing for transformer token embedders (#4544)
f639336 Fix logger being created twice (#4538)
660fdaf Fix handling of max length with transformer tokenizers (#4534)
15e288f EpochCallBack for tracking epoch (#4540)
9209bc9 Bump mkdocs-material from 5.5.0 to 5.5.3 (#4533)
bfecdc3 Ensure len(self.evaluation_data_loader) is not called (#4531)
5bc3b73 Fix typo in warning in file_utils (#4527)
e80d768 pin torch >= 1.6
v1.1.0rc2
What's new since v1.1.0rc1
Changed
- Upgraded PyTorch requirement to 1.6.
- Replaced the NVIDIA Apex AMP module with torch's native AMP module. The default trainer (`GradientDescentTrainer`) now takes a `use_amp: bool` parameter instead of the old `opt_level: str` parameter.
Fixed
- Removed unnecessary warning about deadlocks in `DataLoader`.
- Fixed testing models that only return a loss when they are in training mode.
- Fixed a bug in `FromParams` that caused silent failure in case of the parameter type being `Optional[Union[...]]`.
Added
- Added the option to specify `requires_grad: false` within an optimizer's parameter groups.
- Added the `file-friendly-logging` flag back to the `train` command. Also added this flag to the `predict`, `evaluate`, and `find-learning-rate` commands.
Removed
- Removed the `opt_level` parameter to `Model.load` and `load_archive`. In order to use AMP with a loaded model now, just run the model's forward pass within torch's `autocast` context.
Commits
73220d7 Prepare for release v1.1.0rc2
9415350 Update torch requirement from <1.6.0,>=1.5.0 to >=1.5.0,<1.7.0 (#4519)
146bd9e Remove link to self-attention modules. (#4512)
2401282 add back file-friendly-logging flag (#4509)
54e5c83 closes #4494 (#4508)
fa39d49 ensure call methods are rendered in docs (#4522)
e53d185 Bug fix for case when param type is Optional[Union...] (#4510)
14f63b7 Make sure we have a bool tensor where we expect one (#4505)
18a4eb3 add a requires_grad option to param groups (#4502)
6c848df Bump mkdocs-material from 5.4.0 to 5.5.0 (#4507)
d73f8a9 More BART changes (#4500)
1cab3bf Update beam_search.py (#4462)
478bf46 remove deadlock warning in DataLoader (#4487)
714334a Fix reported loss: Bug fix in batch_loss (#4485)
db20b1f use longer tqdm intervals when output being redirected (#4488)
53eeec1 tick version for nightly releases
d693cf1 PathLike (#4479)
2f87832 only show validation progress bar from main process (#4476)
9144918 Fix reported loss (#4477)
5c97083 fix release link in CHANGELOG and formatting in README