Init Xmod implementation
Xmod class conversion

Fix issues after refactoring

Xmod docs & adapter-specific logic

Checkpoint conversion.

minor modifications
calpt committed Aug 28, 2023
1 parent ac866eb commit 158f889
Showing 21 changed files with 816 additions and 10 deletions.
23 changes: 23 additions & 0 deletions docs/classes/models/xmod.rst
@@ -0,0 +1,23 @@
X-MOD
=====

.. note::
    The X-MOD implementation integrated into Transformers already supports adapters.
    To make this implementation compatible with Adapters, a few changes were necessary:

    - In Adapters, the X-MOD classes rely on the usual adapter methods instead of the custom methods introduced in Transformers, i.e.:
        - ``set_active_adapters()`` instead of ``set_default_language()``.
        - ``AdapterSetup`` context instead of ``lang_ids`` parameter.
    - We provide dedicated model checkpoints converted for usage with Adapters
        - e.g. ``facebook/xmod-base`` is available as ``AdapterHub/xmod-base``, with language adapters split into separate repos (e.g. ``AdapterHub/xmod-base-af_ZA``) for on-demand loading.

The abstract from the paper is the following:

*Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-MOD) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.*
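The adapter-style interface described in the note above can be sketched as follows. This is a hedged example: the method names (``load_adapter``, ``set_active_adapters``, ``AdapterSetup``) come from the Adapters API as referenced in this doc, while the variable names and the tokenized ``inputs`` batch are illustrative.

```python
# Sketch of activating an X-MOD language adapter via Adapters
# (checkpoint names follow the note above; `inputs` is assumed to be
# a tokenized batch, e.g. produced by an AutoTokenizer).
from adapters import AutoAdapterModel, AdapterSetup

model = AutoAdapterModel.from_pretrained("AdapterHub/xmod-base")
lang_adapter = model.load_adapter("AdapterHub/xmod-base-af_ZA")

# Replaces Transformers' set_default_language():
model.set_active_adapters(lang_adapter)

# Or scope the activation to a context, replacing the lang_ids parameter:
with AdapterSetup(lang_adapter):
    outputs = model(**inputs)
```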

XmodAdapterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: adapters.XmodAdapterModel
    :members:
    :inherited-members: XmodPreTrainedModel
1 change: 1 addition & 0 deletions docs/index.rst
@@ -75,6 +75,7 @@ Currently, we support the PyTorch versions of all models as listed on the `Model
classes/models/t5
classes/models/vit
classes/models/xlmroberta
classes/models/xmod

.. toctree::
:maxdepth: 2
1 change: 1 addition & 0 deletions docs/model_overview.md
@@ -30,6 +30,7 @@ The table below further shows which model architectures support which adaptation
| [T5](classes/models/t5.html) ||||||||
| [ViT](classes/models/vit.html) ||||||||
| [XLM-RoBERTa](classes/models/xlmroberta.html) ||||||||
| [X-MOD](classes/models/xmod.html) ||||||||

(*) If the used encoder and decoder model class are supported.

2 changes: 2 additions & 0 deletions src/adapters/__init__.py
@@ -107,6 +107,7 @@
"models.t5": ["T5AdapterModel"],
"models.vit": ["ViTAdapterModel"],
"models.xlm_roberta": ["XLMRobertaAdapterModel"],
"models.xmod": ["XmodAdapterModel"],
"trainer": ["AdapterTrainer", "Seq2SeqAdapterTrainer"],
"training": [
"AdapterArguments",
@@ -206,6 +207,7 @@
from .models.t5 import T5AdapterModel
from .models.vit import ViTAdapterModel
from .models.xlm_roberta import XLMRobertaAdapterModel
from .models.xmod import XmodAdapterModel
from .trainer import AdapterTrainer, Seq2SeqAdapterTrainer
from .training import AdapterArguments, setup_adapter_training
from .utils import (
1 change: 1 addition & 0 deletions src/adapters/composition.py
@@ -135,6 +135,7 @@ def __init__(
        "xlm-roberta",
        "bert-generation",
        "llama",
        "xmod",
    ],
}

13 changes: 7 additions & 6 deletions src/adapters/configuration/adapter_config.py
@@ -162,9 +162,10 @@ class BnConfig(AdapterConfigBase):
use_gating (:obj:`bool`, optional):
Place a trainable gating module besides the added parameter module to control module activation. This is
e.g. used for UniPELT. Defaults to False.
residual_before_ln (:obj:`bool`, optional):
If True, take the residual connection around the adapter bottleneck before the layer normalization. Only
applicable if :obj:`original_ln_before` is True.
residual_before_ln (:obj:`bool` or :obj:`str`, optional):
If True, take the residual connection around the adapter bottleneck before the layer normalization.
If set to "post_add", take the residual connection around the adapter bottleneck after the previous residual connection.
Only applicable if :obj:`original_ln_before` is True.
adapter_residual_before_ln (:obj:`bool`, optional):
If True, apply the residual connection around the adapter modules before the new layer normalization within
the adapter. Only applicable if :obj:`ln_after` is True and :obj:`is_parallel` is False.
@@ -225,7 +226,7 @@ class BnConfig(AdapterConfigBase):
is_parallel: bool = False
scaling: Union[float, str] = 1.0
use_gating: bool = False
residual_before_ln: bool = True
residual_before_ln: Union[bool, str] = True
adapter_residual_before_ln: bool = False
inv_adapter: Optional[str] = None
inv_adapter_reduction_factor: Optional[float] = None
@@ -267,7 +268,7 @@ class SeqBnConfig(BnConfig):

original_ln_before: bool = True
original_ln_after: bool = True
residual_before_ln: bool = True
residual_before_ln: Union[bool, str] = True
adapter_residual_before_ln: bool = False
ln_before: bool = False
ln_after: bool = False
@@ -306,7 +307,7 @@ class DoubleSeqBnConfig(BnConfig):

original_ln_before: bool = False
original_ln_after: bool = True
residual_before_ln: bool = True
residual_before_ln: Union[bool, str] = True
adapter_residual_before_ln: bool = False
ln_before: bool = False
ln_after: bool = False
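The widened `residual_before_ln` field shown in the diff above can be set through any bottleneck config. A hedged configuration sketch, shown with the `SeqBnConfig` dataclass from this package:

```python
from adapters import SeqBnConfig

# Default behavior: residual captured before the first layer norm.
cfg_default = SeqBnConfig()  # residual_before_ln=True

# New X-MOD-style behavior: residual captured after the preceding
# residual addition, but before layer normalization.
cfg_post_add = SeqBnConfig(residual_before_ln="post_add")
```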
55 changes: 55 additions & 0 deletions src/adapters/head_utils.py
@@ -256,6 +256,61 @@
        },
        "layers": ["lm_head.dense", None, "lm_head.layer_norm", "lm_head.decoder"],
    },
    # Xmod
    "XmodForSequenceClassification": {
        "config": {
            "head_type": "classification",
            "layers": 2,
            "activation_function": "tanh",
            "use_pooler": False,
        },
        "layers": [None, "classifier.dense", None, None, "classifier.out_proj"],
    },
    "XmodForMultipleChoice": {
        "config": {
            "head_type": "multiple_choice",
            "layers": 1,
            "activation_function": None,
            "use_pooler": True,
        },
        "layers": [None, "classifier"],
    },
    "XmodForTokenClassification": {
        "config": {
            "head_type": "tagging",
            "layers": 1,
            "activation_function": None,
        },
        "layers": [None, "classifier"],
    },
    "XmodForQuestionAnswering": {
        "config": {
            "head_type": "question_answering",
            "layers": 1,
            "activation_function": None,
        },
        "layers": [None, "qa_outputs"],
    },
    "XmodForMaskedLM": {
        "config": {
            "head_type": "masked_lm",
            "layers": 2,
            "activation_function": "gelu",
            "layer_norm": True,
            "bias": True,
        },
        "layers": ["lm_head.dense", None, "lm_head.layer_norm", "lm_head.decoder"],
    },
    "XmodForCausalLM": {
        "config": {
            "head_type": "causal_lm",
            "layers": 2,
            "activation_function": "gelu",
            "layer_norm": True,
            "bias": True,
        },
        "layers": ["lm_head.dense", None, "lm_head.layer_norm", "lm_head.decoder"],
    },
    # BART
    "BartForSequenceClassification": {
        "config": {
8 changes: 7 additions & 1 deletion src/adapters/layer.py
@@ -227,7 +227,13 @@ def enable_adapters(self, adapter_setup: AdapterCompositionBlock, unfreeze_adapt
                    for param in self.adapter_fusion_layer[sub_setup.name].parameters():
                        param.requires_grad = True

    def get_adapter(self, adapter_name):
    def freeze_adapter(self, adapter_name: str, freeze: bool = True):
        if adapter_name in self.adapters:
            self.adapters[adapter_name].train(not freeze)
            for param in self.adapters[adapter_name].parameters():
                param.requires_grad = not freeze

    def get_adapter(self, adapter_name: str):
        if adapter_name in self.adapters:
            return self.adapters[adapter_name]
        else:
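The `freeze_adapter` hook added in the hunk above boils down to flipping `requires_grad` on every parameter and switching the module out of training mode. A standalone toy with plain-Python stand-ins (the real code operates on `torch.nn` modules; the class and variable names here are illustrative):

```python
# Toy stand-ins mimicking the relevant torch.nn.Module surface:
# a `train(mode)` switch and parameters carrying `requires_grad`.
class ToyParam:
    def __init__(self):
        self.requires_grad = True

class ToyAdapter:
    def __init__(self):
        self.params = [ToyParam(), ToyParam()]
        self.training = True

    def train(self, mode=True):
        self.training = mode

    def parameters(self):
        return self.params

def freeze_adapter(adapters, adapter_name, freeze=True):
    # Mirrors the diff: set eval mode and disable gradients when freezing.
    if adapter_name in adapters:
        adapters[adapter_name].train(not freeze)
        for p in adapters[adapter_name].parameters():
            p.requires_grad = not freeze

adapters = {"af_ZA": ToyAdapter()}
freeze_adapter(adapters, "af_ZA")
# The adapter is now in eval mode with all gradients disabled;
# freeze_adapter(adapters, "af_ZA", freeze=False) reverses both.
```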
6 changes: 6 additions & 0 deletions src/adapters/lora.py
@@ -173,6 +173,12 @@ def enable_adapters(self, adapter_setup: AdapterCompositionBlock, unfreeze_adapt
                for param in self.loras[name].parameters():
                    param.requires_grad = True

    def freeze_adapter(self, adapter_name: str, freeze: bool = True):
        if adapter_name in self.loras:
            self.loras[adapter_name].train(not freeze)
            for param in self.loras[adapter_name].parameters():
                param.requires_grad = not freeze

    def get_adapter(self, adapter_name: str) -> nn.Module:
        if adapter_name in self.loras:
            return self.loras[adapter_name]
7 changes: 5 additions & 2 deletions src/adapters/modeling.py
@@ -145,15 +145,18 @@ def pre_forward(
        """
        query = None

        if self.residual_before_ln:
        if self.residual_before_ln is True:
            residual = hidden_states

        if fusion_config is not None and fusion_config["query_before_ln"]:
            query = hidden_states

        if self.original_ln_before:
            if layer_norm:
                hidden_states = layer_norm(hidden_states + input_tensor)
                hidden_states = hidden_states + input_tensor
                if self.residual_before_ln == "post_add":
                    residual = hidden_states
                hidden_states = layer_norm(hidden_states)
            else:
                hidden_states = hidden_states + input_tensor
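The `pre_forward` change above alters where the adapter residual is captured. A standalone toy (plain floats, not the library's tensors) of the ordering in the branch where `original_ln_before` and `layer_norm` are both set: with `residual_before_ln=True` the residual is taken before the residual addition and layer norm; with `"post_add"` it is taken after the addition but before the norm.

```python
def capture_residual(hidden, input_tensor, residual_before_ln, layer_norm):
    """Toy mirror of the residual-capture ordering in pre_forward."""
    residual = None
    if residual_before_ln is True:
        residual = hidden  # captured before the residual addition
    added = hidden + input_tensor
    if residual_before_ln == "post_add":
        residual = added  # captured after the addition, before the norm
    return layer_norm(added), residual

toy_ln = lambda x: x * 0.5  # stand-in "layer norm"

out1, res1 = capture_residual(3.0, 1.0, True, toy_ln)       # res1 == 3.0
out2, res2 = capture_residual(3.0, 1.0, "post_add", toy_ln)  # res2 == 4.0
```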
3 changes: 3 additions & 0 deletions src/adapters/models/__init__.py
@@ -19,6 +19,7 @@
from .llama.mixin_llama import LlamaModelAdapterMixin
from .t5.mixin_t5 import T5BlockAdaptersMixin, T5ModelAdaptersMixin, T5ModelAdaptersWithHeadsMixin
from .vit.mixin_vit import ViTIntermediateAdaptersMixin, ViTModelAdaptersMixin
from .xmod.mixin_xmod import XmodModelAdaptersMixin


# IMPORTANT: Only add classes to this mapping that are not copied into the adapters package
@@ -58,6 +59,8 @@
"ViTModel": ViTModelAdaptersMixin,
"XLMRobertaLayer": BertLayerAdaptersMixin,
"XLMRobertaModel": BertModelAdaptersMixin,
"XmodLayer": BertLayerAdaptersMixin,
"XmodModel": XmodModelAdaptersMixin,
"DebertaModel": BertModelAdaptersMixin,
"DebertaLayer": BertLayerAdaptersMixin,
"DebertaV2Model": BertModelAdaptersMixin,
1 change: 1 addition & 0 deletions src/adapters/models/auto/adapter_model.py
@@ -26,6 +26,7 @@
        ("t5", "T5AdapterModel"),
        ("vit", "ViTAdapterModel"),
        ("xlm-roberta", "XLMRobertaAdapterModel"),
        ("xmod", "XmodAdapterModel"),
    ]
)

39 changes: 39 additions & 0 deletions src/adapters/models/xmod/__init__.py
@@ -0,0 +1,39 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2023 The Adapter-Hub Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from transformers.utils import _LazyModule


_import_structure = {
    "adapter_model": ["XmodAdapterModel"],
}


if TYPE_CHECKING:
    from .adapter_model import XmodAdapterModel

else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
    )