Library, built on PyTorch and Transformers, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks in wolof.
Wolof is a language spoken in Senegal in neighboring countries, many works are written in Wolof or the need to have a tool that allows us to know better this language.
Wolof library allows us to do several specific tasks in Wolof languages such as text classification, translation, automatic speech recognition.
- simple and easy to use
- customizable
- clean code
- Python >= 3.6
- Torch
- Transformers
wolof can be installed using pip as follows:
pip install wolof
pip install git+https://github.com/abdouaziz/wolof.git
from wolof import Speech2Text
asr = Speech2Text(model_name="abdouaziiz/wav2vec2-xls-r-300m-wolof")
audio_file = "audio.wav"
prediction = asr(audio_file)
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library,offering a simple API dedicated to several tasks Masked Language Modeling, Sentiment Analysis .
bert-base-wolof is pretrained bert-base model on wolof language . sora-wolof is pretrained roberta model on wolof language .
Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
---|---|---|---|---|
bert-base-wolof |
6 | 12 | 514 | 56931622 M |
soraberta-base |
6 | 12 | 514 | 83 M |
Let's use fill_mask
for masked language modeling . We mask a word with the token [MASK]
in the given input_text and the unmasker predict the right word corresponding to the token [MASK]
.
>>> from wolof import Pipeline
>>> unmasker = Pipeline(task='fill-mask', model_name='abdouaziiz/bert-base-wolof')
>>> unmasker("kuy yoot du [MASK].")
[{'sequence': '[CLS] kuy yoot du seqet. [SEP]',
'score': 0.09505125880241394,
'token': 13578},
{'sequence': '[CLS] kuy yoot du daw. [SEP]',
'score': 0.08882280439138412,
'token': 679},
{'sequence': '[CLS] kuy yoot du yoot. [SEP]',
'score': 0.057790059596300125,
'token': 5117},
{'sequence': '[CLS] kuy yoot du seqat. [SEP]',
'score': 0.05671025067567825,
'token': 4992},
{'sequence': '[CLS] kuy yoot du yaqu. [SEP]',
'score': 0.0469999685883522,
'token': 1735}]
...
for task
we can have the following values: 'fill-mask', 'sentiment-analysis'
You can checkout examples in examples/
- Abdou Aziz DIOP @abdouaziz
- email : abdouaziz@gmail.com
- linkedin : https://www.linkedin.com/in/abdouaziiz/