
Norbench updates #5

Closed
sigdelina opened this issue Aug 23, 2022 · 5 comments

@sigdelina (Contributor)

This issue provides information about the models that have been implemented within the NorBench framework.

The information below will cover:

  • documentation for running the scripts for the current tasks (POS tagging, Binary Sentiment Analysis, NER)
  • the list of models that can currently be used for each task
  • future updates, newly available models, etc.
@sigdelina (Contributor, Author)

Early updates

The first version of the documentation was described here.

sigdelina reopened this on Aug 23, 2022
sigdelina added the documentation, enhancement, and good first issue labels on Aug 23, 2022
@sigdelina (Contributor, Author)

Updates in documentation

  • The second version of the documentation was uploaded to the repository
  • The documentation for the POS-tagging and Binary Sentiment Analysis tasks was extended: two extra sections (Evaluation and Available Models) were added for both tasks.
  • The scripts for the POS-tagging and Binary Sentiment Analysis tasks were updated, so more models can now be run on these tasks (the models for each task are listed in Available Models).

What's next:

  • Updating the scripts for NER fine-tuning (as mentioned in the early updates). Scores for XLM-RoBERTa from the first implementation attempt can be found here -- but the code overall will be improved

@sigdelina (Contributor, Author)

Updates

  • The documentation for NorBench was updated
  • Some bugs in the scripts were fixed
  • More models for POS-tagging, Binary Sentiment Analysis, and NER were evaluated (their scores can be found here)
  • To increase the number of models in the benchmark, it was decided to also use the models implemented in ScandEval -- the list of models and their scores is constantly updated

What should be done next:

  • Some models from the ScandEval benchmark have to be downloaded as a folder directly into the working directory (e.g. the ScandiBERT model). Automatic saving of such models is still being worked out (a sketch follows below).
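
A minimal sketch of how such a model could be stored as a local folder, assuming it is hosted on the Hugging Face Hub (the Hub id and target path below are assumptions, not the actual NorBench setup):

```python
# Hedged sketch: fetch a Hub-hosted model once and save it as a local folder,
# so the evaluation scripts can point to a directory instead of a Hub id.
from transformers import AutoModelForMaskedLM, AutoTokenizer

hub_id = "vesteinn/ScandiBERT"   # assumed Hub identifier for the ScandiBERT model
local_dir = "models/scandibert"  # assumed target folder inside the repository

tokenizer = AutoTokenizer.from_pretrained(hub_id)
model = AutoModelForMaskedLM.from_pretrained(hub_id)

tokenizer.save_pretrained(local_dir)
model.save_pretrained(local_dir)
# The task scripts can then load the model with from_pretrained("models/scandibert").
```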

@akutuzov (Member)

Two more things:

  1. Decouple evaluation code from data (data paths should not be hard-coded in the scripts)
  2. Create a single evaluation script which will run all the benchmarks for a given model (a sketch follows below).
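
A minimal sketch of what such a single entry point could look like; the script name, argument names, and the run_task helper are placeholders rather than the actual NorBench code:

```python
# norbench_eval.py (hypothetical): run all NorBench tasks for one model.
import argparse

TASKS = ["pos", "sentiment", "ner"]

def run_task(task, model_path, data_dir):
    """Placeholder for the task-specific fine-tuning and evaluation code."""
    raise NotImplementedError

def main():
    parser = argparse.ArgumentParser(description="Run all NorBench tasks for a given model")
    parser.add_argument("--model_path", required=True, help="Path or Hub id of the model")
    parser.add_argument("--data_dir", required=True, help="Root directory with the task datasets")
    args = parser.parse_args()

    for task in TASKS:
        score = run_task(task, args.model_path, args.data_dir)
        print(f"{task}: {score}")

if __name__ == "__main__":
    main()
```

Passing --data_dir on the command line (rather than hard-coding paths) would also cover point 1.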

@akutuzov (Member) commented Nov 13, 2022

Also:

  • Scripts should load the datasets in the original format (probably even pull them from the respective repositories if not found locally).
  • Every task should be "served" by two scripts: one which fine-tunes a model and produces predictions (saved in a separate file), and one which evaluates these predictions on the test set (sketched below).
  • Argument names must be more sensible (currently they are a bit odd, for example short_model_name, which is in fact the path to the model).
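
A minimal sketch of the second of the two scripts (the evaluation step), assuming predictions and gold labels are plain text files with one label per line; file and argument names are placeholders:

```python
# evaluate.py (hypothetical): score a saved predictions file against the gold test set.
import argparse

def read_labels(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def main():
    parser = argparse.ArgumentParser(description="Evaluate saved predictions on the test set")
    parser.add_argument("--predictions", required=True, help="File with predicted labels, one per line")
    parser.add_argument("--gold", required=True, help="File with gold labels, one per line")
    args = parser.parse_args()

    predicted = read_labels(args.predictions)
    gold = read_labels(args.gold)
    if len(predicted) != len(gold):
        raise ValueError("Prediction and gold files must contain the same number of labels")

    accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    print(f"Accuracy: {accuracy:.4f}")

if __name__ == "__main__":
    main()
```

The companion fine-tuning script would take a self-explanatory --model_path argument (instead of short_model_name) and write its predictions to the file later passed to --predictions.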
