L3i++ at SemEval 2023 Task 2: CoNER

Introduction

This repository contains the source code of the L3i++ team for SemEval 2023 Task 2: CoNER.

Datasets

We use the dataset from SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition), which is available here. The dataset covers 12 languages (English, Spanish, Swedish, Ukrainian, Portuguese, French, Farsi, German, Chinese, Hindi, Bangla, and Italian) and is divided into three parts: train, dev, and test. Each part consists of CoNLL files, which serve as the input data for the model. The CoNLL files have the following format:

# id 0d88e010-c6e8-4409-9dec-a785e43eac16	domain=de
sie _ _ O
war _ _ O
die _ _ O
erste _ _ O
frau _ _ O
die _ _ O
beim _ _ O
großes _ _ B-Facility
auge _ _ I-Facility
beobachtet _ _ O
durfte _ _ O
. _ _ O

See the sample files in the public_data/DE-German/ folder.
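To make the format above concrete, here is a minimal sketch of how such a file could be read in Python. It assumes that each sentence starts with a `# id` header line, that the token is the first whitespace-separated column and the tag is the last, and that sentences are separated by blank lines. The names `parse_conll` and `bio_to_spans` are hypothetical helpers for illustration only; they are not part of this repository, and `models/preprocess.py` may handle the data differently.

```python
from typing import Dict, List, Tuple

# The sample sentence from this README, used as illustrative input.
SAMPLE = """# id 0d88e010-c6e8-4409-9dec-a785e43eac16\tdomain=de
sie _ _ O
war _ _ O
die _ _ O
erste _ _ O
frau _ _ O
die _ _ O
beim _ _ O
großes _ _ B-Facility
auge _ _ I-Facility
beobachtet _ _ O
durfte _ _ O
. _ _ O
"""


def parse_conll(text: str) -> List[Dict]:
    """Split CoNLL text into sentences with an id, tokens, and tags."""
    sentences: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# id"):
            # Header line: "# id <uuid> domain=<lang>"
            parts = line.split()
            current = {
                "id": parts[2] if len(parts) > 2 else "",
                "tokens": [],
                "tags": [],
            }
            sentences.append(current)
        elif not line:
            # A blank line closes the current sentence.
            current = None
        else:
            if current is None:
                # Tolerate sentences that lack a header line.
                current = {"id": "", "tokens": [], "tags": []}
                sentences.append(current)
            cols = line.split()
            current["tokens"].append(cols[0])   # first column: token
            current["tags"].append(cols[-1])    # last column: BIO tag
    return sentences


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (label, entity text) pairs.

    A stray I- tag without a matching preceding tag opens a new span.
    """
    spans: List[Tuple[str, str]] = []
    start, label = None, None
    # The trailing "O" is a sentinel that flushes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((label, " ".join(tokens[start:i])))
                start, label = None, None
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]
    return spans


sentence = parse_conll(SAMPLE)[0]
entities = bio_to_spans(sentence["tokens"], sentence["tags"])
# entities == [("Facility", "großes auge")]
```

Keeping the sentence id alongside the tokens makes it easier to align predictions with the original file when writing results back out.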

Requirements

Run the following command to install the required packages:

pip install -r requirements.txt

Usage

To preprocess the data, run the following command:

python ./models/preprocess.py --input_dir './public_data/DE-German/' --output_dir './preprocessed_data/' --lang 'de'

Sample files produced by the preprocessing step are available in the preprocessed_data folder.

To train the model, run the following command:

python ./models/train.py --train './preprocessed_data/de-train.csv' --test './preprocessed_data/de-dev.csv' --output_dir './bart_de' --model 'bart'

You can also access the trained monolingual English model here as an example of how a model is saved.

To run inference with the trained model and export the predictions, run the following command:

python ./models/inference.py --data_path './public_data/DE-German/de_test.conll' --word_max_length 4 --model 'mbart' --model_path './best_model/' --output_path './de.pred.conll'

If you prefer not to run the three commands above separately, you can reproduce the results end to end with the following commands:

chmod +x run.sh
./run.sh

Results

We will update the results after the leaderboard is released.

Contributors

🐮 TRAN Thi Hong Hanh 🐮
