L3i++ at SemEval 2023-Task 2: CoNER

Introduction

This repository contains the source code of the L3i++ team for SemEval 2023-Task 2: CoNER.

Datasets

We use the dataset from SemEval 2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition, which is available here. It covers 12 languages (English, Spanish, Swedish, Ukrainian, Portuguese, French, Farsi, German, Chinese, Hindi, Bangla, and Italian) and is split into three parts: train, dev, and test. Each part is a set of CoNLL files, which are the input data for the model. In these files, each sentence starts with a "# id" line, each token sits on its own line followed by two "_" columns and its BIO tag, and sentences are separated by blank lines:

# id 0d88e010-c6e8-4409-9dec-a785e43eac16	domain=de
sie _ _ O
war _ _ O
die _ _ O
erste _ _ O
frau _ _ O
die _ _ O
beim _ _ O
großes _ _ B-Facility
auge _ _ I-Facility
beobachtet _ _ O
durfte _ _ O
. _ _ O
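The format above can be read with a short parser. The sketch below is not part of this repository: `read_conll` is a hypothetical helper, and it assumes only what the sample shows, namely that a "# id" header opens each sentence, the token is the first column and the BIO tag the last, and blank lines separate sentences.

```python
from typing import List, Tuple

# One sentence = (sentence id, tokens, BIO tags)
Sentence = Tuple[str, List[str], List[str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse MultiCoNER-style CoNLL text into (id, tokens, tags) tuples.

    Hypothetical helper; assumes the layout of the sample above.
    """
    sentences: List[Sentence] = []
    sent_id, tokens, tags = "", [], []

    def flush() -> None:
        # Store the sentence collected so far (if any) and start a new one.
        nonlocal sent_id, tokens, tags
        if tokens:
            sentences.append((sent_id, tokens, tags))
        sent_id, tokens, tags = "", [], []

    for line in text.splitlines():
        line = line.strip()
        if not line:
            flush()  # a blank line ends the current sentence
        elif line.startswith("# id"):
            flush()  # a new header also starts a new sentence
            parts = line.split()  # ["#", "id", "<uuid>", "domain=de"]
            sent_id = parts[2] if len(parts) > 2 else ""
        else:
            cols = line.split()
            tokens.append(cols[0])   # token: first column
            tags.append(cols[-1])    # BIO tag: last column
    flush()  # the last sentence may not be followed by a blank line
    return sentences
```

For example, reading one of the files in public_data/DE-German/ (assumed to be UTF-8) and passing its contents to `read_conll` would yield one tuple per sentence, such as the sentence above with the tags `B-Facility` and `I-Facility` on "großes" and "auge".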

See the sample files in the public_data/DE-German/ folder.

Requirements

Run the following command to install the required packages:

pip install -r requirements.txt

Usage

To preprocess the data, run the following command:

python ./models/preprocess.py --input_dir './public_data/DE-German/' --output_dir './preprocessed_data/' --lang 'de'

Sample files produced by the preprocessing step are in the preprocessed_data folder.
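To look at what preprocessing produced without assuming its column layout, the CSV files can be inspected with Python's standard `csv` module. `peek_csv` is a hypothetical helper, not part of this repository; it reads the column names from the file's own header row, and the usage below assumes the files are UTF-8 encoded.

```python
import csv
import itertools
from typing import Dict, Iterable, List


def peek_csv(lines: Iterable[str], n: int = 3) -> List[Dict[str, str]]:
    """Return the first n data rows of a CSV as dicts keyed by its header.

    `lines` can be an open file handle or any iterable of CSV lines;
    no column names are assumed, they come from the first row.
    """
    reader = csv.DictReader(lines)
    return list(itertools.islice(reader, n))
```

For example: `with open('./preprocessed_data/de-train.csv', newline='', encoding='utf-8') as f: print(peek_csv(f))`.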

To train the model, run the following command:

python ./models/train.py --train './preprocessed_data/de-train.csv' --test './preprocessed_data/de-dev.csv' --output_dir './bart_de' --model 'bart'

A monolingual model trained on English is also available here, as an example of how a trained model is saved.

To run inference with the model and export the predictions, run the following command:

python ./models/inference.py --data_path './public_data/DE-German/de_test.conll' --word_max_length 4 --model 'mbart' --model_path './best_model/' --output_path './de.pred.conll'
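When gold labels are available (for example on the dev split), the exported predictions can be compared to them at the entity level. The sketch below is a generic strict-match, micro-averaged precision/recall/F1 over BIO spans. It is not the official task scorer, and it assumes the prediction file uses the same column layout as the input, so that the tag sequences of both files can be extracted the same way. `bio_to_spans` and `span_prf` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags into (start, end, type) spans.

    An I- tag with no open span, or with a different type than the open
    span, starts a new span (a common lenient convention).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:
                spans.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 over exact (start, end, type) matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and predictions must have the same number of sentences")
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a span counts only if boundaries and type match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Strict matching means a prediction with the right type but a shifted boundary counts as both a false positive and a false negative.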

To reproduce the results end-to-end without running the three commands above one by one, run:

chmod +x run.sh
./run.sh
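The contents of run.sh are not shown in this README, so the following is not that script. It is a minimal Python sketch that chains the three commands listed above (German example, run from the repository root) and stops at the first failing step. By default it only returns the commands as shell strings; `PIPELINE` and `run_pipeline` are hypothetical names.

```python
import shlex
import subprocess
from typing import List

# The three commands from this README, as argument lists.
PIPELINE: List[List[str]] = [
    ["python", "./models/preprocess.py", "--input_dir", "./public_data/DE-German/",
     "--output_dir", "./preprocessed_data/", "--lang", "de"],
    ["python", "./models/train.py", "--train", "./preprocessed_data/de-train.csv",
     "--test", "./preprocessed_data/de-dev.csv", "--output_dir", "./bart_de",
     "--model", "bart"],
    ["python", "./models/inference.py", "--data_path",
     "./public_data/DE-German/de_test.conll", "--word_max_length", "4",
     "--model", "mbart", "--model_path", "./best_model/",
     "--output_path", "./de.pred.conll"],
]


def run_pipeline(dry_run: bool = True) -> List[str]:
    """Return each step as a shell string; execute the steps unless dry_run.

    With dry_run=False, check=True makes subprocess.run raise
    CalledProcessError on the first non-zero exit, so later steps are skipped.
    """
    commands = []
    for cmd in PIPELINE:
        commands.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline()` is a safe way to review the steps; `run_pipeline(dry_run=False)` actually runs them.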

Results

We will update the results after the leaderboard is released.

Contributors