KIND_Pertino_Pesce_Sandri

The aim of the project is to apply the techniques learnt during the Natural Language Processing (NLP) course to analyse the KIND dataset, whose main task is Named Entity Recognition (NER) on Italian documents.

Preliminary analysis:
- description and inspection with statistics of the dataset;
- clustering of the documents and visualisation, using k-means and topic modelling;
- indexing of the documents to perform keyword search over them using PyTerrier;
- training of Word2Vec and fastText embeddings on the data and investigation of their properties.
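
As an illustration of the dataset inspection step, the following minimal sketch counts sentences, tokens and entity mentions per type. It assumes a two-column, tab-separated BIO layout with blank lines between sentences; the sample text, the tag names and the helper names `read_sentences` and `dataset_stats` are hypothetical and are not taken from the notebook.

```python
from collections import Counter

# Hypothetical sample in a tab-separated BIO layout (an assumption about the
# file format); blank lines separate sentences.
SAMPLE = (
    "Mario\tB-PER\n"
    "Rossi\tI-PER\n"
    "vive\tO\n"
    "a\tO\n"
    "Trento\tB-LOC\n"
    "\n"
    "La\tO\n"
    "Banca\tB-ORG\n"
    "d'Italia\tI-ORG\n"
    "ha\tO\n"
    "sede\tO\n"
    "a\tO\n"
    "Roma\tB-LOC\n"
)


def read_sentences(text):
    """Split BIO-tagged text into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit("\t", 1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def dataset_stats(sentences):
    """Count sentences, tokens, and entity mentions per entity type."""
    entity_counts = Counter()
    n_tokens = 0
    for sentence in sentences:
        n_tokens += len(sentence)
        for _, tag in sentence:
            # Each mention is counted once, at its B- tag.
            if tag.startswith("B-"):
                entity_counts[tag[2:]] += 1
    return {
        "sentences": len(sentences),
        "tokens": n_tokens,
        "entities": dict(entity_counts),
    }


stats = dataset_stats(read_sentences(SAMPLE))
print(stats)
```

Counting mentions at the `B-` tag keeps multi-token entities such as "Mario Rossi" from being counted twice.
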
Training models to perform NER:
- training Conditional Random Fields models;
- testing pre-trained models using spaCy and Stanza;
- fine-tuning of BERT models (Italian-only, multilingual and English) from Hugging Face.
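
For the Conditional Random Fields step, a model of this kind is typically trained on one feature dictionary per token. The sketch below shows one possible hand-crafted feature set; the README does not say which features or which CRF library the notebook uses, so the feature names and the helpers `token_features` and `sentence_features` are assumptions for illustration only. Lists of dictionaries in this shape are the input format accepted by CRF libraries such as sklearn-crfsuite.

```python
def token_features(sentence, i):
    """Build a feature dict for the token at position i of a token list."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        # Context from the previous token.
        prev = sentence[i - 1]
        features["prev.lower"] = prev.lower()
        features["prev.is_title"] = prev.istitle()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        # Context from the next token.
        nxt = sentence[i + 1]
        features["next.lower"] = nxt.lower()
        features["next.is_title"] = nxt.istitle()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """Return one feature dict per token of the sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hypothetical example sentence.
tokens = ["Mario", "Rossi", "vive", "a", "Trento"]
X = sentence_features(tokens)
print(X[0])
```

Capitalisation and neighbouring-token features are a common starting point for NER because proper nouns in Italian are usually capitalised; gazetteer lookups could be added as further boolean features.
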

Repository contents:
- dataset
- evalita-2023
- gazetteers
- indexes/index_docs
- .gitignore
- GroupAssignment2023.pdf
- KIND annotation guidelines.docx.pdf
- NLP_Project.ipynb
- README.md
- README_dataset.md
- base_config.cfg
- config.cfg


leonardopesce/KIND_Pertino_Pesce_Sandri
