Porttagger is a state-of-the-art part-of-speech tagger for Brazilian Portuguese. It automatically assigns morphosyntactic classes to the words of a sentence, following the Universal Dependencies international model. You can tag a single sentence or several sentences at once (by providing a plain text file). You can also choose which trained model to use: one trained on news texts (the Porttinari-base corpus), one on stock market tweets (the DANTE corpus), one on academic texts from the oil & gas domain (the PetroGold corpus), or one trained on all three together. This initiative is part of the POeTiSA project, where much more information is available. More details about Porttagger can be found in this paper.
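To make the tagging scheme concrete, here is a minimal sketch of what a Universal Dependencies part-of-speech annotation looks like. It does not call Porttagger: the example sentence and its tags are hand-written for illustration, and the names `UPOS_TAGS`, `EXAMPLE`, `validate_tags`, and `format_tagged` are hypothetical helpers, not part of this repository. Only the set of 17 universal POS tags comes from the Universal Dependencies standard.

```python
# The 17 universal part-of-speech (UPOS) tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})

# Hand-annotated illustration, NOT actual Porttagger output.
# "O mercado fechou em alta." (roughly: "The market closed higher.")
EXAMPLE = [
    ("O", "DET"),
    ("mercado", "NOUN"),
    ("fechou", "VERB"),
    ("em", "ADP"),
    ("alta", "NOUN"),
    (".", "PUNCT"),
]


def validate_tags(tagged):
    """Return the (token, tag) pairs whose tag is not a valid UPOS tag."""
    return [(token, tag) for token, tag in tagged if tag not in UPOS_TAGS]


def format_tagged(tagged):
    """Render a tagged sentence as space-separated token/TAG pairs."""
    return " ".join(f"{token}/{tag}" for token, tag in tagged)


if __name__ == "__main__":
    # An empty list means every tag belongs to the UPOS inventory.
    assert not validate_tags(EXAMPLE)
    print(format_tagged(EXAMPLE))
```

A check like `validate_tags` can be useful when post-processing tagger output from any source, since a tag outside the UPOS inventory usually signals a formatting or parsing problem.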
To get a local copy of this repository, use the following command:
git clone https://github.com/felmateos/porttaggerDANTE.git
Before you begin, make sure your Python environment is set up. Use the following command to install the required dependencies:
pip install -r requirements.txt
Contributions are welcome! Feel free to propose improvements, report issues, or open pull requests.
This project is licensed under the MIT License.