A framework for generating a subword vocabulary from a TensorFlow dataset and building custom BERT tokenizer models.
Updated Jul 6, 2021 - Python
This project leverages BERT for Named Entity Recognition (NER) on a medical dataset. The notebook provides a step-by-step guide, from dataset preparation to fine-tuning and saving the trained model for medical NER tasks.
Generate insights and rankings for potential candidates.