
Chinese NER problem that needs to capture 18 types of entities in medical conversation text. The process is divided into 4 parts that are encapsulated in high-level abstract classes. We control the workflow in a single Jupyter notebook.


windsuzu/AICUP-Deidentification-of-Medical-Data


AICUP Deidentification-of-Medical-Data

AICUP De-identification of Medical Data

About

This project comes from the AICUP competition "De-identification of Medical Data". The competition provides the text of clinical conversations and related interviews collected at National Cheng Kung University Hospital (NCKU Hospital); the private content and named entities in the text were annotated manually. Predictions on the test dataset are evaluated by F1-score. In short, the competition is a Chinese NER (named-entity recognition) task in which we must identify 18 types of named entities in text. Beyond improving performance on the task, we also wanted to use it to learn how to apply design patterns to an AI project.
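The README only says that F1-score is used. The sketch below assumes a common definition for NER, exact-match entity-level micro F1: a prediction counts as correct only if its document, start and end offsets, and entity type all match a gold annotation. The `Entity` tuple layout and the label names are illustrative assumptions, not the competition's official scorer.

```python
from typing import Iterable, Set, Tuple

# Assumed entity key: (document id, start offset, end offset (exclusive), entity type).
Entity = Tuple[str, int, int, str]


def micro_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> float:
    """Exact-match, entity-level micro F1 (an assumed metric definition)."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    true_pos = len(gold_set & pred_set)  # predictions matching a gold entity exactly
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("doc1", 3, 6, "name"), ("doc1", 10, 12, "time"), ("doc2", 0, 2, "location")]
    # The second prediction has the right offsets but the wrong type, so it does not count.
    pred = [("doc1", 3, 6, "name"), ("doc1", 10, 12, "location")]
    print(round(micro_f1(gold, pred), 3))  # prints 0.4 (precision 1/2, recall 1/3)
```

Under this definition a partially overlapping span or a correct span with the wrong type is both a false positive and a false negative, which is why type confusion among the 18 categories is penalized twice.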

Built With
  • Python 3
  • PyTorch
  • Transformers
  • TensorFlow 2
  • Jupyter Notebook
  • absl-py

Getting Started


Dataset and Baseline

Baseline Source Code


Design Pattern

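We split each notebook into the same four parts: data generator, data preprocessor, trainer, and predictor, and build an abstract class for each of them. A single main notebook then acts as the control terminal and uses absl.flags to drive all of the classes.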
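The four-part split can be sketched with Python's `abc` module. This is a minimal sketch, not the repository's code: the class and method names (`DataGenerator.generate`, `Pipeline.run`, and the `Toy*` stand-ins) are hypothetical, and where the project's main notebook selects concrete classes through absl.flags, this sketch simply passes them to a constructor.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


class DataGenerator(ABC):
    """Loads raw data (hypothetical interface)."""

    @abstractmethod
    def generate(self) -> List[Any]: ...


class DataPreprocessor(ABC):
    """Turns raw data into model-ready examples."""

    @abstractmethod
    def preprocess(self, raw: List[Any]) -> List[Any]: ...


class Trainer(ABC):
    """Fits a model on preprocessed examples and returns it."""

    @abstractmethod
    def train(self, examples: List[Any]) -> Any: ...


class Predictor(ABC):
    """Applies a trained model to examples."""

    @abstractmethod
    def predict(self, model: Any, examples: List[Any]) -> List[Any]: ...


@dataclass
class Pipeline:
    """Wires the four stages together; only the abstract interfaces are referenced."""

    generator: DataGenerator
    preprocessor: DataPreprocessor
    trainer: Trainer
    predictor: Predictor

    def run(self) -> List[Any]:
        raw = self.generator.generate()
        examples = self.preprocessor.preprocess(raw)
        model = self.trainer.train(examples)
        return self.predictor.predict(model, examples)


# Toy concrete classes, for illustration only.
class ToyGenerator(DataGenerator):
    def generate(self) -> List[Any]:
        return ["病人 王小明", "今天 回診"]


class ToyPreprocessor(DataPreprocessor):
    def preprocess(self, raw: List[Any]) -> List[Any]:
        # Character-level tokens, spaces removed.
        return [list(text.replace(" ", "")) for text in raw]


class ToyTrainer(Trainer):
    def train(self, examples: List[Any]) -> Any:
        return "O"  # a "model" that always predicts the outside tag


class ToyPredictor(Predictor):
    def predict(self, model: Any, examples: List[Any]) -> List[Any]:
        return [[model] * len(chars) for chars in examples]


if __name__ == "__main__":
    tags = Pipeline(ToyGenerator(), ToyPreprocessor(), ToyTrainer(), ToyPredictor()).run()
    print(tags)
```

Because `Pipeline` depends only on the abstract classes, swapping in a different preprocessor or trainer means writing a new subclass and passing it in, without touching the workflow code.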

Abstract Classes

  • data generator: illustration · source code
  • data preprocessor: illustration · source code
  • trainer: illustration · source code
  • predictor: illustration · source code
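In a character-level Chinese NER task, a data preprocessor and a predictor typically have to convert between span annotations and per-character tags. The sketch below assumes the BIO tagging scheme and non-overlapping spans. The README does not say which scheme the project uses, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type)


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Character-level BIO tags; assumes spans do not overlap."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Inverse of spans_to_bio; a stray I- tag starts a new entity."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if start is not None and not continues:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, tag_label  # open a new entity
    return spans


if __name__ == "__main__":
    text = "王小明明天回診"
    spans = [(0, 3, "name"), (3, 5, "time")]
    tags = spans_to_bio(text, spans)
    print(tags)
    print(bio_to_spans(tags) == spans)  # prints True
```

The explicit `B-` tag is what lets two adjacent entities (here a name immediately followed by a time expression, with no space between them in Chinese) be separated again when decoding.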

Main

  • main: illustration · source code
  • data generator: illustration · source code
  • data preprocessor: illustration · source code
  • trainer: illustration · source code
  • predictor: illustration · source code

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Reach out to the maintainer at one of the following places:

Acknowledgements
