GitHub - windsuzu/AICUP-Deidentification-of-Medical-Data: Chinese NER problem that needs to capture 18 types of entities in medical conversation text. The process is divided into 4 parts that are encapsulated in high-level abstract classes. We control the workflow in a single Jupyter notebook.

AICUP Deidentification-of-Medical-Data

AICUP De-identification of Doctor-Patient Data (醫病資料去識別化)
View Demo · Report Bug · Request Feature

Table of Contents

About
Getting Started
Dataset and Baseline
- Baseline Source Code
Design Pattern
- Abstract Classes
- Main
Contributing
License
Contact
Acknowledgements

About

This project comes from the AICUP competition on the de-identification of medical data. The competition provides the text of clinical conversations and related interviews collected at National Cheng Kung University (NCKU) Hospital, in which private content and named entities were annotated by hand. Predictions on the test dataset are evaluated with the F1-score. In short, the competition is a Chinese NER (named-entity recognition) task in which we must identify 18 types of named entities in text. Beyond improving performance on the task, we also wanted to use it to learn how to apply design patterns to an AI project.
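For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it at the entity level, assuming a prediction counts as correct only when its start offset, end offset, and entity type all match a gold annotation. That exact-match rule, the function name `entity_f1`, and the sample spans are illustrative assumptions, not the competition's official scorer.

```python
from typing import Set, Tuple

# (start_offset, end_offset, entity_type); a hypothetical span format.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match, entity-level F1 between gold and predicted spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up spans for illustration only.
gold = {(0, 3, "name"), (10, 14, "time"), (20, 23, "location")}
predicted = {(0, 3, "name"), (10, 14, "time"), (30, 32, "name")}
score = entity_f1(gold, predicted)
print(f"F1 = {score:.3f}")
```

Here two of the three predictions match, so precision and recall are both 2/3 and so is the F1-score.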

Built With

Python 3
PyTorch
Transformers
TensorFlow 2
Jupyter Notebook
absl-py

Getting Started

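Dataset and Baseline: an introduction to the competition and its baseline code
Design Pattern: an overview of the project's architecture
Motivation: slides presenting the project's motivation
Report: slides presenting the project's results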

Dataset and Baseline

Baseline Source Code

Design Pattern

We split each notebook into four parts (data generator, data preprocessor, trainer, and predictor) and built an abstract class for each of them. Finally, a single main notebook acts as the control point and uses absl.flags to drive all of the classes.

Abstract Classes

Illustration

data generator	data preprocessor	trainer	predictor
source code	source code	source code	source code
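The four-stage split can be sketched with Python's `abc` module. This is a minimal illustration, not the repository's actual code: the class names, method names, type aliases, and the toy concrete subclasses are all assumptions chosen for the example.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, List[str]]          # raw text, one tag per character
Example = Tuple[List[str], List[str]]   # characters, tags


class DataGenerator(ABC):
    """Stage 1: produce raw labelled samples."""

    @abstractmethod
    def generate(self) -> List[Sample]: ...


class DataPreprocessor(ABC):
    """Stage 2: turn raw samples into model-ready examples."""

    @abstractmethod
    def preprocess(self, samples: List[Sample]) -> List[Example]: ...


class Trainer(ABC):
    """Stage 3: fit a model on the examples."""

    @abstractmethod
    def train(self, examples: List[Example]) -> Dict[str, str]: ...


class Predictor(ABC):
    """Stage 4: tag new text with a trained model."""

    @abstractmethod
    def predict(self, model: Dict[str, str], text: str) -> List[str]: ...


# Toy concrete classes, only to show how the stages plug together.
class InMemoryGenerator(DataGenerator):
    def generate(self) -> List[Sample]:
        return [("Dr Wu", ["O", "O", "O", "name", "name"])]


class CharPreprocessor(DataPreprocessor):
    def preprocess(self, samples: List[Sample]) -> List[Example]:
        return [(list(text), tags) for text, tags in samples]


class MajorityTagTrainer(Trainer):
    def train(self, examples: List[Example]) -> Dict[str, str]:
        counts = defaultdict(Counter)
        for chars, tags in examples:
            for char, tag in zip(chars, tags):
                counts[char][tag] += 1
        # Remember the most frequent tag seen for each character.
        return {char: c.most_common(1)[0][0] for char, c in counts.items()}


class LookupPredictor(Predictor):
    def predict(self, model: Dict[str, str], text: str) -> List[str]:
        # Unseen characters fall back to the outside tag "O".
        return [model.get(char, "O") for char in text]


examples = CharPreprocessor().preprocess(InMemoryGenerator().generate())
model = MajorityTagTrainer().train(examples)
tags = LookupPredictor().predict(model, "Wu")
print(tags)
```

Because each stage depends only on the abstract interface of the previous one, a different model or preprocessing scheme can be swapped in by writing a new subclass without touching the other stages.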

Main

Illustration

main	data generator	data preprocessor	trainer	predictor
source code	source code	source code	source code	source code
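A flag-driven main cell might look like the sketch below. It assumes `absl-py` is installed. The flag names (`stage`, `model_name`), the `STAGES` registry, and the stand-in stage functions are hypothetical. Flags are registered on a private `FlagValues` object and parsed from an explicit argument list, so the cell can be re-run in a notebook without clashing with the kernel's own `sys.argv`.

```python
from absl import flags

# A private FlagValues keeps this sketch isolated from the global flags.FLAGS.
FLAGS = flags.FlagValues()
flags.DEFINE_enum(
    "stage", "train", ["generate", "preprocess", "train", "predict"],
    "Which pipeline stage to run.", flag_values=FLAGS)
flags.DEFINE_string(
    "model_name", "demo-model",
    "Hypothetical model identifier passed to the stage.", flag_values=FLAGS)

# Stand-ins for the real generator / preprocessor / trainer / predictor classes.
STAGES = {
    "generate": lambda name: f"generate with {name}",
    "preprocess": lambda name: f"preprocess with {name}",
    "train": lambda name: f"train with {name}",
    "predict": lambda name: f"predict with {name}",
}


def run(argv):
    """Parse flags from argv (argv[0] is the program name) and dispatch."""
    FLAGS(argv)
    stage_fn = STAGES[FLAGS.stage]
    return stage_fn(FLAGS.model_name)


result = run(["main", "--stage=predict", "--model_name=demo"])
print(result)
```

Changing the argument list is then enough to switch stages or models from one place, which is the role the main notebook plays here.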

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Reach out to the maintainer at one of the following places:

GitHub discussions
The email address listed on the maintainer's GitHub profile

Acknowledgements

Artificial Intelligence Co-Creation Platform (人工智慧共創平台)

