Skip to content

lfoyh6591/nlp_project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

nlp_project

Objects

Apply texttiling method for topic segmentation to Korean text data.

Imporvement texttiling method by using sentence embedding with pretrained SBERT models and setting cosine similarities to similarity scores.

File explanation

execution

Execute only korean_texttiling.ipynb for the result.

1. texts

Korean text data in the texts folder from AIHUB
Preprocessed by fetch_data.ipynb
link : https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=543

2. textiling.py

Modified nltk's texttiligng source code for Korean text and improvement. link : https://www.nltk.org/_modules/nltk/tokenize/texttiling.html

3. segmentation.py

nltk's segmentation source code for evaluation metric. link : https://www.nltk.org/_modules/nltk/metrics/segmentation.html

4. fetch_data.ipynb

Used to load data from AIHUB and preprocess data.

5. korean_texttiling.ipynb

Used to expriment

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published