GitHub - Aniezka/Course_work: Coursework on "Clustering of English texts on the basis of automated extraction of key properties"

Clustering of English texts on the basis of automated extraction of key properties

To run the code, download the Jupyter notebook "REALEC_Clustering.ipynb" and unzip the dataset "REALEC_cleaned.zip" into the same directory.
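If you prefer to unpack the archive from Python rather than by hand, the standard-library `zipfile` module is enough. The sketch below assumes the archive sits in the current directory. `unpack_dataset` is a hypothetical helper, not part of the notebook. Check the extracted folder layout against the paths you later pass to the notebook.

```python
import zipfile
from pathlib import Path

def unpack_dataset(zip_path="REALEC_cleaned.zip", target_dir="."):
    """Hypothetical helper: extract the dataset archive and return the member names."""
    with zipfile.ZipFile(Path(zip_path)) as archive:
        archive.extractall(target_dir)  # writes files relative to target_dir
        return archive.namelist()
```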

Next, set the following parameters (all given as strings):

data_type - the data to analyze; one of:

'unknown_data' - unlabeled data; set folder to the path of the essay folder (example: folder = "datasets/REALEC_clean/exam/Exam2018")
'test_graph' - labeled graph descriptions from Exam2017, used to plot their representations and evaluate the results
'test_essay' - labeled opinion essays from Exam2017, used to plot their representations and evaluate the results
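A minimal sketch of the parameter setup for the 'unknown_data' case follows. The values are the ones listed in this README. `load_essays` is a hypothetical helper, not the notebook's own loader. It assumes each essay is stored as a separate plain-text `.txt` file somewhere under the folder, which may not match the real REALEC layout.

```python
from pathlib import Path

# Parameter values as listed in the README (strings).
data_type = "unknown_data"   # or "test_graph" / "test_essay"
method = "tf-idf"            # or "bm25" / "dm" / "dbow"
folder = "datasets/REALEC_clean/exam/Exam2018"  # the README's own example path

def load_essays(folder, pattern="*.txt"):
    """Hypothetical helper: read every file matching `pattern` under `folder`.

    Assumes one essay per plain-text file. Returns a dict mapping the
    path relative to `folder` to the file's text.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"essay folder not found: {root}")
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob(pattern))  # recursive search
        if path.is_file()
    }
```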

method - the method used to obtain document embeddings; one of:

'tf-idf' - TF-IDF as implemented in sklearn
'bm25' - BM25, a weighted variant of TF-IDF
'dm' - the PV-DM (Distributed Memory) variant of doc2vec paragraph vectors
'dbow' - the PV-DBOW (Distributed Bag of Words) variant of doc2vec paragraph vectors
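To show how the 'bm25' weighting differs from plain TF-IDF, here is a minimal Okapi BM25 sketch over tokenized documents. It is not the notebook's implementation. The parameters `k1=1.5` and `b=0.75` are common defaults chosen as an assumption, and the IDF uses the Lucene-style `log(1 + ...)` form so weights stay positive. Unlike raw TF-IDF, term frequency saturates through `k1`, and `b` controls length normalization against the average document length.

```python
import math
from collections import Counter

def bm25_vectors(docs, k1=1.5, b=0.75):
    """Return one sparse {term: weight} dict per tokenized document."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    # Lucene-style IDF, positive even for terms that occur in every document.
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        vectors.append({t: idf[t] * c * (k1 + 1) / (c + norm) for t, c in tf.items()})
    return vectors
```

For example, `bm25_vectors([["cat", "sat"], ["cat", "ran", "fast"]])` gives "sat" a higher weight than "cat" in the first document, because "cat" occurs in both documents and therefore has a lower IDF.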

After examining the values of the internal clustering validation measures, set the optimal number of clusters, 'best_k' (4 by default). This value is used by all subsequent clustering algorithms and for extracting keywords for each resulting cluster.
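The README does not name the internal measures the notebook reports. As an illustration only, the sketch below uses the silhouette coefficient, one common internal measure, to compare candidate values of k. `pick_best_k` is hypothetical: it takes precomputed cluster labels for each k from any clustering algorithm and returns the k with the highest mean silhouette. Distances are Euclidean via `math.dist` (Python 3.8+).

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score 0, as in sklearn
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        # Mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

def pick_best_k(points, labelings):
    """Hypothetical helper: `labelings` maps k -> list of cluster labels."""
    return max(labelings, key=lambda k: silhouette(points, labelings[k]))
```

A higher mean silhouette means tighter, better-separated clusters. In practice you would compare several internal measures before fixing 'best_k'.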

Dataset REALEC: https://yadi.sk/d/AVFzedaVtSporg

Dataset EFCAMDAT: https://yadi.sk/d/LXn7PifcOmOjYQ
