Assignment for Intelligent Systems course which aims to perform basic analysis on a provided corpus, researching state-of-the-art techniques for word cloud generation.

angeligareta/basic-text-mining-python


Basic Text Mining using Python

Assignment of the Intelligent Systems course of the EIT Digital data science master at UPM

Abstract

This project aims to perform a basic analysis of a provided textual corpus on head and neck cancer medication. First, the dataset is preprocessed by filtering the SEER stage field and creating additional columns. Next, a basic word cloud is created and its results discussed, followed by research into more advanced techniques for word cloud generation. The approaches used include TextRank, MultipartiteRank, TopicRank, PositionRank, YAKE, TF-IDF, SingleRank and a custom TextRank. The implementation is provided as a Jupyter Notebook.
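To give a rough idea of how one of the listed approaches, TF-IDF, can produce term weights for a word cloud, here is a minimal sketch using only the Python standard library. It is not the notebook's implementation: the toy documents, the stopword list, the smoothed IDF formula and the function names `tokenize` and `tfidf_weights` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy documents; the real corpus is not reproduced here.
DOCS = [
    "cisplatin given with radiation therapy",
    "radiation therapy followed by surgery",
    "cisplatin and cetuximab given weekly",
]

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"and", "by", "with", "the", "of", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_weights(docs):
    """Return a dict mapping each term to its TF-IDF score summed over all documents."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)  # relative term frequency in this document
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            weights[term] += tf * idf
    return dict(weights)


weights = tfidf_weights(DOCS)
# Highest-weighted terms first; these would be drawn largest in a word cloud.
top_terms = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top_terms)
```

The resulting `weights` dictionary has the term-to-frequency shape that word cloud libraries typically accept, for example `WordCloud.generate_from_frequencies` in the `wordcloud` package. Whether the project uses that package is not stated here.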

Authors
