Assignment for the Intelligent Systems course of the EIT Digital Data Science master's programme at UPM
This project performs a basic analysis of a provided textual corpus on head and neck cancer medication. First, the dataset is preprocessed by filtering the seer stage field and creating additional columns. Next, a basic word cloud is created and its results are discussed, followed by research into more advanced techniques for word cloud generation. The approaches used include TextRank, MultipartiteRank, TopicRank, PositionRank, YAKE, TF-IDF, SingleRank and a custom TextRank implementation. The implementation is provided as a Jupyter Notebook.
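As an illustration of how one of the listed approaches can weight terms for a word cloud, here is a minimal TF-IDF sketch using only the Python standard library. It is not the notebook's implementation: the function names `tokenize` and `tfidf_weights` and the toy corpus are hypothetical, and the weighting is the plain `tf * log(N / df)` form without smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, not taken from the project data.
toy_corpus = [
    "cisplatin was given with radiation therapy",
    "cetuximab was given with radiation therapy",
    "the patient was given cisplatin",
]
weights = tfidf_weights(toy_corpus)
# Highest-weighted terms of the second document, best first.
top_terms = sorted(weights[1].items(), key=lambda kv: kv[1], reverse=True)[:3]
```

Terms that occur in every document (such as "was" here) receive a weight of zero, so they would not dominate the cloud. If the `wordcloud` package is used for rendering, a dict like `weights[1]`, restricted to its positive entries, could be passed to `WordCloud.generate_from_frequencies`.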
- Angel Igareta (angel@igareta.com)
- Cristian Abrante Dorta