Description: Project comparing different classification and visualization techniques.
Concepts:
- Text preprocessing (stopword removal, stemming, lemmatization)
- Feature extraction (TF-IDF)
- Split dataset
- Classification (Support Vector Machine, Artificial Neural Network: Feed-forward Backpropagation Multilayer Perceptron, Naive Bayes, Random Forest, Nearest Neighbors Classifier, Decision Tree)
- Visualization (Scatterplot)
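As a rough illustration of how the preprocessing, TF-IDF, and nearest-neighbor steps listed above fit together, here is a minimal standard-library sketch. It is not the project's code: the stopword list, the function names, and the smoothed IDF formula are assumptions made for the example. Stemming, lemmatization, the dataset split, and the other classifiers are left out for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real run would use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (assumed variant) so every term keeps a nonzero weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def nearest_neighbor_predict(train_vecs, train_labels, query_vec):
    """Return the label of the training vector most similar to the query."""
    best = max(range(len(train_vecs)), key=lambda i: cosine(train_vecs[i], query_vec))
    return train_labels[best]

# Toy documents (made up for this example, not taken from the collection).
docs = [
    "Case-based reasoning retrieves similar past cases",
    "Adapting retrieved cases in case-based reasoning",
    "Inductive logic programming learns logic clauses",
    "Learning clauses with inductive logic programming",
    "Reasoning over past cases to solve new problems",
]
labels = ["CBR", "CBR", "ILP", "ILP"]

# TF-IDF is computed over the whole toy collection, then the last
# document is held out as the query and the first four act as training data.
vecs = tfidf_vectors(docs)
predicted = nearest_neighbor_predict(vecs[:4], labels, vecs[4])
print(predicted)
```

Computing TF-IDF over all documents before splitting keeps the sketch short. For a fair comparison of classifiers, the IDF weights would normally be fitted on the training split only.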
Dataset: a collection of 682 scientific papers, each assigned to one of the following categories:
- Case-based Reasoning (CBR)
- Inductive Logic Programming (ILP)
- Information Retrieval (IR)
- Sonification (SON)
- Interactive Visualization (INT)
Each scientific paper is represented by plain text containing its title, authors, abstract, and references.
http://vicg.icmc.usp.br/vicg/software
cbr-ilp-ir-son-int.zip: compressed archive containing all scientific papers;
compscience_papers.csv: CSV file structured as [label,text of the scientific paper];
cbr-ilp-ir-son-int_cosine_gpt.data: TF-IDF representation derived from this text collection.
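A minimal sketch of reading the [label,text] layout with Python's standard csv module follows. It assumes the file has no header row, that the label is the first column, and that the text is the second; the function name load_papers and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

def load_papers(handle):
    """Read (label, text) rows from an open CSV file handle."""
    labels, texts = [], []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        labels.append(row[0].strip())
        texts.append(row[1])
    return labels, texts

# Stand-in data with the assumed layout. For the real file, something like:
#   with open("compscience_papers.csv", newline="", encoding="utf-8") as f:
#       labels, texts = load_papers(f)
sample = io.StringIO(
    'CBR,"A case-based approach, with references"\n'
    'ILP,"Learning logic programs"\n'
    'CBR,"Case retrieval"\n'
)
labels, texts = load_papers(sample)
counts = Counter(labels)  # number of papers per category
print(counts)
```

If the file does start with a header row, the first row would need to be skipped before counting.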
[1] Paulovich, F. V., Nonato, L. G., Minghim, R., & Levkowitz, H. Least square projection: A fast high-precision multidimensional projection technique and its application to document mapping. IEEE Transactions on Visualization and Computer Graphics, v. 14, n. 3, p. 564-575, 2008.
[2] Levkowitz, H., Minghim, R., Nonato, L. G., & Paulovich, F. V. Visual mapping of text collections through a fast high precision projection technique. In: Tenth International Conference on Information Visualisation (IV'06). IEEE, 2006. p. 282-290.
[3] Eler, D. M., Paulovich, F. V., de Oliveira, M. C. F., & Minghim, R. Coordinated and multiple views for visualizing text collections. In: 2008 12th International Conference Information Visualisation. IEEE, 2008. p. 246-251.
[4] Paulovich, F. V., & Minghim, R. Multidimensional Data Mapping–Integrating Mining and Visualization.