- hathi_ia_texts.csv : combined metadata and text for all books
- all_counts.csv : n-gram occurrences per book
- top1000_w_cluster.csv : n-grams annotated with potential cluster/speech origin
- proc_into.ipynb : cleaning and downloading files
- NgramExperiments.ipynb : n-gram generation
- visualization.ipynb : clustering and visualization
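
For orientation, the kind of per-book n-gram counting that all_counts.csv stores can be sketched as follows. This is a minimal illustration only, not the code in NgramExperiments.ipynb: the tokenizer, the function names (`tokenize`, `count_ngrams`, `counts_per_book`), and the `{book_id: text}` input shape are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def count_ngrams(text, n=2):
    # Slide a window of n tokens over the text and count each distinct tuple.
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))


def counts_per_book(books, n=2):
    # `books` is a hypothetical mapping of book id to full text.
    return {book_id: count_ngrams(text, n) for book_id, text in books.items()}


if __name__ == "__main__":
    sample = {"book_a": "the cat and the cat", "book_b": "a dog and a cat"}
    for book_id, counts in counts_per_book(sample).items():
        print(book_id, counts.most_common(2))
```

The notebooks may tokenize differently or use a library such as scikit-learn or NLTK, so check NgramExperiments.ipynb for the actual procedure.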