Exploring shared text in elocution manuals

Data Files

  • hathi_ia_texts.csv : combined metadata and text for all books
  • all_counts.csv : n-gram occurrences per book
  • top1000_w_cluster.csv : ngrams annotated with potential cluster/speech origin

Code

  • proc_into.ipynb : cleaning and downloading files
  • NgramExperiments.ipynb : ngram generation
  • visualization.ipynb : clustering and visualization
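
As a rough illustration of the kind of per-book ngram counting listed above, here is a minimal Python sketch. It is not taken from the notebooks: the function names, the whitespace tokenization, the trigram size, and the toy texts are all assumptions made for the example.

```python
from collections import Counter


def ngrams(text, n):
    """Yield word n-grams (space-joined, lowercased) from a text."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenization
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def count_ngrams_per_book(books, n=3):
    """Map each book id to a Counter of its n-grams."""
    return {book_id: Counter(ngrams(text, n)) for book_id, text in books.items()}


def shared_ngrams(counts, min_books=2):
    """Return n-grams found in at least `min_books` books, with the book count."""
    book_freq = Counter()
    for counter in counts.values():
        book_freq.update(counter.keys())  # count each n-gram once per book
    return {gram: k for gram, k in book_freq.items() if k >= min_books}


# Toy input (hypothetical): two short "books" that share one phrase.
books = {
    "book_a": "Friends Romans countrymen lend me your ears",
    "book_b": "he cried Friends Romans countrymen and was silent",
}
counts = count_ngrams_per_book(books, n=3)
shared = shared_ngrams(counts)
print(shared)
```

Counting each ngram once per book in `shared_ngrams` keeps a phrase repeated many times within a single manual from looking like text shared across manuals.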