
A benchmark corpus of 100 English novels, covering the 19th and the beginning of the 20th century


computationalstylistics/100_english_novels


100 English Novels ver. 1.4

A benchmark corpus of 100 English novels, covering the 19th century and the beginning of the 20th. It contains novels by 33 authors (one third women, two thirds men), and one anonymous (well, not so much...) novel entitled "Clara Vaughan".

The corpus is intended for stylometric benchmarks. See https://computationalstylistics.github.io/resources/ for further details.

Additionally, the folder 'word_embedding_models' contains two vector representations of the benchmark novels. Both models were produced using the GloVe algorithm via the 'text2vec' library for R: one represents each word as a 50-dimensional vector, the other as a 100-dimensional vector.
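As a rough illustration of how such word vectors might be used, here is a minimal Python sketch that parses vectors and ranks words by cosine similarity. It assumes a plain-text layout of one word per line followed by its space-separated values; the actual file format in 'word_embedding_models' is not described here, so treat the parser as hypothetical and adapt it to the real files. The toy three-dimensional vectors are made up for the example and do not come from the corpus.

```python
import io
import math


def load_vectors(handle):
    """Parse lines of the (assumed) form 'word v1 v2 ... vN' into a dict."""
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vectors, word, k=3):
    """Return the k words most similar to `word`, best match first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up toy vectors standing in for a real 50- or 100-dimensional model.
TOY_MODEL = """\
king 0.9 0.1 0.0
queen 0.8 0.2 0.1
river 0.0 0.1 0.9
"""

toy_vectors = load_vectors(io.StringIO(TOY_MODEL))
print(nearest(toy_vectors, "king", k=2))
```

With a real model, the `io.StringIO` object would be replaced by an opened file, provided the file follows the assumed layout.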
