From r/goodreads (2018) or the Goodreads Developers forum, Breslin (2018) or Giulia (2018):
I simply need to obtain all (or as many) reviews for two books, namely Woolf's To the Lighthouse and Mrs Dalloway, so that i can then analyse the corpus obtained from them and see if readers define the two novels as "difficult".
$ cat savreviews-book12345-stars2.txt
2018/12/29 #1234567
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea <em>commodo consequat</em>.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum
dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
-------------------------------------------------------------------------------
2018/10/21 #7654321
Ut enim ad minim veniam, quis nostrud <b>exercitation</b> ullamco laboris nisi
ut aliquip ex ea commodo consequat: <a href="https://example.com">example.com</a>
-------------------------------------------------------------------------------
2018/04/01 #918273
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi
Note:
The generated files (one per star rating) contain only the review text, the date and the review ID. They do not contain any other information, e.g., user names. If you are interested in these details or in other output formats, just contact me or open an issue.
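For downstream analysis, the files can be split back into individual reviews. The following Python snippet is a minimal sketch, not part of the toolbox. It assumes the layout seen in the sample above: a "YYYY/MM/DD #ID" header line, the review text, and a line of dashes between reviews. The names `Review` and `parse_reviews` are made up for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed layout, inferred from the sample output only:
# header line "YYYY/MM/DD #ID", review text, then a line of dashes.
HEADER = re.compile(r"^(\d{4})/(\d{2})/(\d{2}) #(\d+)\s*$")
SEPARATOR = re.compile(r"^-{10,}\s*$")  # assumption: 10+ dashes on their own line


@dataclass
class Review:
    posted: date
    review_id: int
    text: str  # raw text, may still contain user-entered HTML


def parse_reviews(raw: str) -> list[Review]:
    """Split the contents of one savreviews output file into Review records."""
    reviews: list[Review] = []
    chunk: list[str] = []
    # The extra dash line acts as a sentinel that flushes the last review.
    for line in raw.splitlines() + ["-" * 10]:
        if not SEPARATOR.match(line):
            chunk.append(line)
            continue
        # Skip blank lines in front of the header, if any.
        while chunk and not chunk[0].strip():
            chunk.pop(0)
        if chunk:
            m = HEADER.match(chunk[0])
            if m:  # chunks without a recognisable header are ignored
                year, month, day, rid = map(int, m.groups())
                body = "\n".join(chunk[1:]).strip()
                reviews.append(Review(date(year, month, day), rid, body))
        chunk = []
    return reviews


if __name__ == "__main__":
    sample = (
        "2018/12/29 #1234567\n"
        "Lorem ipsum <em>commodo consequat</em>.\n"
        + "-" * 79 + "\n"
        "2018/10/21 #7654321\n"
        "Ut enim ad minim veniam.\n"
    )
    for review in parse_reviews(sample):
        print(review.posted.isoformat(), review.review_id, len(review.text))
```

A review whose own text contains a line of ten or more dashes would be split incorrectly by this sketch, so spot-check the record count against the number of header lines.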
- Install the toolbox
- At the prompt, enter:
$ ./savreviews.pl --help
$ ./savreviews.pl 59716 # Goodreads Book-ID in URL
Loading reviews for "To the Lighthouse"... 5271 of 5860 [searching]
Number of reviews per year:
2007 ################ 263
2008 ##################### 343
2009 ################ 266
2010 ################# 276
2011 ###################### 357
2012 ############################# 473
2013 ################################## 565
2014 ############################ 456
2015 ########################### 440
2016 ############################# 474
2017 #################################### 599
2018 ######################################## 648
2019 ###### 111
Writing reviews to:
./list-out/savreviews-book59716-stars0.txt
./list-out/savreviews-book59716-stars1.txt
./list-out/savreviews-book59716-stars2.txt
./list-out/savreviews-book59716-stars3.txt
./list-out/savreviews-book59716-stars4.txt
./list-out/savreviews-book59716-stars5.txt
Total time: 36 minutes
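Once the run has finished, the six files can be tallied by star rating and year. The Python snippet below is a rough sketch under two assumptions: that the files are UTF-8 encoded, and that every line of the form "YYYY/MM/DD #ID" is a review header (as in the sample file shown earlier). The function name `tally` is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Header line of a review, as seen in the sample file: "2018/12/29 #1234567"
HEADER = re.compile(r"^(\d{4})/\d{2}/\d{2} #\d+\s*$")
# File naming as printed by the run above; the digit before ".txt" is the rating.
FILENAME = re.compile(r"savreviews-book(\d+)-stars(\d)\.txt$")


def tally(out_dir: str, book_id: int) -> dict[int, Counter]:
    """Return {stars: Counter({year: number_of_reviews})} for one book."""
    result: dict[int, Counter] = {}
    pattern = f"savreviews-book{book_id}-stars*.txt"
    for path in sorted(Path(out_dir).glob(pattern)):
        m = FILENAME.search(path.name)
        if not m:
            continue
        years: Counter = Counter()
        # Assumption: UTF-8; undecodable bytes are replaced rather than fatal.
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                header = HEADER.match(line)
                if header:
                    years[int(header.group(1))] += 1
        result[int(m.group(2))] = years
    return result


if __name__ == "__main__":
    for stars, years in tally("./list-out", 59716).items():
        print(f"{stars} stars: {sum(years.values())} reviews")
        for year in sorted(years):
            print(f"  {year} {'#' * (years[year] // 20)} {years[year]}")
```

Counting header lines avoids parsing the review bodies, at the cost of miscounting any review text that happens to contain a line in the same format.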
- long runtime: Goodreads slows down all requests and we have to load a lot of data
- there's no way to load all reviews of a book, but the program tries different things to get as many full-text reviews as possible; this can take very long (see the --rigor parameter and this)
- needs data cleansing on your side
- review text might include user-entered (broken) HTML code and URLs
- review text can be in any language, e.g., German or Russian
- review text might include non-latin characters, e.g., Cyrillic
- no duplicate reviewers, but the output could theoretically contain duplicate reviews posted by different members (statistically negligible?)
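The cleansing steps listed above could look roughly like the following Python sketch, which is not part of the toolbox. It strips user-entered HTML with the standard library parser, estimates how much of a text is written in Latin script, and drops exact duplicate texts. All function names are hypothetical, and the script check is only a crude filter: it flags Cyrillic text but cannot tell German from English, which would need a real language detector.

```python
import unicodedata
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # "&amp;" arrives as "&"
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Remove tags, decode entities and collapse runs of whitespace."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return " ".join("".join(parser.parts).split())


def latin_ratio(text: str) -> float:
    """Share of alphabetic characters whose Unicode name starts with LATIN."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    latin = sum(1 for c in letters if unicodedata.name(c, "").startswith("LATIN"))
    return latin / len(letters)


def dedupe(texts: list[str]) -> list[str]:
    """Keep the first occurrence of each text, ignoring case and whitespace."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.split()).casefold()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


if __name__ == "__main__":
    raw = 'nostrud <b>exercitation</b> ullamco: <a href="https://example.com">example.com</a>'
    clean = strip_html(raw)
    print(clean)
    print(latin_ratio(clean) > 0.9)  # threshold 0.9 is an arbitrary choice
```

How broken markup is handled depends on the parser, so it is worth inspecting a sample of cleaned reviews before running any corpus statistics on them.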
If you like this project, give it a star on GitHub. Report bugs or suggestions via GitHub or see the AUTHORS.md file.
- friendrated.pl - Books common among the people you follow
- friendnet.pl - Social network analysis
- friendgroup.pl - Groups common among the people you follow
- recentrated.pl - Know when people rate or write reviews about a book
- similarauth.pl - Find all similar authors
- likeminded.pl - Finding people based on the books they've read