A Python3 script collection to get statistics about your Letterboxd network.
- Python3 (tested on 3.5)
- beautifulsoup4
- pandas
- numpy
Run
python3 setup.py
This will create necessary directories and ask you for your username.
To save the movies logged by the users you follow on Letterboxd, run
python3 ratings.py -u [username]
It creates a new folder for each user in the user directory and, each time the script is executed, a new csv file that contains the Letterboxd movie URL and the rating (0-10) of every logged movie.
Since the script downloads multiple HTML pages for each user, it usually takes a while to finish and may run into traffic limits.
Without the -u flag, it falls back to the user set as default.
You can also run ratings.py again later to update the whole database with recent additions.
python3 ranking.py -o csv -r 0
This writes a csv file that includes all movies ever logged by the users you follow.
The three columns are [title],[network_rating],[network_number_of_logs].
ranking.py also has other options available:
- -r [rating]: The minimum rating a movie needs to appear in the resulting list. By default, this is 3.75.
- -w y: Only consider movies you already logged.
- -w n: Filter out movies you already logged.
- -target [username]: Set a target user other than the default (probably yourself).
- -keep_own: Keep your own rating for averaging. When this flag is not set, your rating is not part of the network rating calculation.
- -list [path/to/list]: Filter by a cloned or created list. You can also pass a URL, in which case the list is cloned first (see below for further instructions about cloning lists).
- -ignore_net: Consider all movies (not only those known to your network). This disables the nrating and nlogs columns.
- -mode [std | bayesian]: Choose the mode of rating value calculation. std is the naive way where the sum of rating values is divided by the number of ratings. The default is bayesian, which penalizes movies with few logs.
- -b [weight]: Choose the weighting for bayesian averaging. The default is the number of users in your network divided by 100. This has no effect when mode is set to std.
- -o [csv | net]: Without the -o flag, the list is simply printed to your terminal in full. csv saves the list to a file with the columns mentioned above. See below for net.
- -d [date]: Date up to which ratings should be considered. This is useful for producing rankings from previous updates.
- -usub [user1 user2 ...]: Collect logs only from a subset of users.
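The difference between the std and bayesian modes can be illustrated with a minimal sketch. It is an assumption about what a Bayesian average typically looks like, not the exact formula used by ranking.py: the function names, the prior mean and the sample ratings are all hypothetical, and `weight` plays the role of the -b value.

```python
def std_average(ratings):
    """Naive mean: the sum of the ratings divided by their count."""
    return sum(ratings) / len(ratings)


def bayesian_average(ratings, prior_mean, weight):
    """Pull the mean towards prior_mean.

    `weight` acts like that many extra pseudo-ratings at prior_mean,
    so films with few logs stay close to the prior.
    """
    return (weight * prior_mean + sum(ratings)) / (weight + len(ratings))


# Hypothetical data on a 0-5 scale.
few_logs = [5.0, 5.0]      # two enthusiastic logs
many_logs = [4.5] * 40     # forty consistent logs
prior = 3.0                # assumed network-wide mean rating
weight = 2.0               # e.g. 200 followed users / 100

print(std_average(few_logs), bayesian_average(few_logs, prior, weight))
print(std_average(many_logs), round(bayesian_average(many_logs, prior, weight), 2))
```

With the naive mean the film with two logs ranks first. With the weighted variant it drops below the film with forty logs, which is the "punishment" for few logs described above.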
Some options are not available unless you download the metadata. To download the HTML of every movie page, run
python3 ladle.py
Warning: This usually takes around 10MB of disk space per 100 movies.
By default, this takes the csv file in the lists directory with the latest date in its name. ladle.py has the following options:
- -r [rating]: Specify minimum rating (default: 3.75)
- -f: Select another csv file
- -refresh: Download HTMLs again (limit: once per day). The older file gets moved into a subdirectory which has the "last-modified" date as its name.
- -w [seconds]: Sleep timer after each request
- -cols [col1] [col2]: Specify which columns the csv file has. Default: title, rating, num_logs.
Now you can use the following flags with ranking.py:
- -m (required for flags below): Include metadata. At the moment, this is by default [year],[letterboxd_average_rating],[letterboxd_number_of_logs],[list_of_genres]
- -lbr [rating]: Minimum Letterboxd rating.
- -min_llogs [logs]: Minimum number of logs on Letterboxd.
- -max_llogs [logs]: Maximum number of logs on Letterboxd.
- -miny [year]: Movies released in or after this year.
- -maxy [year]: Movies released in or before this year.
- -mint [runtime in min]: Minimum runtime of a movie.
- -maxt [runtime in min]: Maximum runtime of a movie.
- -genre [genre]: Filter by genre, e.g. "science fiction" or action.
For the following flags, write the name as it appears in the Letterboxd URL (search function coming soon!), i.e. with hyphens instead of spaces:
- -actor: Filter by actor, e.g. nicolas-cage.
- -director: Filter by director, e.g. stanley-kubrick.
- -producer: Filter by producer, e.g. paul-thomas-anderson.
- -writer: Filter by writer, e.g. aaron-sorkin.
- -editor: Filter by editor, e.g. tricia-cooke.
- -cinematography: Filter by cinematographer, e.g. robert-elswit.
- -visual_effects: Filter by visual effects crew, e.g. janet-yale.
- -composer: Filter by composer, e.g. jonny-greenwood.
- -sound: Filter by sound crew, e.g. ben-burtt.
- -production_design: Filter by production designer, e.g. leslie-dilley.
- -costumes: Filter by costume designer, e.g. mary-zophres.
- -studio: Filter by studio, e.g. a24.
- -country: Filter by country, e.g. italy.
- -language: Filter by language, e.g. thai.
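If you only know a display name, a rough way to guess the URL form is sketched below. `to_slug` is a hypothetical helper, not part of this collection, and it only approximates how Letterboxd builds its URLs (names with apostrophes or disambiguation suffixes may differ), so check the result against the actual URL when in doubt.

```python
import re
import unicodedata


def to_slug(name):
    """Approximate a Letterboxd-style URL slug (hypothetical helper)."""
    # Strip accents, e.g. "é" becomes "e".
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


print(to_slug("Nicolas Cage"))
print(to_slug("Paul Thomas Anderson"))
```

The output can then be passed to a flag such as -actor or -producer.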
Run
python3 ranking.py -sort <column_name>
with any column appearing in the metadata, e.g.:
- title: Reverse alphabetical order according to movie title (title_asc for alphabetical)
- year: Release year of the movie, newest first (year_asc for earliest first)
- nrating: Highest network rating (average of your friends' ratings) first (this is the default way of sorting if you don't specify -sort)
- nlogs: Number of logs from your friends, highest first
- lrating: Highest Letterboxd rating first
- llogs: Number of logs on Letterboxd, highest first. Note: This number won't change unless you manually delete all folders in moviedata (not recommended, though!).
- diff: Difference between nrating and lrating.
You can pass multiple columns, separated by spaces. The sort on the last column is applied first and the sort on the first column last, so the first column determines the main order and the following columns only break ties.
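The ordering of a multi-column sort can be sketched with successive stable sorts. This is an illustration of the technique, not the code in ranking.py; `multi_sort` and the sample rows are hypothetical.

```python
rows = [
    {"title": "film-a", "year": 1999, "nrating": 4.2},
    {"title": "film-b", "year": 2007, "nrating": 4.5},
    {"title": "film-c", "year": 1999, "nrating": 4.5},
]


def multi_sort(rows, columns):
    """Sort by several columns, highest first; earlier columns take priority."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last column first means
    # every later pass keeps the previous order among equal values.
    for column in reversed(columns):
        result.sort(key=lambda row: row[column], reverse=True)
    return result


ordered = multi_sort(rows, ["nrating", "year"])
print([row["title"] for row in ordered])  # ['film-b', 'film-c', 'film-a']
```

Here film-b and film-c tie on nrating, so the year column decides and the newer film comes first.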
You can also change the columns (next to the mandatory "title" and "year") displayed at the end by running
python3 ranking.py -cols <column_name>
e.g. genre, runtime, llogs (number of Letterboxd logs), lrating (Letterboxd rating), director, cinematography, language.
Once you have downloaded the movie pages of the top few movies using ladle.py, you can run
python3 ranking.py -o net
to generate a network csv file which can be imported to Letterboxd.
python3 ratings.py -c
The -c flag updates ratings that changed after a rewatch, or that were added a few days after the movie was first logged without one (rating 0). It makes sure that no movie is skipped and also keeps a stored rating if someone has since deleted it (reset to 0).
Other flags:
- -u: Username to retrieve ratings (from network) for. Without it, the default user is selected.
- -new: Only retrieve newly added users and do not check already existing users in the database for new ratings.
- -w: Sleep timer in seconds after each request.
When you run ratings.py again a few days later, you will most likely have new logs from your friends. To see which movies changed the most, run
python3 hype.py -old <path/to/old/csv> -new <path/to/new/csv> -flags popular
By default (without -old and -new), it will look for the csvs with the two most recent dates in the lists directory.
Using the -flags option, you can (currently) choose from three different hype lists:
- rising (default): Movies with a rating above 3.00 that have gained at least 0.05
- popular: Movies that have gained the most new logs
- top: See how the current top 20 movies were placed last time
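The rising and popular lists can be pictured as a comparison of two snapshots. The sketch below is an assumption about the idea, not the logic inside hype.py: the function names and the in-memory dictionaries (title mapped to a rating and a log count) are hypothetical stand-ins for the two csv files.

```python
def rising(old, new, min_rating=3.0, min_gain=0.05):
    """Films rated above min_rating whose rating gained at least min_gain."""
    gains = {}
    for title, (rating, _logs) in new.items():
        if title in old and rating > min_rating:
            gain = rating - old[title][0]
            # Small tolerance so float noise does not drop a gain of exactly min_gain.
            if gain >= min_gain - 1e-9:
                gains[title] = round(gain, 2)
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


def popular(old, new):
    """Films ordered by the number of logs gained since the old snapshot."""
    gained = {
        title: logs - old.get(title, (0.0, 0))[1]
        for title, (_rating, logs) in new.items()
    }
    return sorted(gained.items(), key=lambda item: item[1], reverse=True)


# Hypothetical snapshots: title -> (network rating, number of logs).
old = {"film-a": (3.50, 10), "film-b": (4.00, 25), "film-c": (2.80, 5)}
new = {"film-a": (3.60, 12), "film-b": (4.00, 40), "film-c": (2.95, 6)}

print(rising(old, new))   # [('film-a', 0.1)]
print(popular(old, new))  # [('film-b', 15), ('film-a', 2), ('film-c', 1)]
```

film-c gained 0.15 but stays below the 3.00 threshold, so it does not count as rising.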
Note: hype.py currently only works with downloaded metadata (moviedata directory).
Run python3 clone_list.py -url <URL> to clone a list from Letterboxd.
You can specify the directory to save the list to via the -save_dir flag.
Run python3 ladle.py -f <path/to/csv/file> -r 0 to download the necessary metadata that is not already in the moviedata directory.
You can run python3 ranking.py -list <path/to/csv/file> to filter all ratings by this list.
One WIP script gives you a table of all users in your users directory featuring two major similarity measures.
The first is match_%, which shows how much of your (or -u's) film catalog this friend has also watched. The rmatch_% column shows the opposite: how many of their films you (or -u) have watched. Both columns enter the total calculation as a weighted average.
The second is rats_sim, a measure of how similar your ratings are to theirs. The nz_rats column shows how many of their ratings are non-zero, which accounts for users who don't rate films. The impact of rats_sim on the total result depends on the nz_rats value.
The total column then shows a combined score of these two major components.
python3 compare_users.py -u <USER>
has the following options:
- -lb: Set lower bound for ratings of the user for matches (match_% and rmatch_%). Default is 8.
- -tol: Set ratings similarity (rats_sim) tolerance, i.e. small ratings differences are ignored. Default is 2.
- -mrm: Match/reverse-match weight. Default is 0.75, i.e. match_% has a weight of 75% and rmatch_% a weight of 25% in the avg_match_% calculation.
- -fac: Total (avg match % vs. ratings similarity) factor. Default is 0.25.
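The two match columns and their weighted average can be sketched as simple set overlaps. This is an illustration under assumptions, not the code in compare_users.py: the function names and film sets are hypothetical, and the -lb rating bound is ignored for brevity.

```python
def match_pct(own, other):
    """Share (in %) of the films in `own` that also appear in `other`."""
    if not own:
        return 0.0
    return 100.0 * len(own & other) / len(own)


def avg_match(mine, theirs, mrm=0.75):
    """Weighted average of match_% and rmatch_%, with mrm as the match weight."""
    match = match_pct(mine, theirs)    # how much of my catalog they watched
    rmatch = match_pct(theirs, mine)   # how much of their catalog I watched
    return mrm * match + (1.0 - mrm) * rmatch


# Hypothetical catalogs of film slugs.
mine = {"film-a", "film-b", "film-c", "film-d"}
theirs = {"film-c", "film-d", "film-e", "film-f",
          "film-g", "film-h", "film-i", "film-j"}

print(match_pct(mine, theirs))   # 50.0
print(match_pct(theirs, mine))   # 25.0
print(avg_match(mine, theirs))   # 43.75
```

With the default weight of 0.75, a friend who has seen half of your catalog scores higher than one whose catalog you have half seen, which favors users who cover your own taste.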
python3 search.py -q <MOVIE>
(where the movie is in the sub URL format, e.g. blue-is-the-warmest-color) prints the full database entry of a movie to your terminal.
In the future, this will also show the ratings of all network users.