R functions to scrape Twitter account information for past points in time from web.archive.org snapshots, and to plot it.
Author: ChRauh
handleSnapshots()
Input
String containing a single Twitter handle, without '@' or URL. Examples: 'realDonaldTrump', 'vonderleyen', 'GretaThunberg'
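Because the handle must be passed bare, input copied from a browser or a tweet may need cleaning first. The helper below is a hypothetical sketch, not part of PastTwitter.R. `normalizeHandle()` strips a leading '@' or a twitter.com profile URL and then checks the result against Twitter's handle format (1 to 15 letters, digits, or underscores).

```r
# Hypothetical helper, not part of PastTwitter.R: reduce common handle
# spellings ("@handle", profile URLs) to the bare handle expected as input.
normalizeHandle <- function(x) {
  x <- trimws(x)
  # Drop a profile URL prefix such as "https://twitter.com/" or "twitter.com/"
  x <- sub("^(https?://)?(www\\.|mobile\\.)?twitter\\.com/", "", x, ignore.case = TRUE)
  # Drop a leading '@'
  x <- sub("^@", "", x)
  # Drop anything after the handle (trailing slash, path, query string)
  x <- sub("[/?#].*$", "", x)
  # Twitter handles are 1 to 15 characters: letters, digits, underscore
  if (!grepl("^[A-Za-z0-9_]{1,15}$", x)) {
    stop("Not a valid Twitter handle: ", x)
  }
  x
}

normalizeHandle("@GretaThunberg")                    # "GretaThunberg"
normalizeHandle("https://twitter.com/vonderleyen/")  # "vonderleyen"
```

The cleaned value can then be passed on directly, e.g. `handleSnapshots(normalizeHandle("@vonderleyen"))`.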
Output
R data.frame object listing all valid web.archive.org snapshots (HTTP status 200, text/html available) of the provided handle's Twitter profile page, with their links and timestamps. If multiple snapshots exist for the same day, only the first of those is kept.
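The filtering rule above can be illustrated with a minimal sketch. It is not the code in PastTwitter.R: it assumes the snapshot list comes from the Wayback Machine CDX API and that its rows have already been parsed into a data.frame with the CDX fields `timestamp`, `original`, `statuscode`, and `mimetype`. The function names `cdxQueryUrl()` and `filterSnapshots()` are hypothetical.

```r
# Hypothetical sketch, not the PastTwitter.R implementation.
# Query URL for the Wayback Machine CDX API (JSON output, selected fields)
cdxQueryUrl <- function(handle) {
  paste0("https://web.archive.org/cdx/search/cdx?url=twitter.com/", handle,
         "&output=json&fl=timestamp,original,statuscode,mimetype")
}

filterSnapshots <- function(cdx) {
  # Keep only valid captures: HTTP 200 with an HTML body
  ok <- cdx[cdx$statuscode == "200" & cdx$mimetype == "text/html", ]
  # Fixed-width "YYYYMMDDhhmmss" strings sort chronologically
  ok <- ok[order(ok$timestamp), ]
  # The first 8 characters of a CDX timestamp give the day
  ok$date <- as.Date(substr(ok$timestamp, 1, 8), format = "%Y%m%d")
  # Keep the first capture of each day
  ok <- ok[!duplicated(ok$date), ]
  # Direct link to each archived page
  ok$link <- paste0("https://web.archive.org/web/", ok$timestamp, "/", ok$original)
  rownames(ok) <- NULL
  ok
}

# Made-up example rows: two captures on Jan 1, a redirect on Jan 2, one on Jan 3
cdx <- data.frame(
  timestamp  = c("20210101120000", "20210101180000", "20210102090000", "20210103100000"),
  original   = "https://twitter.com/vonderleyen",
  statuscode = c("200", "200", "301", "200"),
  mimetype   = "text/html",
  stringsAsFactors = FALSE
)
snaps <- filterSnapshots(cdx)
snaps$link
# "https://web.archive.org/web/20210101120000/https://twitter.com/vonderleyen"
# "https://web.archive.org/web/20210103100000/https://twitter.com/vonderleyen"
```

The CDX API can also filter and collapse captures on the server side, which reduces the amount of data transferred for accounts with thousands of snapshots.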
extractAccountInfo()
Input
R data.frame object as structured by handleSnapshots()
Output
R data.frame object containing the counts of followers, following, tweets, and likes for each available snapshot.
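Twitter profile pages have typically shown large counts in abbreviated form (for example "12.5K" or "88.7M"). Assuming scraped values arrive as such display strings, a hypothetical helper like `parseCount()` below could turn them into numbers; whether extractAccountInfo() needs such a step depends on which page elements it reads. Abbreviated values are rounded, so counts recovered this way are only as precise as the displayed text.

```r
# Hypothetical helper, not taken from PastTwitter.R: convert display strings
# such as "1,234", "12.5K" or "88.7M" to numbers.
parseCount <- function(x) {
  # Remove thousands separators and spaces, normalise the suffix to upper case
  x <- toupper(gsub("[, ]", "", x))
  # Multiplier implied by a trailing K (thousand) or M (million)
  mult <- ifelse(grepl("K$", x), 1e3, ifelse(grepl("M$", x), 1e6, 1))
  # Strip the suffix and scale; unparseable strings become NA (with a warning)
  as.numeric(sub("[KM]$", "", x)) * mult
}

parseCount(c("1,234", "12.5K", "2M"))  # 1234 12500 2000000
```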
plotFollowers()
Input
R data.frame object as structured by extractAccountInfo().
Output
A ggplot() object indicating the profile follower count of the specified handle at each available web.archive.org snapshot via geom_bar() and the linear interpolation between snapshots via geom_line().
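Between two snapshots, the linear interpolation simply connects the two observed follower counts with a straight segment. The sketch below illustrates this on a small made-up table with hypothetical column names `date` and `followers` (the actual columns returned by extractAccountInfo() may differ). It computes the interpolated daily values with base R's `approx()` and, if ggplot2 is installed, builds a comparable bar-plus-line chart. It is an illustration, not the body of plotFollowers().

```r
# Toy data with hypothetical column names (not real snapshot data)
toy <- data.frame(
  date      = as.Date(c("2021-01-01", "2021-01-05", "2021-01-09")),
  followers = c(100, 140, 120)
)

# Linear interpolation: one value per calendar day between the snapshots
days  <- seq(min(toy$date), max(toy$date), by = "day")
daily <- approx(x = as.numeric(toy$date), y = toy$followers,
                xout = as.numeric(days))$y
daily[3]  # 2021-01-03 lies halfway between 100 and 140: 120

# Bars at the snapshot dates plus a line through them (only if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  pl <- ggplot2::ggplot(toy, ggplot2::aes(x = date, y = followers)) +
    ggplot2::geom_bar(stat = "identity", fill = "grey70") +
    ggplot2::geom_line() +
    ggplot2::labs(x = NULL, y = "Followers")
  # print(pl) to display the chart
}
```

`geom_bar(stat = "identity")` is needed here because the bar heights come from the data rather than from counting rows; `geom_col()` is the equivalent shorthand.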
```r
# Note: Execution time depends strongly on the number of available archive.org snapshots
# For the 'realDonaldTrump' example it exceeds 3 hours (2232 available snapshots on May 6, 2021)
# The functions provide rudimentary progress feedback

# Packages used below: write_rds() comes from readr, ggsave() from ggplot2
library(readr)
library(ggplot2)

# Attach PastTwitter functions ####
source("PastTwitter.R")

# The Twitter handle of interest ####
handle <- "realDonaldTrump"

# Output params (create the output folders if they do not exist yet)
dir.create("./data", showWarnings = FALSE)
dir.create("./plots", showWarnings = FALSE)
datafile <- paste0("./data/", handle, ".RDS")
plotfile <- paste0("./plots/", handle, ".png")

# Get archive.org snapshots ####
snapshots <- handleSnapshots(handle)

# Extract account info ####
info <- extractAccountInfo(snapshots)

# Plot follower count ####
pl.f <- plotFollowers(info)

# Export ####
write_rds(info, datafile)
ggsave(plotfile, pl.f, width = 22, height = 14, units = "cm")
```
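Since run time grows with the number of snapshots, one way to get a quicker, coarser picture is to thin the snapshot table before extracting account info, for example keeping only the first snapshot per month. The sketch below is hypothetical: `thinSnapshots()` is not part of PastTwitter.R, and it assumes the table has a Date column called `date`, which may not match what handleSnapshots() actually returns.

```r
# Hypothetical helper: keep only the earliest snapshot per period.
# Assumes a Date column named 'date'; adapt to the real snapshot table.
thinSnapshots <- function(snapshots, unit = "%Y-%m") {
  # Sort chronologically so that "first" means earliest
  snapshots <- snapshots[order(snapshots$date), , drop = FALSE]
  # Period key, e.g. "2021-01" for monthly thinning or "2021" with unit = "%Y"
  key <- format(snapshots$date, unit)
  # Keep the earliest snapshot per period
  snapshots[!duplicated(key), , drop = FALSE]
}

toy <- data.frame(date = as.Date(c("2021-01-20", "2021-01-02", "2021-02-03",
                                   "2021-03-15", "2021-03-30")))
thinSnapshots(toy)$date  # "2021-01-02" "2021-02-03" "2021-03-15"
```

If the column assumption holds, the extraction step in the script above would become `info <- extractAccountInfo(thinSnapshots(snapshots))`, trading temporal resolution for run time.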