Functions to scrape Twitter account info for past points in time via archive.org

ChRauh/PastTwitter

PastTwitter

Some R functions to scrape and plot Twitter account info for past points in time from web.archive.org snapshots.

Author: ChRauh



handleSnapshots()

Input
String containing a single Twitter handle, without '@' or URL. Examples: 'realDonaldTrump', 'vonderleyen', 'GretaThunberg'

Output
R data.frame object containing timestamped links to all valid web.archive.org snapshots (status = 200, text/html available) of the Twitter profile page for the provided handle. If multiple snapshots exist for the same day, only the first one is kept.
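
The filtering and per-day deduplication described above can be sketched in base R. This is only an illustration under stated assumptions, not the internals of handleSnapshots(): the toy `cdx` table, its column names (modelled on the fields `timestamp`, `original`, `statuscode`, and `mimetype`), the handle `exampleHandle`, and the helper name `keep_first_per_day()` are all hypothetical.

```r
# Hypothetical snapshot index (timestamp = YYYYMMDDhhmmss); not real data
cdx <- data.frame(
  timestamp  = c("20200101120000", "20200101080000", "20200102090000"),
  original   = "https://twitter.com/exampleHandle",
  statuscode = c("200", "200", "302"),
  mimetype   = "text/html",
  stringsAsFactors = FALSE
)

# Keep valid snapshots only, then the earliest snapshot of each day
keep_first_per_day <- function(cdx) {
  valid <- cdx[cdx$statuscode == "200" & cdx$mimetype == "text/html", ]
  valid <- valid[order(valid$timestamp), ]            # earliest first
  valid$date <- as.Date(substr(valid$timestamp, 1, 8), format = "%Y%m%d")
  valid <- valid[!duplicated(valid$date), ]           # first row per day
  # Assumed URL pattern for an archived page
  valid$url <- paste0("https://web.archive.org/web/",
                      valid$timestamp, "/", valid$original)
  rownames(valid) <- NULL
  valid
}

snaps <- keep_first_per_day(cdx)
print(snaps[, c("date", "url")])
```

Sorting by timestamp before calling `duplicated()` is what makes "first of the day" mean the earliest capture rather than whichever row happened to come first in the index.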


extractAccountInfo()

Input
R data.frame object as structured by handleSnapshots()

Output
R data.frame object containing the counts of followers, following, tweets, and likes for each available snapshot.
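
Counts scraped from profile pages usually arrive as display strings rather than numbers. As a rough sketch, assuming the strings may contain thousands separators or K/M/B suffixes (an assumption about archived page content, not something this README states), a hypothetical helper `parse_count()` could normalise them as follows. It is not part of PastTwitter.R.

```r
# Hypothetical helper: convert display strings such as "1,234" or "12.5K"
# into numeric counts. Unparseable input yields NA.
parse_count <- function(x) {
  x <- toupper(gsub("[,[:space:]]", "", x))   # drop separators and spaces
  mult <- ifelse(grepl("K$", x), 1e3,
          ifelse(grepl("M$", x), 1e6,
          ifelse(grepl("B$", x), 1e9, 1)))    # suffix multiplier
  as.numeric(sub("[KMB]$", "", x)) * mult
}

counts <- parse_count(c("1,234", "12.5K", "88.7M"))
print(counts)
```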


plotFollowers()

Input
R data.frame object as structured by extractAccountInfo().

Output
A ggplot object showing the follower count of the specified handle at each available web.archive.org snapshot via geom_bar(), with the linear interpolation between snapshots drawn via geom_line().
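
To make the linear interpolation between snapshots concrete, the following base R sketch fills in daily values with approx(). The data frame `info_demo` and its column names `date` and `followers` are assumptions for illustration; the actual column names returned by extractAccountInfo() may differ, and plotFollowers() itself draws the line with geom_line() rather than computing these values.

```r
# Hypothetical snapshot data (invented numbers, assumed column names)
info_demo <- data.frame(
  date      = as.Date(c("2020-01-01", "2020-01-05", "2020-01-11")),
  followers = c(1000, 1400, 2000)
)

# One value per calendar day between the first and last snapshot
days <- seq(min(info_demo$date), max(info_demo$date), by = "day")
interp <- approx(
  x    = as.numeric(info_demo$date),
  y    = info_demo$followers,
  xout = as.numeric(days)
)
daily <- data.frame(date = days, followers = interp$y)
print(daily)
```

Days without a snapshot receive values on the straight line between the two neighbouring snapshots, which is the same thing a line layer shows visually.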


Example application

# Note: Execution time depends strongly on the number of available archive.org snapshots.
# The 'realDonaldTrump' example took more than 3 hours (2232 available snapshots on May 6, 2021).
# The functions provide rudimentary progress feedback.

# Attach PastTwitter functions ####
source("PastTwitter.R")

# The Twitter handle of interest ####
handle <- "realDonaldTrump"

# Output params
datafile <- paste0("./data/", handle, ".RDS")
plotfile <- paste0("./plots/", handle, ".png")

# Get archive.org snapshots ####
snapshots <- handleSnapshots(handle)

# Extract account info ####
info <- extractAccountInfo(snapshots)

# Plot follower count ####
pl.f <- plotFollowers(info)

# Export (write_rds() comes from readr, ggsave() from ggplot2) ####
readr::write_rds(info, datafile)
ggplot2::ggsave(plotfile, pl.f, width = 22, height = 14, units = "cm")
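
The export step writes to ./data and ./plots. Assuming those directories may not exist yet in a fresh working directory, a short base R sketch like the following could be run beforehand. The variable name `out_dirs` is hypothetical.

```r
# Create the output directories used in the example if they are missing
out_dirs <- c("./data", "./plots")
for (d in out_dirs) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
}
print(dir.exists(out_dirs))
```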

Example output


Dependencies
