tweetexploR is an R package for exploring and visualising a collection of Tweets that has been tidied into an SQLite database file by the tidy_tweet library.
tweetexploR allows you to quickly answer questions such as:
Question | Function name |
---|---|
How many tweets are there per hour/day/month? | num_tweets_by_timeperiod() |
How many times did each user post a tweet? | num_tweets_by_username() |
How many unique users posted tweets per hour/day/month? | num_users_by_timeperiod() |
What are the most frequently used hashtags? | top_n_hashtags() |
Which tweets were liked the most? | top_n_liked_tweets() |
Who is being mentioned the most? | top_n_mentions() |
Who is being replied to the most? | top_n_replied_to_tweets() |
Which tweets were retweeted the most? | top_n_retweeted_tweets() |
Which accounts were retweeted the most? | top_n_retweeted_accounts() |
What are the engagement metrics for the tweets? | engagement_summary() |
Under the hood, tweetexploR uses ggplot2 to create nicely formatted charts, and even allows you to tweak them to suit your own preferences. tweetexploR also gives you the option to export the data underlying each visualisation by using the return_data = TRUE parameter.
You can install the development version of tweetexploR from GitHub with:
# install.packages("devtools")
devtools::install_github("QUT-Digital-Observatory/tweetexploR")
The first step is to connect to your SQLite database of Tweets that you created using tidy_tweet. If you haven't got your tidy_tweet SQLite database yet, follow the instructions in the tidy_tweet documentation.
Your tidy_tweet SQLite database should be a file ending in .db. You'll need to know the path to the .db file in order to connect to it.
library(tweetexploR)
sqlite_con <- connect_to_sqlite_db("my_database.db")
It is recommended that you save your connection to the SQLite database as an object in your environment so that you can use it with all of the other tweetexploR functions. A suggested name is sqlite_con.
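Before connecting, it can help to confirm that the path really points at an existing file. The following is a minimal sketch using base R only; the path is a placeholder, and the check is a suggestion rather than something tweetexploR requires. If connect_to_sqlite_db() wraps DBI::dbConnect() with RSQLite (an assumption, not confirmed here), a mistyped path could otherwise create a new, empty database instead of raising an error.

```r
# Placeholder path; replace with the location of your own tidy_tweet database.
db_path <- "my_database.db"

if (file.exists(db_path)) {
  # Only attempt the connection when the file is actually there.
  sqlite_con <- tweetexploR::connect_to_sqlite_db(db_path)
} else {
  # Show the full path that was checked to make typos easier to spot.
  message("No database file found at: ", normalizePath(db_path, mustWork = FALSE))
}
```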
Once you have connected to the SQLite database, you should receive a message to let you know how many tweets were found in the database.
The following code creates a bar chart of the top 10 hashtags and saves it as an object in your environment called top_10_hashtags. The code assumes that you called your SQLite database connection sqlite_con.
library(tweetexploR)
top_10_hashtags <- top_n_hashtags(sqlite_con, n = 10)
If you'd like to tweak the chart, you can specify parameters to pass to geom_col(), such as fill = "blue". You can also add your own theme or use one of the complete themes such as theme_minimal():
library(ggplot2)  # provides theme_minimal()
top_10_hashtags <- top_n_hashtags(sqlite_con, n = 10, fill = "blue") +
  theme_minimal()
You can save your chart using ggplot2::ggsave() in the same way that you would save any other ggplot chart:
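# Pass the chart explicitly so the file does not depend on which plot was created last
ggplot2::ggsave(filename = "my_awesome_chart.png", plot = top_10_hashtags, width = 12, height = 8, units = "cm")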
The following code creates a line chart of the number of tweets per day and saves it as an object in your environment called tweets_per_day. It will also return the underlying data as a data frame by using return_data = TRUE. Again, the code assumes that you called your SQLite database connection sqlite_con.
library(tweetexploR)
tweets_per_day <- num_tweets_by_timeperiod(sqlite_con, period = "day", return_data = TRUE)
To access the chart:
tweets_per_day$chart
To access the underlying data as a data frame:
tweets_per_day$data
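Because the underlying data is an ordinary data frame, you can export it with base R. The sketch below uses a made-up stand-in for tweets_per_day so that it runs without a database. The column names and values are illustrative assumptions, not the package's documented output, and the file is written to a temporary directory.

```r
# Stand-in for the object returned with return_data = TRUE.
# Column names and values are invented for illustration only.
tweets_per_day <- list(
  data = data.frame(
    day = as.Date(c("2023-01-01", "2023-01-02", "2023-01-03")),
    tweets = c(12L, 30L, 7L)
  )
)

# Write the data frame to a CSV file in a temporary directory.
out_file <- file.path(tempdir(), "tweets_per_day.csv")
write.csv(tweets_per_day$data, out_file, row.names = FALSE)

# Read the file back to confirm the export worked.
exported <- read.csv(out_file)
nrow(exported)
```

With a real tweets_per_day object you would skip the stand-in and call write.csv() on tweets_per_day$data directly.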
The following code generates a data frame that summarises the engagement metrics for all tweets in the database. Again, the code assumes that you called your SQLite database connection sqlite_con.
library(tweetexploR)
engagement_summary <- engagement_summary(sqlite_con)
Once you've finished exploRing, it's good practice to disconnect from your SQLite database:
DBI::dbDisconnect(sqlite_con)
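If you wrap your exploration in a function, on.exit() is one way to make sure the disconnect still happens when an error interrupts the code. This is a minimal sketch, assuming the DBI and RSQLite packages are installed. with_sqlite() is a hypothetical helper and not part of tweetexploR. It opens the connection with DBI directly rather than connect_to_sqlite_db(), and it uses an in-memory database so that it runs without a .db file.

```r
library(DBI)

# Hypothetical helper: open a connection, run fn(con), and always disconnect.
with_sqlite <- function(path, fn) {
  con <- dbConnect(RSQLite::SQLite(), path)
  # Registered immediately so it runs on normal exit and on error.
  on.exit(dbDisconnect(con), add = TRUE)
  fn(con)
}

# Example: list the tables in an empty in-memory database.
tables <- with_sqlite(":memory:", function(con) dbListTables(con))
```

To use it with your own data, you would pass the path to your tidy_tweet .db file instead of ":memory:".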
We appreciate all feedback and contributions!
Issues (documentation improvements, bug reports, and feature requests/discussions) are always welcome on our GitHub, as are pull requests.
If you found an error or if something was unclear to you, please file an issue on our GitHub for us to fix. We will do our best to respond to all issues, and appreciate your time and feedback. Alternatively, you can submit a pull request with your own changes.
tweetexploR is created and maintained by the QUT Digital Observatory and is open-sourced under an MIT license. We welcome contributions and feedback!
To cite this package:
QUT Digital Observatory (2023): tweetexploR. Queensland University of Technology. (Software) https://doi.org/10.25912/RDF_1676860790823