A bot that scrapes art from the National Gallery of Art's 100,000 images and posts it on Twitter

LimarAryan/x_art_bot

Twitter Art Bot v1🖼️

Project is live on Twitter:
@x_art_bot🎨

How to install and run in a local environment

  1. Clone this repo to a local directory
git clone https://github.com/LimarAryan/x_art_bot.git
  2. Switch to the cloned repository folder
cd path_to_directory/x_art_bot
  3. Install tweepy, a Python package for the Twitter API
pip install tweepy
  4. Run art_scraper.py from inside the directory.
    Downloaded images go into the x_art_bot/art_images folder.
python art_scraper.py

WARNING: If you leave this script running continuously, it will download 100,000 image files. Close the terminal or press Ctrl+C
to exit the art_scraper.py script once you are satisfied with the number of images downloaded.
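
One way to avoid downloading the full set is to stop after a fixed number of images. The sketch below is not taken from art_scraper.py: `download_images`, `limit`, and the `fetch` parameter are hypothetical names used only for illustration, and it assumes you already have a list of image URLs.

```python
import os
import urllib.request


def download_images(urls, dest_dir, limit, fetch=urllib.request.urlretrieve):
    """Download at most `limit` images from `urls` into `dest_dir`.

    Hypothetical helper, not part of art_scraper.py. `fetch(url, path)` must
    save the URL to the given path; it defaults to urllib's urlretrieve.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        if len(saved) >= limit:
            break  # stop once enough images have been downloaded
        filename = os.path.basename(url) or f"image_{len(saved)}.jpg"
        path = os.path.join(dest_dir, filename)
        if os.path.exists(path):
            continue  # skip files left over from an earlier run
        fetch(url, path)
        saved.append(path)
    return saved
```

Because existing files are skipped, running it again with the same URL list continues where the previous run stopped.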

  5. Use a 3-legged OAuth API flow to get your access_token and access_token_secret.
    Twitter's Developer documentation for the 3-legged OAuth flow can be found HERE

    Below is an example of the Python code needed for the 3-legged OAuth flow.
    To get your access_token and access_token_secret, copy and paste the code
    into art_bot.py and run the program once. It should print your
    access_token and access_token_secret to the terminal. Once you have them, you can delete this code from the program entirely.
import tweepy

# Your credentials
# The API Key and API Secret can be found in your Twitter developer portal under 'Keys and Tokens'.
# CALLBACK_URL must match a callback URL registered in the User authentication settings in the Twitter developer portal.
api_key = 'YourAPIKey'
api_secret = 'YourAPISecret'
CALLBACK_URL = 'https://example.com/callback'  # placeholder, replace with your registered callback URL

# Step 1: Get a request token.
# This endpoint expects an OAuth 1.0a signed request rather than HTTP basic auth,
# so tweepy (installed in step 3) is used here to sign it.
oauth = tweepy.OAuth1UserHandler(api_key, api_secret, callback=CALLBACK_URL)
try:
    authorize_url = oauth.get_authorization_url()
except tweepy.TweepyException as error:
    raise SystemExit(f"Failed to obtain request token: {error}")
print("Request Token and Secret obtained.")

# Step 2: Redirect user to Twitter for authorization
# Direct the user to this URL
print(authorize_url)

# Step 3: After the user authorizes the app, Twitter redirects to CALLBACK_URL
# with an oauth_verifier query parameter. Paste its value here.
oauth_verifier = input("oauth_verifier: ").strip()
access_token, access_token_secret = oauth.get_access_token(oauth_verifier)
print(f"access_token = {access_token}")
print(f"access_token_secret = {access_token_secret}")
  6. On lines 8-11 of art_bot.py, fill in your API keys and access tokens:
api_key = "x"
api_secret = "x"
access_token = "x"
access_token_secret = "x"
  7. Run art_bot.py to post a random artwork image from x_art_bot/art_images
python art_bot.py
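
For orientation, posting a random image could look roughly like the sketch below. It is not a copy of art_bot.py: `pick_random_image`, `post_image`, and the list of file extensions are assumptions made for illustration, and the tweepy calls assume tweepy 4.x (media upload through `tweepy.API`, tweet creation through `tweepy.Client`).

```python
import random
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed file types


def pick_random_image(folder="art_images"):
    """Return the path of a random image file in `folder`."""
    images = [p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS]
    if not images:
        raise FileNotFoundError(f"No images found in {folder}")
    return random.choice(images)


def post_image(image_path, api_key, api_secret, access_token, access_token_secret):
    """Upload one image and post it as a tweet (makes real API calls)."""
    import tweepy  # imported here so pick_random_image works without tweepy

    # Media upload goes through the v1.1 API object
    auth = tweepy.OAuth1UserHandler(api_key, api_secret, access_token, access_token_secret)
    media = tweepy.API(auth).media_upload(filename=str(image_path))

    # The tweet itself is created through the v2 Client
    client = tweepy.Client(
        consumer_key=api_key,
        consumer_secret=api_secret,
        access_token=access_token,
        access_token_secret=access_token_secret,
    )
    return client.create_tweet(media_ids=[media.media_id])
```

Calling `post_image(pick_random_image(), api_key, api_secret, access_token, access_token_secret)` with the four values from step 6 would post one image.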

Work Folder

The 'work' folder contains JSON data
provided by the amazing students at
Carnegie Mellon University,
with 100,000 already scraped / crawled image
sites and metadata. You can download it
from my repo, or from this link as a zipped
file called "nearest_neighbors.tar.gz":
Download Link
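
If you download the zipped file, it can be unpacked with Python's standard library. The sketch below is only an illustration: `extract_work_archive` is a hypothetical helper, and the layout of the files inside the archive is not assumed beyond them being JSON.

```python
import tarfile
from pathlib import Path


def extract_work_archive(archive_path, dest_dir="."):
    """Unpack a .tar.gz archive and return the JSON files found under dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe paths inside the archive
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return sorted(dest.rglob("*.json"))
```

For example, `extract_work_archive("nearest_neighbors.tar.gz", "work")` returns a list of the extracted JSON paths.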

Running Remotely

This version runs in a local environment.
However, I run the actual
x_art_bot account on an AWS Lambda function,
using an S3 bucket and DynamoDB. A brief explanation
is given in the next section.

AWS

I use only AWS free tier products to keep the bot running.
An AWS Lambda function runs the Python script, DynamoDB stores
the used filenames, and an S3 bucket holds the image files.

Lambda

I have included the code that I'm using in lambda_function.py.
First, I check the local /tmp/ folder of
the Lambda instance, and if there are any files in it I delete them through a shell subprocess.
This is needed because the bot places the image file from the S3 bucket in /tmp/ the moment before it posts.
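
lambda_function.py does this cleanup through a shell subprocess. As an alternative illustration, the same step can be sketched with `pathlib`; `clear_tmp` is a hypothetical name and this is not the code in the repo.

```python
from pathlib import Path


def clear_tmp(tmp_dir="/tmp"):
    """Delete regular files directly inside tmp_dir and return how many were removed."""
    removed = 0
    for entry in Path(tmp_dir).iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```

Subfolders are left alone in this sketch, since the bot only places single image files in /tmp/.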

Then I establish a connection to the Twitter API with the keys and define my Twitter posting functions.
I grab an artwork filename and check with DynamoDB whether that filename has already been used.
If it has not been used, the bot posts the image to Twitter and stores the newly used filename in DynamoDB.
Important: The open-source Python library "tweepy" is zipped up and added as a 'layer', because this is how dependencies are provided in Lambda.
A 'CloudWatch Event' is used to create a cron job so that the script runs once every hour.
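
The check-then-post logic described above can be sketched independently of AWS. This is a simplified illustration, not the contents of lambda_function.py: `post_next_unused` and its callable parameters are hypothetical, and in the real function they would wrap the DynamoDB lookup and the Twitter posting code.

```python
def post_next_unused(filenames, is_used, post, mark_used):
    """Post the first filename that has not been used yet.

    is_used(name) -> bool, post(name) and mark_used(name) are callables
    supplied by the caller; returns the posted filename or None.
    """
    for name in filenames:
        if is_used(name):
            continue  # already posted, try the next file
        post(name)
        mark_used(name)  # record only after the post succeeded
        return name
    return None
```

Recording the filename only after `post` returns means a failed post is retried on the next hourly run instead of being silently skipped.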

DynamoDB

I use AWS DynamoDB to save a simple text string for each filename that has already been used.
When the Lambda function checks whether a file has been used, it goes to the DynamoDB database
and compares the current filename to all the filenames already used.
If the filename has not been used, the image is posted. If it has already been used, it is not posted:
that image file is skipped and the next image file is checked.
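
A minimal sketch of such a check is shown below. It differs from the description above in one respect: instead of comparing against every stored filename, it assumes the filename is the table's partition key and does a single key lookup. The class name, the `filename` attribute, and the table name in the comment are all hypothetical.

```python
class UsedFilenames:
    """Track posted filenames in a DynamoDB table keyed by filename."""

    def __init__(self, table, key_name="filename"):
        # `table` is a boto3 Table resource, e.g.
        # boto3.resource("dynamodb").Table("used_artworks")  (hypothetical table name)
        self.table = table
        self.key_name = key_name

    def is_used(self, filename):
        # get_item returns an "Item" entry only when the key exists
        response = self.table.get_item(Key={self.key_name: filename})
        return "Item" in response

    def mark_used(self, filename):
        self.table.put_item(Item={self.key_name: filename})
```

A key lookup keeps the cost of each check constant as the number of used filenames grows.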

S3 Bucket

This is where the real magic happens. Into the S3 bucket I upload the image files
that I scraped with art_scraper.py from my local terminal. Currently there are 10,000 images in
my S3 bucket, but I can always add more manually by uploading them through the AWS CLI.
This part is not automatic, but it takes little time and can run in the background.
Perhaps I can update this later on to make it automatic,
maybe with another Lambda function?
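
I upload through the AWS CLI, but the same step could also be scripted with boto3. The sketch below is one possible approach under that assumption: `upload_folder` and the `prefix` parameter are hypothetical, and `s3_client` is expected to be a `boto3.client("s3")` object.

```python
from pathlib import Path


def upload_folder(s3_client, folder, bucket, prefix=""):
    """Upload every file in `folder` to `bucket` and return the object keys."""
    keys = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        key = f"{prefix}{path.name}"
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client call
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

The client is passed in as a parameter so the function can be tried with a stand-in object before pointing it at a real bucket.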
