Project is live on Twitter:
@x_art_bot🎨
- Clone this repo to a local directory
git clone https://github.com/LimarAryan/x_art_bot.git
- Switch to the cloned repository folder
cd path_to_directory/x_art_bot
- Install tweepy, a Python package for the Twitter API
pip install tweepy
- Run art_scraper.py from inside the directory. Downloaded images will go into the x_art_bot/art_images folder.
python art_scraper.py
WARNING: if you leave this script running continuously, it will download 100,000 image files. Close the terminal or press CTRL + C
to exit art_scraper.py once you are satisfied with the number of images downloaded.
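If you would rather not watch the terminal, one option is to stop once the folder holds enough images. The snippet below is only a sketch of that idea and is not part of art_scraper.py: the helper names `count_images` and `should_keep_downloading`, and the list of image extensions, are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the scraper actually saves.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def count_images(folder):
    """Return how many image files currently sit in `folder` (0 if it is missing)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    return sum(
        1
        for entry in path.iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS
    )


def should_keep_downloading(folder, max_images):
    """True while `folder` holds fewer than `max_images` images."""
    return count_images(folder) < max_images
```

A download loop could call `should_keep_downloading("art_images", 500)` before each request and break out once it returns False.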
- Use a 3-legged OAuth API flow to get your access_token and access_token_secret
Twitter's Developer documentation for the 3-legged OAuth flow can be found HERE
Here is an example of the Python code needed for the 3-legged OAuth flow
to get your access_token and access_token_secret. You can copy and paste the code below
into art_bot.py and run the program once; it prints an authorization URL in the terminal.
After you authorize the app, Twitter redirects to your callback URL with an oauth_verifier,
which is then exchanged for your access_token and access_token_secret.
Once these are obtained you can delete this code from the program entirely.
import requests
from urllib.parse import parse_qs
from requests_oauthlib import OAuth1  # installed together with tweepy
# Your credentials
# The API Key and API Secret can be found in your Twitter developer portal under 'Keys and Tokens'
# The CALLBACK_URL must match a callback URL listed in the User authentication settings in the Twitter developer portal
api_key = 'YourAPIKey'
api_secret = 'YourAPISecret'
CALLBACK_URL = 'YourCallbackURL'
# Step 1: Get a request token (the request has to be signed with OAuth 1.0a, plain basic auth is rejected)
oauth = OAuth1(api_key, client_secret=api_secret, callback_uri=CALLBACK_URL)
response = requests.post("https://api.twitter.com/oauth/request_token", auth=oauth)
if response.status_code == 200:
    # Extract token and secret from the form-encoded response
    fields = parse_qs(response.text)
    oauth_token = fields['oauth_token'][0]
    oauth_token_secret = fields['oauth_token_secret'][0]
    print("Request Token and Secret obtained.")
    # Step 2: Redirect user to Twitter for authorization
    # Direct the user to this URL
    print(f"https://api.twitter.com/oauth/authorize?oauth_token={oauth_token}")
    # Step 3 would occur after the user has authorized the app and you've received the oauth_verifier
    # This part would typically be handled by your web server handling the callback
else:
    print("Failed to obtain request token.")
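Both the request-token response and the access-token response in Step 3 come back as form-encoded text (for example `oauth_token=...&oauth_token_secret=...`), and the oauth_verifier arrives as a query parameter on the callback URL. The sketch below shows one way to pull those values out with the standard library only. The helper names `parse_token_response` and `extract_verifier` are made up for this example, and the signed request to the access-token endpoint is deliberately left out.

```python
from urllib.parse import parse_qs, urlparse


def parse_token_response(body):
    """Return (oauth_token, oauth_token_secret) from a form-encoded response body."""
    fields = parse_qs(body)
    return fields["oauth_token"][0], fields["oauth_token_secret"][0]


def extract_verifier(callback_url):
    """Return the oauth_verifier query parameter from the redirected callback URL."""
    query = parse_qs(urlparse(callback_url).query)
    return query["oauth_verifier"][0]
```

For instance, if the browser ends up at a URL like `https://example.com/callback?oauth_token=abc&oauth_verifier=xyz` (a placeholder address), `extract_verifier` returns `"xyz"`.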
- On lines 8-11 of art_bot.py, fill in your API keys:
api_key = "x"
api_secret = "x"
access_token = "x"
access_token_secret = "x"
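If you prefer not to keep real keys in the source file, an alternative is to read them from environment variables. This is a sketch of that alternative, not how art_bot.py currently works; the variable names (`API_KEY`, `API_SECRET`, `ACCESS_TOKEN`, `ACCESS_TOKEN_SECRET`) and the `load_credentials` helper are assumptions.

```python
import os

# Assumed environment variable names; pick whatever suits your setup.
REQUIRED_KEYS = ("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name.lower(): env[name] for name in REQUIRED_KEYS}
```

The returned dict uses the same names as the variables above, so `creds["api_key"]` can stand in for the hard-coded `api_key` value.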
- Run art_bot.py to post a random artwork image from x_art_bot/art_images
python art_bot.py
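The random selection step can be pictured roughly as follows. This is a simplified sketch rather than the code in art_bot.py: `pick_random_image` is a hypothetical helper, the extension list is an assumption, and the actual upload to Twitter through tweepy is left out.

```python
import random
from pathlib import Path

# Assumed set of extensions; adjust to whatever the scraper actually saves.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pick_random_image(folder):
    """Return the path of one randomly chosen image file inside `folder`."""
    images = [
        entry
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS
    ]
    if not images:
        raise FileNotFoundError(f"No image files found in {folder}")
    return random.choice(images)
```

Raising an error on an empty folder makes it obvious that art_scraper.py needs to run first, instead of failing later during the upload.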
The 'work' folder contains JSON data provided by the amazing students at
Carnegie Mellon University, covering 100,000 already scraped / crawled image
sites and their metadata. You can download it from my repo, or from this link
as a zipped file called "nearest_neighbors.tar.gz":
Download Link
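Before unpacking the archive you may want to see which JSON files it contains. Here is a small sketch using Python's built-in tarfile module; `list_json_members` is a name invented for this example, and it makes no assumption about how the files are laid out inside the archive.

```python
import tarfile


def list_json_members(archive_path):
    """Return the sorted names of all .json files inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".json")
        )
```

Calling `list_json_members("nearest_neighbors.tar.gz")` from the folder where you saved the download prints nothing by itself; it returns a list you can inspect or loop over.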
This version runs in a local environment;
however, I run the actual x_art_bot account
on an AWS Lambda function, using an S3 bucket
and DynamoDB. A brief explanation
is given in the next section.
I use only AWS free tier products to run this concurrently.
I use an AWS Lambda function for the Python script, DynamoDB to store
used filenames, and an S3 bucket for the image files.
I have included the code that I'm using in lambda_function.py
Basically, I check the local /tmp/ folder of the Lambda instance and,
if it contains any files, I delete them through a shell subprocess
(this is where the bot places the image file from the S3 bucket the moment before it posts).
Then I establish a connection to the Twitter API using the keys and define my Twitter posting functions.
I grab an artwork filename and check whether that filename has already been used (with DynamoDB).
If it has not been used, the bot posts it to Twitter and stores the newly used filename in DynamoDB.
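The /tmp/ cleanup described above could also be done in plain Python instead of a shell subprocess. The sketch below is an alternative for illustration, not the code in lambda_function.py; `clear_folder` is a hypothetical helper, and it only removes regular files at the top level of the folder.

```python
from pathlib import Path


def clear_folder(folder):
    """Delete every regular file directly inside `folder`; return how many were removed."""
    removed = 0
    for entry in Path(folder).iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```

Inside the Lambda handler this would be called as `clear_folder("/tmp")` before the next image is downloaded from S3.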
Important: The open-source Python library "tweepy" is zipped up and added as a 'layer', because this is how dependencies are provided in Lambda.
A 'CloudWatch Event' is used to create a cron job so the script runs only once every hour.
I use AWS DynamoDB to save each used filename as a simple text string.
When checking whether a file is already used, the Lambda function goes to the DynamoDB database
and compares the current filename to all the filenames already used.
If it has not been used, it is posted. If it has already been used, it is not posted:
that image file is skipped and the next image file is checked.
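The skip-if-used logic can be sketched without any AWS calls. In the example below a plain Python set stands in for the DynamoDB table and `post` is any function that takes a filename; both are stand-ins for illustration, and `next_unused` and `post_next` are hypothetical names, not the functions in lambda_function.py.

```python
def next_unused(filenames, used):
    """Return the first filename that is not in `used`, or None if all were used."""
    for name in filenames:
        if name not in used:
            return name
    return None


def post_next(filenames, used, post):
    """Post the first unused filename, record it as used, and return it (or None)."""
    name = next_unused(filenames, used)
    if name is None:
        return None
    post(name)      # stand-in for the Twitter upload
    used.add(name)  # stand-in for writing the filename to DynamoDB
    return name
```

Recording the filename only after `post` returns means a failed post is retried on the next run instead of being silently marked as used.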
This is where the real magic happens: in the S3 bucket I upload the image files
that I scraped with art_scraper.py from my local terminal. Currently there are 10,000 images in
my S3 bucket, but I can always manually add more by uploading them through the AWS CLI.
This part is not automatic, but it takes little time and can run in the background.
Perhaps I can update this later on to make it automatic,
maybe with another Lambda function?