Automatically scrape RSS feeds and detect embedded social media posts (Bluesky, Twitter/X, TikTok, Instagram, Facebook). Log findings to Google Sheets and optionally repost Bluesky embeds to a Bluesky account.
- Multi-platform detection: Bluesky, Twitter/X, TikTok, Instagram, Facebook
- RSS feed scraping: Configurable list of feeds to monitor
- Google Sheets logging: Track all discovered embeds with duplicate prevention
- Bluesky reposting: Automatically quote-post discovered Bluesky embeds
- GitHub Actions: Runs on a schedule (every 30 minutes by default)
```bash
git clone https://github.com/your-username/media-embed-tracer.git
```

- Create a new Google Spreadsheet
- Note the URL (you'll need it for `SPREADSHEET_URL`)
- Share the spreadsheet with your service account email (see the setup instructions below)
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Google Sheets API and Google Drive API
- Create a Service Account:
  - Go to "IAM & Admin" > "Service Accounts"
  - Click "Create Service Account"
  - Give it a name and create it
- Create a key:
  - Click on the service account
  - Go to the "Keys" tab
  - Add Key > Create new key > JSON
  - Download the JSON file
- Share your spreadsheet:
  - Open the JSON file and copy the `client_email`
  - Share your Google Spreadsheet with this email (Editor access)
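To find the address to share the spreadsheet with, the `client_email` field can be read directly from the downloaded key file. The following is a minimal sketch, assuming the standard service account key layout; the function name `service_account_email` and the sample address are hypothetical and not part of this project.

```python
import json


def service_account_email(credentials_json: str) -> str:
    """Return the client_email field from service account key JSON text."""
    info = json.loads(credentials_json)
    email = info.get("client_email")
    if not email:
        raise ValueError("credentials JSON has no client_email field")
    return email


# Illustrative key content; a real key file contains more fields.
sample = (
    '{"type": "service_account", '
    '"client_email": "bot@example-project.iam.gserviceaccount.com"}'
)
print(service_account_email(sample))
```

The same text is what goes into the `GOOGLE_CREDENTIALS_JSON` secret, so a check like this can also confirm the secret was pasted in full.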
If you want to repost Bluesky embeds:
- Go to Bluesky Settings
- Navigate to "App Passwords"
- Create a new app password
- Save it for the `BLUESKY_PASSWORD` secret
Go to your repository's Settings > Secrets and variables > Actions, and add:
| Secret | Description |
|---|---|
| `FEEDS_JSON` | JSON array of feeds (see format below) |
| `SPREADSHEET_URL` | Full URL of your Google Spreadsheet |
| `GOOGLE_CREDENTIALS_JSON` | Full contents of your service account JSON file |
| Secret | Description |
|---|---|
| `BLUESKY_POSTING_ENABLED` | Set to `true` to enable posting |
| `BLUESKY_ACCOUNT` | Account name: `international` or `localized` |
| `BLUESKY_USERNAME` | Your Bluesky handle (e.g., `yourbot.bsky.social`) |
| `BLUESKY_PASSWORD` | Your Bluesky app password |
| `FEED_NAMES_JSON` | JSON mapping domains to friendly names |
For running multiple instances (e.g., international + localized regional feeds):
| Secret | Description |
|---|---|
| `BLUESKY_INTERNATIONAL_USERNAME` | International bot handle |
| `BLUESKY_INTERNATIONAL_PASSWORD` | International bot app password |
| `BLUESKY_LOCALIZED_USERNAME` | Localized/regional bot handle |
| `BLUESKY_LOCALIZED_PASSWORD` | Localized/regional bot app password |
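The secrets above suggest that the active account is chosen by `BLUESKY_ACCOUNT`, with per-account credentials taking the form `BLUESKY_<ACCOUNT>_USERNAME` and `BLUESKY_<ACCOUNT>_PASSWORD`. How the scraper actually resolves them is not documented here, so the sketch below is only one plausible reading: the function `resolve_bluesky_credentials` and the fallback to the generic `BLUESKY_USERNAME`/`BLUESKY_PASSWORD` pair are assumptions.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_bluesky_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (username, password), or None when posting is disabled."""
    env = os.environ if env is None else env
    if env.get("BLUESKY_POSTING_ENABLED", "").strip().lower() != "true":
        return None

    # Assumed lookup order: account-specific secrets first, then generic ones.
    account = env.get("BLUESKY_ACCOUNT", "").strip().upper()
    candidates = []
    if account:
        candidates.append(
            (f"BLUESKY_{account}_USERNAME", f"BLUESKY_{account}_PASSWORD")
        )
    candidates.append(("BLUESKY_USERNAME", "BLUESKY_PASSWORD"))

    for user_key, pass_key in candidates:
        username, password = env.get(user_key), env.get(pass_key)
        if username and password:
            return username, password
    raise RuntimeError("posting is enabled but no Bluesky credentials were found")


example_env = {
    "BLUESKY_POSTING_ENABLED": "true",
    "BLUESKY_ACCOUNT": "localized",
    "BLUESKY_LOCALIZED_USERNAME": "localbot.bsky.social",
    "BLUESKY_LOCALIZED_PASSWORD": "app-password-placeholder",
}
print(resolve_bluesky_credentials(example_env)[0])
```

Passing a plain mapping instead of reading `os.environ` directly keeps the lookup easy to test without touching real secrets.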
Create a JSON array with your RSS feeds:
```json
[
  {
    "name": "The Guardian",
    "url": "https://www.theguardian.com/world/rss"
  },
  {
    "name": "BBC News",
    "url": "https://feeds.bbci.co.uk/news/rss.xml"
  }
]
```

Store this as the `FEEDS_JSON` secret.
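Because a malformed secret only shows up when the workflow runs, it can help to validate the JSON locally before saving it. The sketch below checks the shape described above (an array of objects with `name` and `url`); the function `load_feeds` is hypothetical and the scraper's own validation, if any, may differ.

```python
import json
from typing import Dict, List
from urllib.parse import urlparse


def load_feeds(raw: str) -> List[Dict[str, str]]:
    """Parse FEEDS_JSON text and check each entry has a name and an HTTP(S) url."""
    feeds = json.loads(raw)
    if not isinstance(feeds, list):
        raise ValueError("FEEDS_JSON must be a JSON array")
    for i, feed in enumerate(feeds):
        if not isinstance(feed, dict) or not feed.get("name") or not feed.get("url"):
            raise ValueError(f"feed #{i} needs both 'name' and 'url'")
        if urlparse(str(feed["url"])).scheme not in ("http", "https"):
            raise ValueError(f"feed #{i} has a non-HTTP url: {feed['url']}")
    return feeds


raw_feeds = """
[
  {"name": "The Guardian", "url": "https://www.theguardian.com/world/rss"},
  {"name": "BBC News", "url": "https://feeds.bbci.co.uk/news/rss.xml"}
]
"""
for feed in load_feeds(raw_feeds):
    print(feed["name"], feed["url"])
```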
For nicer Bluesky post formatting, create a JSON feed dictionary:
```json
{
  "theguardian.com": "The Guardian",
  "bbc.co.uk": "BBC",
  "nytimes.com": "NYT"
}
```

Store this as the `FEED_NAMES_JSON` secret.
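As an illustration of how such a dictionary might be applied, the sketch below maps an article URL to a friendly name by trying the full hostname and then each parent domain. The function `friendly_name` and the subdomain-stripping behaviour are assumptions; the project's actual matching rule is not described here.

```python
import json
from urllib.parse import urlparse


def friendly_name(article_url: str, feed_names: dict) -> str:
    """Look up a display name for the article's domain, falling back to the domain."""
    host = (urlparse(article_url).hostname or "").lower()
    parts = host.split(".")
    # Try "www.bbc.co.uk", then "bbc.co.uk", then "co.uk" (never the bare TLD).
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in feed_names:
            return feed_names[candidate]
    # No mapping found: show the bare domain without a leading "www.".
    return host[4:] if host.startswith("www.") else host


feed_names = json.loads(
    '{"theguardian.com": "The Guardian", "bbc.co.uk": "BBC", "nytimes.com": "NYT"}'
)
print(friendly_name("https://www.theguardian.com/world/example", feed_names))
```

Domains missing from the dictionary fall back to the hostname, so the mapping can stay partial.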
To run separate instances for international and localized outlets:
- Option A: Fork the repo twice with different secrets
- Option B: Create two workflow files with different configurations
For Option B, create .github/workflows/scrape-localized.yml:
```yaml
name: Scrape Localized Feeds
on:
  schedule:
    - cron: '15,45 * * * *'  # Offset from main workflow
  workflow_dispatch:
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: python -m src.main
        env:
          FEEDS_JSON: ${{ secrets.LOCALIZED_FEEDS_JSON }}
          SPREADSHEET_URL: ${{ secrets.LOCALIZED_SPREADSHEET_URL }}
          GOOGLE_CREDENTIALS_JSON: ${{ secrets.GOOGLE_CREDENTIALS_JSON }}
          BLUESKY_POSTING_ENABLED: 'true'
          BLUESKY_ACCOUNT: localized
          BLUESKY_LOCALIZED_USERNAME: ${{ secrets.BLUESKY_LOCALIZED_USERNAME }}
          BLUESKY_LOCALIZED_PASSWORD: ${{ secrets.BLUESKY_LOCALIZED_PASSWORD }}
          FEED_NAMES_JSON: ${{ secrets.LOCALIZED_FEED_NAMES_JSON }}
```

The scraper creates/uses a worksheet called "All Embeds" with these columns:
| Column | Description |
|---|---|
| Date | Date discovered (YYYY-MM-DD) |
| Time | Time discovered (HH:MM:SS) |
| Platform | Social media platform |
| Domain | Source article domain |
| Author Handle | Post author's handle |
| Article URL | URL of the article |
| Post URL | URL of the embedded post |
| Article Title | Title of the article |
| Article Summary | Brief summary |
| Published Date | When the article was published |
| Repost Status | pending, posted, or failed |
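To make the column layout and duplicate prevention concrete, the sketch below builds rows in the column order listed above and skips embeds already present in the sheet. The duplicate key (Article URL plus Post URL), the initial `pending` status, and the helper names `build_row` and `new_rows` are assumptions; the scraper's own logic may use a different key.

```python
from datetime import datetime, timezone

COLUMNS = [
    "Date", "Time", "Platform", "Domain", "Author Handle", "Article URL",
    "Post URL", "Article Title", "Article Summary", "Published Date",
    "Repost Status",
]
ARTICLE_IDX = COLUMNS.index("Article URL")
POST_IDX = COLUMNS.index("Post URL")


def build_row(embed: dict, now: datetime) -> list:
    """Turn an embed (keyed by column name) into a row in sheet column order."""
    row = dict(embed)
    row["Date"] = now.strftime("%Y-%m-%d")
    row["Time"] = now.strftime("%H:%M:%S")
    row.setdefault("Repost Status", "pending")  # assumed initial status
    return [row.get(col, "") for col in COLUMNS]


def new_rows(existing_rows: list, embeds: list, now: datetime) -> list:
    """Return rows only for embeds whose (article, post) pair is not logged yet."""
    seen = {
        (r[ARTICLE_IDX], r[POST_IDX]) for r in existing_rows if len(r) > POST_IDX
    }
    rows = []
    for embed in embeds:
        key = (embed.get("Article URL", ""), embed.get("Post URL", ""))
        if key in seen:
            continue
        seen.add(key)
        rows.append(build_row(embed, now))
    return rows


now = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
embed = {
    "Platform": "Bluesky",
    "Domain": "example.org",
    "Article URL": "https://example.org/story",
    "Post URL": "https://bsky.app/profile/alice.example/post/3kabc",
}
rows = new_rows([], [embed, embed], now)
print(len(rows), rows[0][:3])
```

Tracking `seen` while iterating also removes duplicates within a single run, not only against rows already in the sheet.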
- Bluesky:
  - Direct `bsky.app` links
  - `at://` URI format
  - Bluesky embed blockquotes
- Twitter/X:
  - `twitter.com` and `x.com` links
  - Twitter embed blockquotes
- TikTok:
  - Full `tiktok.com` video URLs
  - Short `vm.tiktok.com` links (automatically expanded)
  - TikTok embed blockquotes
- Instagram:
  - Posts (`/p/`)
  - Reels (`/reel/`, `/reels/`)
  - IGTV (`/tv/`)
  - Instagram embed blockquotes
- Facebook:
  - Posts
  - Videos and Watch
  - Reels
  - Photos
  - `fb.watch` short links
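The URL-based patterns listed above can be approximated with a small table of regular expressions. This is an illustrative sketch only: the expressions, the `detect_platform` and `at_uri_to_url` helpers, and the sample URLs are assumptions rather than the scraper's actual patterns, and blockquote detection and short-link expansion are left out.

```python
import re
from typing import Optional

# (platform, compiled pattern) pairs; checked in order.
PATTERNS = [
    ("Bluesky", re.compile(
        r"https?://bsky\.app/profile/[^/\s]+/post/[^/\s?#]+"
        r"|at://[^/\s]+/app\.bsky\.feed\.post/[^/\s?#]+", re.I)),
    ("Twitter/X", re.compile(
        r"https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/[^/\s]+/status(?:es)?/\d+",
        re.I)),
    ("TikTok", re.compile(
        r"https?://(?:(?:www\.|m\.)?tiktok\.com/@[^/\s]+/video/\d+"
        r"|vm\.tiktok\.com/[A-Za-z0-9]+)", re.I)),
    ("Instagram", re.compile(
        r"https?://(?:www\.)?instagram\.com/(?:p|reel|reels|tv)/[^/\s?#]+", re.I)),
    ("Facebook", re.compile(
        r"https?://(?:fb\.watch/[^/\s?#]+"
        r"|(?:www\.|m\.)?facebook\.com/"
        r"(?:watch|reel/|photo|[^/\s]+/(?:posts|videos|photos)/))", re.I)),
]

AT_URI = re.compile(r"at://([^/\s]+)/app\.bsky\.feed\.post/([^/\s?#]+)")


def detect_platform(url: str) -> Optional[str]:
    """Return the platform name for a post URL, or None if nothing matches."""
    url = url.strip()
    for platform, pattern in PATTERNS:
        if pattern.match(url):
            return platform
    return None


def at_uri_to_url(at_uri: str) -> Optional[str]:
    """Convert an at:// post URI into the equivalent bsky.app web URL."""
    match = AT_URI.match(at_uri.strip())
    if not match:
        return None
    actor, rkey = match.groups()
    return f"https://bsky.app/profile/{actor}/post/{rkey}"


print(detect_platform("https://x.com/someone/status/1234567890"))
print(at_uri_to_url("at://did:plc:abc123/app.bsky.feed.post/3kabc"))
```

Keeping the patterns in one ordered table makes it straightforward to add a platform or tighten a pattern without touching the matching logic.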