3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -13,4 +13,5 @@ src/lib/services/initFirebase.js
*.log
/prescraped-data-*/
/prescraped-*.json
/scraping-data/
# Test files with secrets
test-*.js
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -40,30 +40,30 @@ npm run fix-lyrics-cached

1. **Fix songs for a specific artist** (FASTEST):
```bash
node scripts/fix-null-lyrics.js --artist grace-petrie --dry-run
node fix-null-lyrics.js --artist grace-petrie --dry-run
```

2. **Fix only cached songs** (FAST):
```bash
node scripts/fix-null-lyrics.js --check-cached-only --dry-run
node fix-null-lyrics.js --check-cached-only --dry-run
```

3. **Scan limited songs** (SLOW):
```bash
node scripts/fix-null-lyrics.js --max-songs 100 --dry-run
node fix-null-lyrics.js --max-songs 100 --dry-run
```

### Filter by Artist (Recommended)

Fix only songs from a specific artist - this is the fastest method:
```bash
# Dry run - the script will search for the artist automatically
node scripts/fix-null-lyrics.js --artist "grace petrie" --dry-run
node scripts/fix-null-lyrics.js --artist "Grace Petrie" --dry-run
node scripts/fix-null-lyrics.js --artist grace-petrie --dry-run
node fix-null-lyrics.js --artist "grace petrie" --dry-run
node fix-null-lyrics.js --artist "Grace Petrie" --dry-run
node fix-null-lyrics.js --artist grace-petrie --dry-run

# Actually fix
node scripts/fix-null-lyrics.js --artist "kendrick lamar"
node fix-null-lyrics.js --artist "kendrick lamar"
```
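
The three spellings in the commands above (`"grace petrie"`, `"Grace Petrie"`, `grace-petrie`) suggest that artist names are normalized before lookup. A minimal sketch of one way this could work, assuming a slug-style key; `toArtistSlug` is a hypothetical helper for illustration and is not confirmed to exist in `fix-null-lyrics.js`:

```javascript
// Hypothetical helper (an assumption, not the script's actual code):
// collapse any spelling of an artist name into a lowercase, hyphenated slug.
function toArtistSlug(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // runs of spaces or punctuation become one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

// All three spellings used in the commands above map to the same key.
const spellings = ['grace petrie', 'Grace Petrie', 'grace-petrie'];
const slugs = spellings.map(toArtistSlug);
console.log(slugs);
```

With a normalization step like this, the quoting and capitalization of the `--artist` value would not matter for the lookup.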

**The script now smartly searches for artists!** You can use:
@@ -82,7 +82,7 @@ This method:

Only process songs that are in artists' `cachedSongIds` arrays:
```bash
node scripts/fix-null-lyrics.js --check-cached-only
node fix-null-lyrics.js --check-cached-only
```

**This is the recommended approach** as it:
@@ -95,12 +95,12 @@ node scripts/fix-null-lyrics.js --check-cached-only

Process songs in smaller batches:
```bash
node scripts/fix-null-lyrics.js --batch-size 5
node fix-null-lyrics.js --batch-size 5
```

Limit total number of songs to process:
```bash
node scripts/fix-null-lyrics.js --max-songs 50
node fix-null-lyrics.js --max-songs 50
```

**Note:** When scanning ALL songs (without `--artist` or `--check-cached-only`), the script defaults to a maximum of 10,000 songs to prevent runaway scans. Use `--max-songs` to adjust this limit.
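
The limit rule in the note above can be sketched as follows. This is only an illustration: `resolveMaxSongs`, the option names, and the choice to leave targeted runs uncapped are assumptions, since the note only states the 10,000 default for full scans:

```javascript
// Hypothetical sketch (names and uncapped targeted runs are assumptions).
const DEFAULT_FULL_SCAN_LIMIT = 10000;

function resolveMaxSongs({ artist, checkCachedOnly, maxSongs } = {}) {
  // An explicit --max-songs value always takes precedence.
  if (Number.isInteger(maxSongs) && maxSongs > 0) {
    return maxSongs;
  }
  // A full scan is a run with neither --artist nor --check-cached-only.
  const isFullScan = !artist && !checkCachedOnly;
  return isFullScan ? DEFAULT_FULL_SCAN_LIMIT : Infinity;
}

console.log(resolveMaxSongs({}));               // full scan, no explicit limit
console.log(resolveMaxSongs({ maxSongs: 50 })); // explicit limit
```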
@@ -109,14 +109,14 @@ node scripts/fix-null-lyrics.js --max-songs 50

See detailed information about each song:
```bash
node scripts/fix-null-lyrics.js --verbose
node fix-null-lyrics.js --verbose
```

### Combine Options

```bash
node scripts/fix-null-lyrics.js --artist baby-jey --dry-run --verbose
node scripts/fix-null-lyrics.js --check-cached-only --batch-size 3 --max-songs 20
node fix-null-lyrics.js --artist baby-jey --dry-run --verbose
node fix-null-lyrics.js --check-cached-only --batch-size 3 --max-songs 20
```

## What It Does
@@ -279,9 +279,9 @@ For permanently failed songs:
**Solutions:**
1. Try different name formats:
```bash
node scripts/fix-null-lyrics.js --artist "grace petrie" --dry-run
node scripts/fix-null-lyrics.js --artist "Grace Petrie" --dry-run
node scripts/fix-null-lyrics.js --artist grace-petrie --dry-run
node fix-null-lyrics.js --artist "grace petrie" --dry-run
node fix-null-lyrics.js --artist "Grace Petrie" --dry-run
node fix-null-lyrics.js --artist grace-petrie --dry-run
```

2. The script will search using:
@@ -299,7 +299,7 @@ For permanently failed songs:

**Solution:** Use the `--check-cached-only` flag:
```bash
node scripts/fix-null-lyrics.js --check-cached-only --dry-run
node fix-null-lyrics.js --check-cached-only --dry-run
```

This targets only songs in `cachedSongIds` arrays (songs that should have lyrics) and fetches them one at a time instead of all at once, avoiding timeouts.
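
A rough sketch of that approach, under stated assumptions: artists are plain objects with an optional `cachedSongIds` array, and `fixSong` is a hypothetical async callback standing in for whatever the script does per song. Neither function name comes from `fix-null-lyrics.js`:

```javascript
// Hypothetical sketch; function names and data shapes are assumptions.

// Gather the unique song IDs listed in artists' cachedSongIds arrays.
function collectCachedSongIds(artists) {
  const ids = new Set();
  for (const artist of artists) {
    for (const id of artist.cachedSongIds ?? []) {
      ids.add(id);
    }
  }
  return [...ids];
}

// Process songs strictly one after another, so only one request
// is in flight at any time.
async function fixOneAtATime(songIds, fixSong) {
  const results = [];
  for (const id of songIds) {
    results.push(await fixSong(id));
  }
  return results;
}

const exampleArtists = [
  { cachedSongIds: [1, 2] },
  { cachedSongIds: [2, 3] },
  {}, // artist with no cached songs
];
console.log(collectCachedSongIds(exampleArtists));
```

Awaiting inside the loop trades speed for a small, bounded amount of work per step, which is the point when a single large query would time out.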
@@ -326,7 +326,7 @@ If you see HTTP 429 errors, the script is hitting Genius too fast. Try:

### Firebase Connection Issues

The script uses the same Firebase configuration as your other scripts (`scripts/firebase-uploader.js`, etc.). If those work, this will too!
The script uses the same Firebase configuration as your other scripts (`firebase-uploader.js`, etc.). If those work, this will too!

If you encounter connection issues:
1. Check that your Firebase config in `src/lib/services/initFirebase.js` is correct
File renamed without changes.
File renamed without changes.
4,006 changes: 4,006 additions & 0 deletions Most Popular J Artists on Genius.html

Large diffs are not rendered by default.

@@ -99,23 +99,23 @@ This prevents them from being treated as "cached" in the future and cluttering t
A bulk scraper utility has been created to proactively fix all songs with null lyrics across your entire database!

### Location
- **Script**: `scripts/fix-null-lyrics.js`
- **Script**: `fix-null-lyrics.js`
- **Documentation**: `FIX_NULL_LYRICS_README.md`

### Quick Start

```bash
# Dry run to see what would be fixed
node scripts/fix-null-lyrics.js --dry-run
node fix-null-lyrics.js --dry-run

# Fix all songs with null lyrics
node scripts/fix-null-lyrics.js
node fix-null-lyrics.js

# Fix only songs from a specific artist
node scripts/fix-null-lyrics.js --artist baby-jey
node fix-null-lyrics.js --artist baby-jey

# Fix only songs that are supposed to be cached
node scripts/fix-null-lyrics.js --check-cached-only
node fix-null-lyrics.js --check-cached-only
```

### Features
30 changes: 15 additions & 15 deletions documentation/PRESCRAPER_README.md → PRESCRAPER_README.md
@@ -6,14 +6,14 @@ A comprehensive bulk scraping and upload system for LyricType that fetches artis

The prescraper system consists of two main components:

1. **`scripts/prescraper.js`** - Scrapes artist songs and lyrics from Genius API
2. **`scripts/firebase-uploader.js`** - Uploads prescraped data to Firebase Firestore
1. **`prescraper.js`** - Scrapes artist songs and lyrics from Genius API
2. **`firebase-uploader.js`** - Uploads prescraped data to Firebase Firestore

This system allows you to bulk-populate your database with artist data, song metadata, and lyrics for a better user experience.

## Features

### 🚀 Prescraper (`scripts/prescraper.js`)
### 🚀 Prescraper (`prescraper.js`)
- ✅ Loads artists from existing `genius-artists-*.json` files
- ✅ Fetches complete song lists for each artist (up to 1000 songs)
- ✅ Scrapes lyrics for configurable number of top songs per artist
@@ -24,7 +24,7 @@ This system allows you to bulk-populate your database with artist data, song met
- ✅ Detailed logging and statistics
- ✅ Configurable via CLI arguments

### 🔥 Firebase Uploader (`scripts/firebase-uploader.js`)
### 🔥 Firebase Uploader (`firebase-uploader.js`)
- ✅ Uploads artists, songs, and lyrics to Firestore
- ✅ Batch operations for efficiency
- ✅ Duplicate detection and skip existing data
@@ -81,7 +81,7 @@ Either:
#### Basic Usage
```bash
# Scrape 10 songs per artist for all letters
node scripts/prescraper.js
node prescraper.js

# Or use npm script
npm start
@@ -90,13 +90,13 @@ npm start
#### Advanced Options
```bash
# Test with limited data
node scripts/prescraper.js --test 5 --letters a,b --songs 3
node prescraper.js --test 5 --letters a,b --songs 3

# Scrape specific letters only
node scripts/prescraper.js --letters j,k,l --songs 15
node prescraper.js --letters j,k,l --songs 15

# Help
node scripts/prescraper.js --help
node prescraper.js --help
```

#### CLI Options
@@ -110,7 +110,7 @@ node scripts/prescraper.js --help
#### Basic Usage
```bash
# Upload latest prescraped data
node scripts/firebase-uploader.js
node firebase-uploader.js

# Or use npm script
npm run upload
@@ -119,17 +119,17 @@ npm run upload
#### Advanced Options
```bash
# Dry run (test without uploading)
node scripts/firebase-uploader.js --dry-run
node firebase-uploader.js --dry-run
npm run upload-dry

# Upload specific directory
node scripts/firebase-uploader.js --dir ./prescraped-data-2025-09-14/
node firebase-uploader.js --dir ./prescraped-data-2025-09-14/

# Force overwrite existing data
node scripts/firebase-uploader.js --force
node firebase-uploader.js --force

# Help
node scripts/firebase-uploader.js --help
node firebase-uploader.js --help
```

#### CLI Options
@@ -334,10 +334,10 @@ The prescraper creates partial files as it works. To resume:
The uploader skips existing artists by default:
```bash
# Skip existing data (default)
node scripts/firebase-uploader.js
node firebase-uploader.js

# Or force overwrite
node scripts/firebase-uploader.js --force
node firebase-uploader.js --force
```

## Advanced Configuration
@@ -54,13 +54,13 @@ Fix null lyrics for a single artist (much faster than scanning all songs):

```bash
# Dry run for specific artist - try any of these formats!
node scripts/fix-null-lyrics.js --artist "grace petrie" --dry-run
node scripts/fix-null-lyrics.js --artist "Grace Petrie" --dry-run
node scripts/fix-null-lyrics.js --artist grace-petrie --dry-run
node fix-null-lyrics.js --artist "grace petrie" --dry-run
node fix-null-lyrics.js --artist "Grace Petrie" --dry-run
node fix-null-lyrics.js --artist grace-petrie --dry-run

# Fix for specific artist
node scripts/fix-null-lyrics.js --artist "kendrick lamar"
node scripts/fix-null-lyrics.js --artist "Kendrick Lamar"
node fix-null-lyrics.js --artist "kendrick lamar"
node fix-null-lyrics.js --artist "Kendrick Lamar"
```

**Smart artist search!** The script will find artists using:
@@ -79,22 +79,22 @@ The artist filter:

Process songs in smaller batches (slower but safer):
```bash
node scripts/fix-null-lyrics.js --batch-size 3
node fix-null-lyrics.js --batch-size 3
```

Limit how many songs to process:
```bash
node scripts/fix-null-lyrics.js --max-songs 20
node fix-null-lyrics.js --max-songs 20
```

See detailed info about each song:
```bash
node scripts/fix-null-lyrics.js --verbose
node fix-null-lyrics.js --verbose
```

Combine options:
```bash
node scripts/fix-null-lyrics.js --artist drake --batch-size 5 --verbose
node fix-null-lyrics.js --artist drake --batch-size 5 --verbose
```

## How the Automatic System Works
16 changes: 8 additions & 8 deletions documentation/SCRAPER_README.md → SCRAPER_README.md
@@ -25,25 +25,25 @@ A Node.js script to scrape artist links from Genius.com artist index pages.
#### Single Letter Scraping
```bash
# Scrape artists for letter 'j' with IDs (default, slower)
node scripts/genius-scraper.js
node genius-scraper.js

# Scrape artists for a specific letter with IDs
node scripts/genius-scraper.js a
node scripts/genius-scraper.js k
node scripts/genius-scraper.js z
node genius-scraper.js a
node genius-scraper.js k
node genius-scraper.js z

# Fast mode: Skip ID extraction for quicker results
node scripts/genius-scraper.js j --no-ids
node scripts/genius-scraper.js a --no-ids
node genius-scraper.js j --no-ids
node genius-scraper.js a --no-ids
```

#### Bulk Scraping (All Letters A-Z)
```bash
# Scrape ALL letters with IDs (very slow - several hours!)
node scripts/genius-scraper.js all
node genius-scraper.js all

# Bulk scrape ALL letters without IDs (much faster - ~30 minutes)
node scripts/genius-scraper.js all --no-ids
node genius-scraper.js all --no-ids
```

### Example Output (with ID extraction)
File renamed without changes.