A professional Node.js API for scraping and accessing CIA World Factbook data. This project translates VBA web scrapers into a modern JavaScript API using Puppeteer, Axios, and Cheerio.
- RESTful API - Clean, professional API endpoints
- Multiple Data Sources - Access various CIA Factbook fields and comparisons
- Real-time Scraping - Fresh data from CIA sources
- Error Handling - Comprehensive error handling and logging
- Rate Limiting - Built-in rate limiting and request management
- CORS Support - Cross-origin resource sharing enabled
- Clone or download the project
- Navigate to the project directory
- Install dependencies:
  npm install
- Start the server:
  npm start

The API will be available at http://localhost:3000

Base URL: http://localhost:3000
- GET /health - Check API status
- GET /api/fields - Returns the list of available field endpoints
- GET /api/fields/{field-name} - Scrapes and returns data for a specific field
- POST /api/fields/bulk - Body: { "fields": ["field1", "field2", ...] }
- GET /api/comparisons - Returns the list of available comparison endpoints
- GET /api/comparisons/{comparison-name} - Scrapes and returns comparison data
- POST /api/comparisons/bulk - Body: { "comparisons": ["comparison1", "comparison2", ...] }
Available fields:
- diplomatic-representation-us - Diplomatic representation in the US
- economic-overview - Economic overview
- gdp-purchasing-power - GDP purchasing power parity
- electricity-access - Electricity access
- civil-aircraft-code - Civil aircraft registration codes
- military-and-security-forces - Military and security forces
- climate - Climate information
- urbanization - Urbanization data
- major-urban-areas-population - Major urban areas population
- drinking-water-source - Drinking water source
- sanitation-facility-access - Sanitation facility access
- education-expenditure - Education expenditure
- environmental-issues - Environmental issues

Available comparisons:
- subscriptions-comparison - Broadband subscriptions per 100 and total
- electricity-country-comparison - Electricity capacity, consumption, exports/imports, losses
- energy-country-comparison - Energy consumption per capita (2023 data)
- airports-country-comparison - Total airports
const axios = require('axios');

// Get climate data for all countries
axios.get('http://localhost:3000/api/fields/climate')
  .then(response => {
    console.log('Climate data:', response.data);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

// Get electricity comparison data
axios.get('http://localhost:3000/api/comparisons/electricity-country-comparison')
  .then(response => {
    console.log('Electricity data:', response.data);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

// Bulk scraping multiple fields
axios.post('http://localhost:3000/api/fields/bulk', {
  fields: ['climate', 'economic-overview', 'gdp-purchasing-power']
})
  .then(response => {
    console.log('Bulk data:', response.data);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

import requests

# Get diplomatic representation data
response = requests.get('http://localhost:3000/api/fields/diplomatic-representation-us')
response.raise_for_status()  # Stop early on 4xx/5xx responses
data = response.json()
print(f"Found {len(data['data'])} entries")
for entry in data['data'][:5]:  # Show first 5
    print(f"{entry['placeName']}: {entry['formatted'][:100]}...")

# Get all available fields
curl http://localhost:3000/api/fields
# Get specific field data
curl http://localhost:3000/api/fields/climate
# Bulk scraping
curl -X POST http://localhost:3000/api/fields/bulk \
  -H "Content-Type: application/json" \
  -d '{"fields": ["climate", "economic-overview"]}'

All API responses follow this structure:
{
  "success": true,
  "data": [...],
  "count": 195,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

{
  "success": true,
  "field": "climate",
  "data": [
    {
      "placeName": "United States",
      "formatted": "Continental climate with considerable regional variation...",
      "region": "North America"
    }
  ],
  "count": 195,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

{
  "success": true,
  "comparison": "electricity-country-comparison",
  "data": [
    {
      "placeName": "United States",
      "region": "North America",
      "installedGeneratingCapacity": "1,107,000,000 kW",
      "consumption": "3,928,000,000,000 kWh",
      "exports": "12,500,000,000 kWh",
      "imports": "51,200,000,000 kWh",
      "transmissionDistributionLosses": "209,000,000,000 kWh"
    }
  ],
  "count": 195,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

CIA_FACTBOOK/
├── scrapers/                            # Web scraping modules
│   ├── base-scraper.js                  # Base scraper class
│   ├── base-comparison-scraper.js
│   ├── diplomatic-representation-us.js
│   ├── economic-overview.js
│   └── ... (other field scrapers)
├── routes/                              # API route handlers
│   ├── fields.js                        # Field data routes
│   └── comparisons.js                   # Comparison data routes
├── utils/                               # Utility functions
│   └── helpers.js                       # Data cleaning and processing
├── server.js                            # Main Express server
├── package.json                         # Project dependencies
└── README.md                            # This file
- Express.js - Web framework
- Axios - HTTP client for API requests
- Puppeteer - Browser automation (for complex scraping)
- Cheerio - HTML parsing
- Helmet - Security middleware
- CORS - Cross-origin resource sharing
- Morgan - HTTP request logger
All data is scraped from the CIA World Factbook's page-data JSON endpoints:
- Base URL: https://www.cia.gov/the-world-factbook/page-data/field/
- Field endpoints: {field-name}/page-data.json
- Comparison endpoints: {field-name}/country-comparison/page-data.json
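
As a rough illustration of how these URL patterns combine, the sketch below joins the base URL with a field name. The helper name `buildPageDataUrl` and its `comparison` option are hypothetical and not taken from the project's scrapers. The field name `airports` is also only an assumed example, since the mapping from this API's comparison names to Factbook field names is not documented here.

```javascript
// Base URL as listed above.
const BASE_URL = 'https://www.cia.gov/the-world-factbook/page-data/field/';

// Hypothetical helper (not part of the project): builds the page-data URL
// for a Factbook field, or for its country comparison when requested.
function buildPageDataUrl(fieldName, { comparison = false } = {}) {
  const slug = encodeURIComponent(fieldName);
  const suffix = comparison
    ? 'country-comparison/page-data.json'
    : 'page-data.json';
  return `${BASE_URL}${slug}/${suffix}`;
}

// Field endpoint for the "climate" field.
console.log(buildPageDataUrl('climate'));

// Comparison endpoint; "airports" is an assumed field name for illustration.
console.log(buildPageDataUrl('airports', { comparison: true }));
```

The resulting string can be passed to `axios.get` in the same way as the usage examples earlier in this README.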
- Built-in delays between requests to avoid overwhelming servers
- Error handling for rate limit responses
- Automatic retry logic with exponential backoff
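
The retry behaviour described above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: the names `backoffDelay`, `sleep`, and `withRetry`, as well as the default delays and retry count, are assumptions chosen for illustration.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs the async function `task` up to `retries + 1` times,
// waiting with exponential backoff between failed attempts.
async function withRetry(task, { retries = 3, baseMs = 500, maxMs = 8000 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= retries) throw error; // Out of retries: surface the error
      await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}

// Demo with a fake task that fails twice before succeeding (no network needed).
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`attempt ${calls} failed`);
  return 'ok';
};

withRetry(flaky, { baseMs: 10 }).then((result) => {
  console.log(result, 'after', calls, 'calls'); // prints "ok after 3 calls"
});
```

In a scraper, the task would typically wrap the HTTP request, for example `withRetry(() => axios.get(url))`.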
- Data Freshness: Data is scraped in real-time from CIA sources
- Rate Limiting: Respectful scraping with built-in delays
- Usage: This tool is for educational and research purposes
- Attribution: Always attribute data to CIA World Factbook
- Legal: Ensure compliance with CIA terms of service
- Server won't start: Ensure port 3000 is available
- Scraping fails: Check internet connection and CIA website availability
- Empty responses: Some fields may have no data for certain countries
- 404 - Endpoint or field not found
- 500 - Server error (check logs)
- 429 - Rate limited (wait and retry)
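
On the client side, the error codes above can be turned into a simple decision about whether to retry. The sketch below is one possible approach. The helpers `describeStatus` and `summariseError` are hypothetical and not part of this project. It relies only on the fact that Axios attaches a `response` object to the error when the server replies with a non-2xx status.

```javascript
// Maps the status codes listed above to a message and a retry hint.
function describeStatus(status) {
  switch (status) {
    case 404:
      return { retry: false, message: 'Endpoint or field not found' };
    case 429:
      return { retry: true, message: 'Rate limited; wait and retry' };
    case 500:
      return { retry: false, message: 'Server error; check the server logs' };
    default:
      return { retry: false, message: `Unexpected status ${status}` };
  }
}

// Hypothetical helper: summarises an Axios-style error object.
// `error.response` is missing when no reply arrived (e.g. the server is down).
function summariseError(error) {
  if (!error.response) {
    return { retry: false, message: `No response received: ${error.message}` };
  }
  return describeStatus(error.response.status);
}

// Example with a hand-built error object, so no network or Axios import is needed.
console.log(summariseError({ response: { status: 429 } }));
```

Inside a `.catch(error => ...)` handler from the usage examples, `summariseError(error)` could be used to decide whether to wait and retry the request.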
- Fork the repository
- Create a feature branch
- Add new scrapers or improve existing ones
- Test thoroughly
- Submit a pull request
This project is for educational purposes. Please respect CIA data usage policies.
- CIA World Factbook for providing comprehensive country data
- Original VBA scrapers for the data collection logic
- Open source community for the excellent tools used
Created with ❤️ for data enthusiasts and researchers