Aureum01/CIA_FACTBOOK_API

CIA World Factbook API

A professional Node.js API for scraping and accessing CIA World Factbook data. This project translates VBA web scrapers into a modern JavaScript API using Puppeteer, Axios, and Cheerio.

🚀 Features

  • RESTful API - Clean, professional API endpoints
  • Multiple Data Sources - Access various CIA Factbook fields and comparisons
  • Real-time Scraping - Fresh data from CIA sources
  • Error Handling - Comprehensive error handling and logging
  • Rate Limiting - Built-in rate limiting and request management
  • CORS Support - Cross-origin resource sharing enabled

📦 Installation

  1. Clone or download the project
  2. Navigate to the project directory
  3. Install dependencies:
npm install
  4. Start the server:
npm start

The API will be available at http://localhost:3000

📋 API Endpoints

Base URL

http://localhost:3000

Health Check

  • GET /health - Check API status

Field Data Endpoints

Get All Available Fields

  • GET /api/fields
  • Returns list of available field endpoints

Get Specific Field Data

  • GET /api/fields/{field-name}
  • Scrapes and returns data for a specific field

Bulk Field Scraping

  • POST /api/fields/bulk
  • Body: { "fields": ["field1", "field2", ...] }

Comparison Data Endpoints

Get All Available Comparisons

  • GET /api/comparisons
  • Returns list of available comparison endpoints

Get Specific Comparison Data

  • GET /api/comparisons/{comparison-name}
  • Scrapes and returns comparison data

Bulk Comparison Scraping

  • POST /api/comparisons/bulk
  • Body: { "comparisons": ["comparison1", "comparison2", ...] }
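Before calling either bulk endpoint, a client may want to check the request body locally. The following is a minimal sketch, not part of this project: `validateBulkBody` is a hypothetical helper, and the rule that names are lowercase kebab-case is an assumption inferred from the endpoint names listed in this README.

```javascript
// Hypothetical client-side check for the bulk endpoints described above.
// `key` is "fields" or "comparisons", matching the two documented bodies.
// Assumption: valid names are lowercase kebab-case, as in this README's lists.
function validateBulkBody(body, key) {
  const names = body && body[key];
  if (!Array.isArray(names) || names.length === 0) {
    return { ok: false, error: `"${key}" must be a non-empty array` };
  }
  const bad = names.filter(
    (name) => typeof name !== 'string' || !/^[a-z0-9-]+$/.test(name)
  );
  if (bad.length > 0) {
    return { ok: false, error: `invalid names: ${bad.join(', ')}` };
  }
  // Drop duplicates so the same field is not requested twice.
  return { ok: true, names: [...new Set(names)] };
}

console.log(validateBulkBody({ fields: ['climate', 'climate'] }, 'fields'));
```

Deduplicating on the client keeps a bulk request from triggering the same scrape more than once.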

📊 Available Fields

Simple Fields

  • diplomatic-representation-us - Diplomatic representation in the US
  • economic-overview - Economic overview
  • gdp-purchasing-power - GDP purchasing power parity
  • electricity-access - Electricity access
  • civil-aircraft-code - Civil aircraft registration codes
  • military-and-security-forces - Military and security forces
  • climate - Climate information
  • urbanization - Urbanization data
  • major-urban-areas-population - Major urban areas population
  • drinking-water-source - Drinking water source
  • sanitation-facility-access - Sanitation facility access
  • education-expenditure - Education expenditure
  • environmental-issues - Environmental issues

Comparison Fields

  • subscriptions-comparison - Broadband subscriptions per 100 and total
  • electricity-country-comparison - Electricity capacity, consumption, exports/imports, losses
  • energy-country-comparison - Energy consumption per capita (2023 data)
  • airports-country-comparison - Total airports

💡 Usage Examples

JavaScript/Node.js

const axios = require('axios');

// Get climate data for all countries
axios.get('http://localhost:3000/api/fields/climate')
  .then(response => {
    console.log('Climate data:', response.data);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

// Get electricity comparison data
axios.get('http://localhost:3000/api/comparisons/electricity-country-comparison')
  .then(response => {
    console.log('Electricity data:', response.data);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

// Bulk scraping multiple fields
axios.post('http://localhost:3000/api/fields/bulk', {
  fields: ['climate', 'economic-overview', 'gdp-purchasing-power']
})
  .then(response => {
    console.log('Bulk data:', response.data);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

Python

import requests

# Get diplomatic representation data
response = requests.get('http://localhost:3000/api/fields/diplomatic-representation-us')
response.raise_for_status()  # Stop early on 4xx/5xx responses
data = response.json()

print(f"Found {len(data['data'])} entries")
for entry in data['data'][:5]:  # Show first 5
    print(f"{entry['placeName']}: {entry['formatted'][:100]}...")

cURL

# Get all available fields
curl http://localhost:3000/api/fields

# Get specific field data
curl http://localhost:3000/api/fields/climate

# Bulk scraping
curl -X POST http://localhost:3000/api/fields/bulk \
  -H "Content-Type: application/json" \
  -d '{"fields": ["climate", "economic-overview"]}'

📝 Response Format

All API responses follow this structure:

{
  "success": true,
  "data": [...],
  "count": 195,
  "timestamp": "2024-01-15T10:30:00.000Z"
}
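A client can check that a payload has this shape before using it. This is a sketch under assumptions: `isEnvelope` is a hypothetical helper, and this README does not say whether error responses use the same structure, so the check covers only the keys shown above.

```javascript
// Hypothetical guard for the response envelope shown above.
// Assumption: `count` is an integer and `timestamp` is an ISO 8601 string.
function isEnvelope(payload) {
  return (
    payload !== null &&
    typeof payload === 'object' &&
    typeof payload.success === 'boolean' &&
    Array.isArray(payload.data) &&
    Number.isInteger(payload.count) &&
    !Number.isNaN(Date.parse(payload.timestamp))
  );
}

const sample = {
  success: true,
  data: [],
  count: 0,
  timestamp: '2024-01-15T10:30:00.000Z',
};
console.log(isEnvelope(sample));
```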

Field Data Response

{
  "success": true,
  "field": "climate",
  "data": [
    {
      "placeName": "United States",
      "formatted": "Continental climate with considerable regional variation...",
      "region": "North America"
    }
  ],
  "count": 195,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

Comparison Data Response

{
  "success": true,
  "comparison": "electricity-country-comparison",
  "data": [
    {
      "placeName": "United States",
      "region": "North America",
      "installedGeneratingCapacity": "1,107,000,000 kW",
      "consumption": "3,928,000,000,000 kWh",
      "exports": "12,500,000,000 kWh",
      "imports": "51,200,000,000 kWh",
      "transmissionDistributionLosses": "209,000,000,000 kWh"
    }
  ],
  "count": 195,
  "timestamp": "2024-01-15T10:30:00.000Z"
}
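The comparison values above are strings with thousands separators and a unit, so numeric work needs a parsing step. Here is a minimal sketch, assuming every value looks like `"<number> <unit>"` as in the sample. `parseQuantity` is a hypothetical helper and is not taken from `utils/helpers.js`.

```javascript
// Hypothetical parser for values such as "1,107,000,000 kW".
// Returns { value, unit }, or null when the text does not start with a number.
function parseQuantity(text) {
  const match = /^\s*(-?\d[\d,]*(?:\.\d+)?)\s*(.*)$/.exec(String(text));
  if (!match) return null;
  const value = Number(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return null;
  return { value, unit: match[2].trim() };
}

const row = {
  installedGeneratingCapacity: '1,107,000,000 kW',
  consumption: '3,928,000,000,000 kWh',
};
console.log(parseQuantity(row.installedGeneratingCapacity));
console.log(parseQuantity(row.consumption));
```

Returning `null` for unparseable text lets callers skip countries whose entry has no numeric value instead of treating it as zero.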

🛠️ Project Structure

CIA_FACTBOOK/
├── scrapers/                    # Web scraping modules
│   ├── base-scraper.js         # Base scraper class
│   ├── base-comparison-scraper.js
│   ├── diplomatic-representation-us.js
│   ├── economic-overview.js
│   └── ... (other field scrapers)
├── routes/                     # API route handlers
│   ├── fields.js              # Field data routes
│   └── comparisons.js         # Comparison data routes
├── utils/                      # Utility functions
│   └── helpers.js             # Data cleaning and processing
├── server.js                   # Main Express server
├── package.json               # Project dependencies
└── README.md                  # This file

🔧 Technical Details

Dependencies

  • Express.js - Web framework
  • Axios - HTTP client for API requests
  • Puppeteer - Browser automation (for complex scraping)
  • Cheerio - HTML parsing
  • Helmet - Security middleware
  • CORS - Cross-origin resource sharing
  • Morgan - HTTP request logger

Data Sources

All data is fetched from the page-data JSON endpoints of the official CIA World Factbook website:

  • Base URL: https://www.cia.gov/the-world-factbook/page-data/field/
  • Field endpoints: {field-name}/page-data.json
  • Comparison endpoints: {field-name}/country-comparison/page-data.json
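The URL patterns above can be combined into a small builder. This is a sketch, not the project's scraper code: `buildSourceUrl` is a hypothetical name, and it assumes the caller passes the upstream Factbook slug, which may differ from the API endpoint names listed earlier in this README.

```javascript
// Base URL as documented above.
const FACTBOOK_BASE = 'https://www.cia.gov/the-world-factbook/page-data/field/';

// Hypothetical helper that applies the two documented URL patterns.
// Assumption: `fieldName` is already the slug used by the Factbook site.
function buildSourceUrl(fieldName, { comparison = false } = {}) {
  const slug = encodeURIComponent(fieldName);
  const suffix = comparison
    ? 'country-comparison/page-data.json'
    : 'page-data.json';
  return `${FACTBOOK_BASE}${slug}/${suffix}`;
}

console.log(buildSourceUrl('climate'));
console.log(buildSourceUrl('airports', { comparison: true }));
```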

Rate Limiting

  • Built-in delays between requests to avoid overwhelming servers
  • Error handling for rate limit responses
  • Automatic retry logic with exponential backoff
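Retry with exponential backoff, as listed above, can be sketched as follows. The names `withRetry` and `backoffDelay`, the default of 3 retries, and the 500 ms base delay are illustrative assumptions. The project's actual retry settings are not documented in this README.

```javascript
// Delay before the next attempt: base, 2x base, 4x base, ...
function backoffDelay(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: runs `task` and retries on any thrown error,
// waiting longer after each failure. Rethrows the last error when out of retries.
async function withRetry(task, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      await sleep(backoffDelay(attempt, baseDelayMs));
    }
  }
  throw lastError;
}
```

A caller would wrap a single scrape, for example `withRetry(() => fetchField('climate'))`, where `fetchField` stands in for whatever function performs the request.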

⚠️ Important Notes

  1. Data Freshness: Data is scraped in real-time from CIA sources
  2. Rate Limiting: Respectful scraping with built-in delays
  3. Usage: This tool is for educational and research purposes
  4. Attribution: Always attribute data to CIA World Factbook
  5. Legal: Ensure compliance with CIA terms of service

πŸ› Troubleshooting

Common Issues

  1. Server won't start: Ensure port 3000 is available
  2. Scraping fails: Check internet connection and CIA website availability
  3. Empty responses: Some fields may have no data for certain countries

Error Codes

  • 404 - Endpoint or field not found
  • 500 - Server error (check logs)
  • 429 - Rate limited (wait and retry)
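On the client side, these codes can be mapped to a next step. This is a minimal sketch: `classifyStatus` and its return labels are hypothetical, and only the three codes listed above come from this README.

```javascript
// Hypothetical mapping from an HTTP status code to a client action.
function classifyStatus(status) {
  if (status === 429) return 'retry-later'; // rate limited: wait and retry
  if (status === 404) return 'not-found';   // unknown endpoint or field
  if (status >= 500) return 'server-error'; // check the server logs
  if (status >= 200 && status < 300) return 'ok';
  return 'unexpected';
}

for (const code of [200, 404, 429, 500]) {
  console.log(code, classifyStatus(code));
}
```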

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add new scrapers or improve existing ones
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is for educational purposes. Please respect CIA data usage policies.

🙏 Acknowledgments

  • CIA World Factbook for providing comprehensive country data
  • Original VBA scrapers for the data collection logic
  • Open source community for the excellent tools used

Created with ❤️ for data enthusiasts and researchers

About

An open-source API that delivers structured access to data from the CIA World Factbook (also known as the CIA World Handbook). This project converts the raw Factbook entries into a normalized, developer-friendly JSON API with country-level endpoints, rate limiting, and easy integration for research, intelligence, and risk analysis.
