
Production Service Architecture

RayBB edited this page May 17, 2024 · 1 revision

UPDATES

  • 2019: ol-mem[0-2] Ubuntu Trusty VMs were reprovisioned as ol-mem[3-5] following the Provisioning Guide
  • Dec. 2020: Migration to Docker-based deployments using Docker Compose
  • Jan. 2021: Upgrade to Infogami master and drop docker-compose.infogami-local.yml
  • Oct./Nov. 2021: Decommission ol-mem{3,4,5} on Ubuntu Xenial in favor of ol-mem{0,1,2} on Ubuntu Focal

Server Types

@cclauss, please update this section with instructions and commands for provisioning each of the following server types. We likely need to take into account ferm, the Docker install, services such as haproxy, nginx, and memcached, hb.conf (the IP allowlist for ol-db), and, importantly, olsystem updates and DNS updates.

  • ol-mem
  • ol-web
  • ol-solr
  • ol-covers
  • ol-home
  • ol-www
  • ol-dev
  • ol-db

For ol-web, ol-home, ol-www, and ol-solr, we run something roughly equivalent to this setup script.

Current Production Architecture

Today, our production service architecture consists of the following hosts and Docker containers:

| hostname | Docker image | long name | notes |
|---|---|---|---|
| ol-backup0 | None | bare-metal | Trusty |
| ol-covers0 | covers | openlibrary-covers-1 & 2 | |
| ol-covers0 | covers_nginx | openlibrary-covers_nginx-1 | underscore |
| ol-db1 & 2 | None | postgres on bare metal (no Docker) | Trusty |
| ol-home0 | affiliate-server | openlibrary-affiliate-server-1 | |
| ol-home0 | cron-jobs | openlibrary-cron-jobs-1 | |
| ol-home0 | importbot | openlibrary-importbot-1 | |
| ol-home0 | infobase | openlibrary-infobase-1 | |
| ol-home0 | infobase_nginx | openlibrary-infobase_nginx-1 | underscore |
| ol-home0 | solr-next-updater | openlibrary-solr-next-updater-1 | |
| ol-home0 | solr-updater | openlibrary-solr-updater-1 | |
| ol-mem0, 1, 2 | None | memcached on bare metal (no Docker) | |
| ol-solr0 & 1 | solr | openlibrary_solr_1 | underscore |
| ol-solr0 & 1 | solr_haproxy | openlibrary_solr_haproxy_1 | underscore |
| ol-solr0 & 1 | solr_restarter | openlibrary_solr_restarter_1 | underscore |
| ol-web1 & 2 | web | openlibrary-web-1 | |
| ol-www0 | web_haproxy | openlibrary-web_haproxy-1 | underscore |
| ol-www0 | web_nginx | openlibrary-web_nginx-1 | underscore |
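
The table above can double as a checklist when verifying a host. A minimal sketch, assuming you capture the output of `docker ps --format '{{.Names}}'` on the host and pass it in; the `EXPECTED` mapping is a hand-copied subset of the table, and `missing_containers` is a hypothetical helper, not part of the openlibrary repo.

```python
from __future__ import annotations

# Hand-copied subset of the production architecture table
# (hardware host -> expected container long names).
EXPECTED: dict[str, set[str]] = {
    "ol-covers0": {
        "openlibrary-covers-1",
        "openlibrary-covers-2",
        "openlibrary-covers_nginx-1",
    },
    "ol-home0": {
        "openlibrary-affiliate-server-1",
        "openlibrary-cron-jobs-1",
        "openlibrary-importbot-1",
        "openlibrary-infobase-1",
        "openlibrary-infobase_nginx-1",
        "openlibrary-solr-next-updater-1",
        "openlibrary-solr-updater-1",
    },
    "ol-www0": {
        "openlibrary-web_haproxy-1",
        "openlibrary-web_nginx-1",
    },
}


def missing_containers(host: str, docker_ps_names: str) -> list[str]:
    """Return expected containers for `host` that are absent from `docker ps`.

    `docker_ps_names` is the text printed by `docker ps --format '{{.Names}}'`,
    one container name per line. Unknown hosts have no expectations.
    """
    running = {line.strip() for line in docker_ps_names.splitlines() if line.strip()}
    return sorted(EXPECTED.get(host, set()) - running)


if __name__ == "__main__":
    # Only nginx is running on ol-www0, so haproxy is reported missing.
    print(missing_containers("ol-www0", "openlibrary-web_nginx-1\n"))
    # prints ['openlibrary-web_haproxy-1']
```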

Docker hostnames

| hardware hostname | container name | docker hostname |
|---|---|---|
| ol-covers0 | openlibrary-covers-1 | docker-ol-covers0-covers-1 |
| ol-covers0 | openlibrary-covers-2 | docker-ol-covers0-covers-2 |
| ol-covers0 | openlibrary-covers_nginx-1 | docker-ol-covers0-covers_nginx-1 |
| ol-home0 | openlibrary-affiliate-server-1 | docker-ol-home0- |
| ol-home0 | openlibrary-cron-jobs-1 | docker-ol-home0-cron-jobs-1 |
| ol-home0 | openlibrary-importbot-1 | docker-ol-home0-importbot-1 |
| ol-home0 | openlibrary-infobase-1 | docker-ol-home0-infobase-1 |
| ol-home0 | openlibrary-infobase_nginx-1 | docker-ol-home0-infobase_nginx-1 |
| ol-home0 | openlibrary-solr-next-updater-1 | docker-ol-home0-solr-next-updater-1 |
| ol-home0 | openlibrary-solr-updater-1 | docker-ol-home0-solr-updater-1 |
| ol-solr0 | openlibrary_solr_1 | docker-ol-solr0-solr_1 |
| ol-solr0 | openlibrary_solr_haproxy_1 | docker-ol-solr0-solr_haproxy_1 |
| ol-solr0 | openlibrary_solr_restarter_1 | docker-ol-solr0-solr_restarter_1 |
| ol-solr1 | jenkins | docker-ol-solr1-jenkins |
| ol-solr1 | openlibrary_solr_haproxy_1 | docker-ol-solr1-solr_haproxy_1 |
| ol-solr1 | solr_builder-solr_prod-1 | docker-ol-solr1-solr_prod-1 |
| ol-web1 | openlibrary-web-1 | docker-ol-web1-web-1 |
| ol-web2 | openlibrary-web-2 | docker-ol-web2-web-1 |
| ol-www0 | openlibrary-web_haproxy-1 | docker-ol-www0--web_haproxy-1 |
| ol-www0 | openlibrary-web_nginx-1 | docker-ol-www0--web_nginx-1 |
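
Most rows in the Docker hostnames table appear to follow one rule: `docker-<hardware hostname>-` followed by the container name with its Compose project prefix (`openlibrary`, `solr_builder`) removed. The sketch below encodes that rule as we read it from the table; it is an inferred pattern, not a documented convention, and several rows do not fit it (`jenkins` on ol-solr1, the ol-web2 row, and the ol-www0 rows with a double hyphen), so the table stays authoritative. The only external fact it relies on is that Docker Compose V2 joins project, service, and index with `-`, while Compose V1 used `_`. `docker_hostname` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Docker Compose V2 names containers "<project>-<service>-<index>", while
# Compose V1 used "<project>_<service>_<index>". Service names may contain
# "-" or "_" themselves (e.g. cron-jobs, solr_haproxy), so the project is
# taken up to the first separator and the index from the end of the name.
_V2_NAME = re.compile(r"^(?P<project>[^-]+)-(?P<service>.+)-(?P<index>\d+)$")
_V1_NAME = re.compile(r"^(?P<project>[^_]+)_(?P<service>.+)_(?P<index>\d+)$")


def docker_hostname(hardware_host: str, container_name: str) -> str | None:
    """Guess a container's docker hostname using the rule inferred from the table.

    Returns None for names that match neither Compose convention (e.g. "jenkins").
    """
    for pattern, sep in ((_V2_NAME, "-"), (_V1_NAME, "_")):
        match = pattern.match(container_name)
        if match:
            suffix = f"{match['service']}{sep}{match['index']}"
            return f"docker-{hardware_host}-{suffix}"
    return None
```

For example, `docker_hostname("ol-solr0", "openlibrary_solr_haproxy_1")` gives `docker-ol-solr0-solr_haproxy_1`, matching the table, while the ol-web2 row would come out as `docker-ol-web2-web-2` rather than the listed `docker-ol-web2-web-1`.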

[Diagram: Open Library Production Architecture]

Current Provisioning Setup

Our current production setup process (as of 2021) for provisioning these Docker containers is manual. It relies heavily on rsync-ing images between machines by hand, and on a separate repository, olsystem, which contains the production configs, cron jobs, and infrastructure required to run the official openlibrary.org service.

Our Docker containers are more-or-less provisioned identically:

  • The compose*.yaml files in openlibrary's root directory contain the Docker configuration for each container
  • These containers may mount external volumes such as olsystem, petabox, and /1 to access configuration and shared data.
  • /opt/openlibrary contains the business logic for the Open Library project:
/opt/
/opt/petabox
/opt/openlibrary
/opt/openlibrary/venv  -- python virtualenv
/opt/openlibrary/maxmind-geoip/  -- .dat file for anonymizing IPs
/opt/openlibrary/deploys  -- history of all deploys, hash-binned by service
/opt/openlibrary/deploys/openlibrary  -- history of openlibrary deploys
/opt/openlibrary/deploys/olsystem  -- history of olsystem deploys
/opt/openlibrary/deploys/base  -- possibly deprecated
/opt/openlibrary/deploys/openlibrary/openlibrary  -- active openlibrary deploy
/opt/openlibrary/deploys/openlibrary/olsystem  -- active olsystem deploy
/opt/openlibrary/olsystem/  -- symlink to active olsystem: /opt/deploys/openlibrary/olsystem
/opt/openlibrary/openlibrary -- symlink to active openlibrary: /opt/deploys/openlibrary/openlibrary
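
Because /opt/openlibrary/olsystem and /opt/openlibrary/openlibrary are symlinks to the active deploys, reading those links is a quick way to see which deploy is live on a host. A minimal sketch using only the Python standard library; `describe_links` is a hypothetical helper, and the two paths are copied from the listing above.

```python
from __future__ import annotations

import os
from pathlib import Path

# Symlinks to the active deploys, taken from the /opt layout listing.
ACTIVE_LINKS = [
    Path("/opt/openlibrary/olsystem"),
    Path("/opt/openlibrary/openlibrary"),
]


def describe_links(paths: list[Path]) -> dict[str, str | None]:
    """Map each path to its symlink target, or None if it is not a symlink.

    Missing paths also map to None, since Path.is_symlink() returns False
    for paths that do not exist.
    """
    return {str(p): os.readlink(p) if p.is_symlink() else None for p in paths}


if __name__ == "__main__":
    for link, target in describe_links(ACTIVE_LINKS).items():
        print(f"{link} -> {target or 'NOT A SYMLINK'}")
```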

At a minimum, re-provisioning a server requires:

  • Running an Ansible playbook to set up firewall rules and install core packages (e.g. git, docker)
  • Rsync-ing over the legacy VM's /opt directory (preferably onto an external, mountable /1 volume so it can be moved in the future)
  • Setting up olsystem so that the right locations in /etc are symlinks to the corresponding files in /opt/openlibrary/olsystem/etc
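
The olsystem symlinking step can be sketched in Python. This is a minimal sketch assuming that every file under /opt/openlibrary/olsystem/etc should appear at the same relative path under /etc as a symlink; that one-to-one mirroring rule, the `.orig` backup behavior, and the helper names `link_for`, `plan_etc_links`, and `apply_links` are our assumptions, not olsystem's actual tooling. It defaults to a dry run.

```python
from __future__ import annotations

import os
from pathlib import Path

OLSYSTEM_ETC = Path("/opt/openlibrary/olsystem/etc")
SYSTEM_ETC = Path("/etc")


def link_for(src_root: Path, dest_root: Path, src: Path) -> Path:
    """Where the symlink for `src` goes: <src_root>/a/b.conf -> <dest_root>/a/b.conf."""
    return dest_root / src.relative_to(src_root)


def plan_etc_links(src_root: Path = OLSYSTEM_ETC,
                   dest_root: Path = SYSTEM_ETC) -> list[tuple[Path, Path]]:
    """Return (link, target) pairs for every regular file under src_root."""
    files = sorted(p for p in src_root.rglob("*") if p.is_file())
    return [(link_for(src_root, dest_root, src), src) for src in files]


def apply_links(pairs: list[tuple[Path, Path]], dry_run: bool = True) -> None:
    """Create the planned symlinks; with dry_run=True only print what would change."""
    for link, target in pairs:
        if link.is_symlink() and os.readlink(link) == str(target):
            continue  # already points at the olsystem file
        if dry_run:
            print(f"would link {link} -> {target}")
            continue
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            # Keep whatever was there before instead of silently deleting it.
            link.rename(link.with_name(link.name + ".orig"))
        os.symlink(target, link)
```

A cautious workflow is to run `apply_links(plan_etc_links())` first, review the printed plan, and only then rerun it as root with `dry_run=False`.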

Replace a memcache server (Nov. 2021)

When provisioning a new memcached server, remember to edit /etc/memcached.conf and raise the -m setting (memory in MB) so that memcached uses whatever memory is available on the machine, less roughly 300 MB for OS-level needs. For example, if the machine has 9.7 GB, use -m 9300 or a bit higher. By default, memcached uses only 64 MB.
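
The -m arithmetic above can be scripted: read MemTotal from /proc/meminfo, subtract about 300 MB of headroom, and rewrite the `-m` line of memcached.conf. A minimal sketch assuming the usual Debian/Ubuntu memcached.conf layout with one option per line (e.g. `-m 64`); the helper names are hypothetical, and the floor at memcached's 64 MB default is our own choice. Note that MemTotal is reported in kB (KiB), so dividing by 1024 yields MiB.

```python
import re

RESERVED_MB = 300  # headroom left for the OS


def memcached_mb(total_mb: int, reserved_mb: int = RESERVED_MB) -> int:
    """Memory (MB) to pass to memcached via -m: total RAM minus OS headroom.

    Never returns less than memcached's 64 MB default.
    """
    return max(64, total_mb - reserved_mb)


def total_ram_mb(meminfo_text: str) -> int:
    """Parse MemTotal (reported in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1)) // 1024


def set_memory_limit(conf_text: str, mb: int) -> str:
    """Rewrite the '-m <MB>' line of memcached.conf, appending one if absent."""
    new_line = f"-m {mb}"
    updated, count = re.subn(r"^-m[ \t]+\d+[ \t]*$", new_line, conf_text,
                             flags=re.MULTILINE)
    if count == 0:
        updated = conf_text.rstrip("\n") + "\n" + new_line + "\n"
    return updated
```

For the 9.7 GB example, `memcached_mb(9700)` returns 9400, within the "9300 or a bit higher" range. After writing the new config, memcached has to be restarted (e.g. `sudo systemctl restart memcached`) for the setting to take effect.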

Performing Upgrades

This section was written in response to https://github.com/internetarchive/openlibrary/issues/7667. When performing upgrades on production servers, follow a process similar to https://github.com/internetarchive/openlibrary/wiki/Disaster-Recovery-&-Immediate-Response#responding-to-a-outage, namely:

  • Create a new tracking issue on GitHub to keep notes and track progress.
  • Notify the #openlibrary channel and make sure other staff are available before performing a deploy (try to get a 👍 from other staff to confirm).
  • In the announcement, provide evidence (if possible) that the upgrade was already tested successfully on a non-critical-path machine (dev1, web1, web2, solr1, solr0, ol-covers, ol-home0, ol-www0, in this order).
  • Before the upgrade, propose a path for reversion (i.e. have a strategy to roll back, if possible).
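
After each machine is upgraded and rebooted, the first check (is it back on the network?) can be scripted. A minimal sketch, assuming that a successful TCP connection, for example to SSH on port 22, is an acceptable first signal; `wait_until_reachable` is a hypothetical helper, and a reachable port does not prove that the services on the machine work, so the manual checks are still needed.

```python
import socket
import time


def wait_until_reachable(host: str, port: int, timeout: float = 600.0,
                         interval: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True as soon as a connection is accepted, False on timeout.
    Uses a monotonic clock so wall-clock changes during a reboot don't matter.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # still rebooting, connection refused, or DNS not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_reachable("ol-web1", 22)` blocks for up to 10 minutes (the default `timeout` of 600 seconds), retrying every 5 seconds, before you move on to checking the machine's services by hand.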

Order of upgrades (wait for each machine to fully reboot and confirm that it works as expected before moving on; take good notes!):
