This repository was archived by the owner on Jan 18, 2024. It is now read-only.

pgbackrest info missing stanza path when BOOTSTRAP_FROM_BACKUP=1 #628

@lazzarello

Description


Patroni fails to bootstrap from backup in the restore_or_initdb method. When the environment variable BOOTSTRAP_FROM_BACKUP=1 is set, the script's pgbackrest info call reports the stanza as missing, and the restore that follows aborts.

I would like to bootstrap a new deployment from S3 backups, which are functioning correctly. It appears that pgbackrest needs Postgres to be running to create a stanza, and a stanza is required to bootstrap Postgres from a backup. Postgres can't start yet because it doesn't have the backup to start from, so the pgbackrest stanza cannot be created.

This feels like a race condition, or more precisely a circular dependency between stanza creation and restore.
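To surface this failure before a restore is attempted, a bootstrap script could check what pgbackrest info reports for the stanza. The sketch below is hypothetical: the function names stanza_status and require_ok_stanza are made up, and this is not how restore_or_initdb currently works. It assumes the default text output of pgbackrest info, where each stanza section starts with "stanza: <name>" followed by an indented "status: ..." line, as in the session output in this issue.

```shell
# Hypothetical helper: print the "status:" value that `pgbackrest info`
# (text output) reports for one stanza. Reads the info output on stdin and
# prints nothing if the stanza is not listed.
stanza_status() {
  awk -v s="$1" '
    /^stanza: / { current = $2 }             # start of a stanza section
    current == s && /^[ \t]+status: / {
      sub(/^[ \t]+status: /, "")             # keep only the status text
      print
      exit
    }'
}

# Hypothetical guard: fail fast with a clear message instead of running
# a restore that is bound to abort with "no backup set found to restore".
require_ok_stanza() {
  local status
  status="$(pgbackrest info | stanza_status "$1")"
  case "$status" in
    ok*) return 0 ;;
    *) echo "stanza $1 is not restorable: ${status:-not listed}" >&2; return 1 ;;
  esac
}
```

With the output shown in this issue, require_ok_stanza poddb would stop with "stanza poddb is not restorable: error (missing stanza path)".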

Output from an interactive container session:

patroni

postgres@postgres-bootstrap-restore-0:~$ patroni /etc/timescaledb/patroni.yaml
2023-11-03 23:00:08,358 WARNING: Retry got exception: 'connection problems'
/var/run/postgresql:5432 - no response
2023-11-03 23:00:08,364 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
Sourcing /home/postgres/.pod_environment
2023-11-03 23:00:08 - restore_or_initdb - Attempting restore from backup
2023-11-03 23:00:08 - restore_or_initdb - Listing available backup information
WARN: environment contains invalid option 'backup-enabled'
WARN: configuration file contains invalid option 'repo1Path'
stanza: poddb
    status: error (missing stanza path)
WARN: environment contains invalid option 'backup-enabled'
WARN: configuration file contains invalid option 'repo1Path'
2023-11-03 23:00:08.390 P00   INFO: restore command begin 2.44: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=410-54be1d5a --link-all --log-level-console=detail --pg1-path=/var/lib/postgresql/data --process-max=4 --repo1-cipher-type=none --repo1-path=/default/postgres-timescale --spool-path=/var/run/postgresql --stanza=poddb
WARN: repo1: [FileMissingError] unable to load info file '/default/postgres-timescale/backup/poddb/backup.info' or '/default/postgres-timescale/backup/poddb/backup.info.copy':
      FileMissingError: unable to open missing file '/default/postgres-timescale/backup/poddb/backup.info' for read
      FileMissingError: unable to open missing file '/default/postgres-timescale/backup/poddb/backup.info.copy' for read
      HINT: backup.info cannot be opened and is required to perform a backup.
      HINT: has a stanza-create been performed?
ERROR: [075]: no backup set found to restore
2023-11-03 23:00:08.390 P00   INFO: restore command end: aborted with exception [075]
2023-11-03 23:00:08 - restore_or_initdb - Bootstrap from backup failed
2023-11-03 23:00:08,721 WARNING: Retry got exception: 'connection problems'
/var/run/postgresql:5432 - no response
2023-11-03 23:00:08,727 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
Traceback (most recent call last):
  File "/usr/bin/patroni", line 33, in <module>
    sys.exit(load_entry_point('patroni==2.1.4', 'console_scripts', 'patroni')())
  File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 144, in main
    return patroni_main()
  File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 136, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 108, in abstract_main
    controller.run()
  File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 106, in run
    super(Patroni, self).run()
  File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 65, in run
    self._run_cycle()
  File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 109, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1710, in run_cycle
    info = self._run_cycle()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1548, in _run_cycle
    return self.post_bootstrap()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1440, in post_bootstrap
    self.cancel_initialization()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1433, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
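The FileMissingError lines show exactly which files the restore looked for: <repo1-path>/backup/<stanza>/backup.info and its .copy fallback. A small helper that prints those paths, plus a matching S3 prefix, makes it easier to check the bucket directly. Both function names are hypothetical, and s3_stanza_prefix assumes that pgBackRest stores an S3 repository under repo1-path with the leading slash dropped from the object key.

```shell
# Hypothetical helper: print the two info files the restore tried to read,
# <repo1-path>/backup/<stanza>/backup.info and its .copy fallback.
backup_info_paths() {
  local repo_path="${1%/}"   # drop a trailing slash, if any
  local stanza="$2"
  printf '%s/backup/%s/backup.info\n' "$repo_path" "$stanza"
  printf '%s/backup/%s/backup.info.copy\n' "$repo_path" "$stanza"
}

# Hypothetical helper: S3 prefix where the stanza's backup directory is
# expected, ASSUMING repo1-path maps to the key prefix without its leading "/".
s3_stanza_prefix() {
  local bucket="$1" repo_path="${2#/}" stanza="$3"
  repo_path="${repo_path%/}"
  printf 's3://%s/%s/backup/%s/\n' "$bucket" "$repo_path" "$stanza"
}
```

For example, running aws s3 ls --region us-gov-west-1 on the prefix printed by s3_stanza_prefix timescaledb-wal-backups-dev /default/postgres-timescale poddb should list backup.info and backup.info.copy if the stanza exists at that location; an empty listing would be consistent with the "missing stanza path" status.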

pgbackrest

postgres@postgres-bootstrap-restore-0:~$ pgbackrest info
WARN: environment contains invalid option 'backup-enabled'
WARN: configuration file contains invalid option 'repo1Path'
stanza: poddb
    status: error (missing stanza path)
postgres@postgres-bootstrap-restore-0:~$ pgbackrest --stanza=poddb stanza-create --log-level-stderr=info || exit 1
WARN: environment contains invalid option 'backup-enabled'
WARN: configuration file contains invalid option 'repo1Path'
INFO: stanza-create command begin 2.44: --config=/etc/pgbackrest/pgbackrest.conf --exec-id=465-374b9859 --log-level-stderr=info --pg1-path=/var/lib/postgresql/data --pg1-port=5432 --pg1-socket-path=/var/run/postgresql --repo1-cipher-type=none --repo1-path=/default/postgres-timescale --stanza=poddb
WARN: unable to check pg1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
ERROR: [056]: unable to find primary cluster - cannot proceed
       HINT: are all available clusters in recovery?
INFO: stanza-create command end: aborted with exception [056]
exit
command terminated with exit code 1
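Both sessions also print "WARN: configuration file contains invalid option 'repo1Path'". pgBackRest option names are lowercase and hyphenated (the same option appears as --repo1-path in the command-begin lines), so a camelCase key in pgbackrest.conf is ignored with a warning rather than applied. The hypothetical helper below lists keys in a config file that are not lowercase kebab-case; it only checks the shape of each name, not whether pgBackRest actually recognizes it. The default path is the --config value from the log.

```shell
# Hypothetical helper: list option names in a pgbackrest.conf-style file that
# are not lowercase kebab-case (e.g. "repo1Path" instead of "repo1-path").
invalid_option_names() {
  awk -F= '
    /^[ \t]*$/        { next }               # blank line
    /^[ \t]*(#|\[)/   { next }               # comment or [section] header
    {
      key = $1
      gsub(/^[ \t]+|[ \t]+$/, "", key)       # trim whitespace around the key
      if (key !~ /^[a-z0-9]+(-[a-z0-9]+)*$/) print key
    }' "${1:-/etc/pgbackrest/pgbackrest.conf}"
}
```

Run without arguments inside the pod, it would be expected to print repo1Path for the configuration that produced the warnings above.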

Reproduction instructions

helm upgrade --install --namespace troubleshooting -f values.yaml postgres-timescale-restore .

values.yaml

image:
  repository: timescale/timescaledb-ha
  tag: pg14.6-ts2.9.3-patroni-dcs-failsafe-p0
  pullPolicy: IfNotPresent
persistentVolumes:
  data:
    size: 10Gi
    storageClass: ebs-sc
  wal:
    size: 10Gi
    storageClass: ebs-sc
nodeSelector:
  eks.amazonaws.com/nodegroup: timescaledb-20231023223903624400000001
patroni:
  bootstrap:
    dcs:
      postgresql:
        parameters:
          max_worker_processes: 64  # Must be at least timescaledb.max_background_workers + max_parallel_workers (PostgreSQL default: 8)
          max_parallel_workers: 32
          timescaledb.max_background_workers: 32
secrets:
  pgbackrest:
    PGBACKREST_REPO1_S3_REGION: "us-gov-west-1"
    PGBACKREST_REPO1_S3_KEY: "value"
    PGBACKREST_REPO1_S3_KEY_SECRET: "value"
    PGBACKREST_REPO1_S3_BUCKET: "timescaledb-wal-backups-dev"
    PGBACKREST_REPO1_S3_ENDPOINT: "s3.us-gov-west-1.amazonaws.com"
bootstrapFromBackup:
  enabled: True
  repo1-path: /default/postgres-timescale
backup:
  enabled: false
  pgBackRest:
    repo1-path: /default/postgres-timescale
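The other warning, "environment contains invalid option 'backup-enabled'", comes from the environment rather than the config file. pgBackRest maps each PGBACKREST_* variable to an option by lowercasing the rest of the name and replacing underscores with hyphens (PGBACKREST_REPO1_S3_BUCKET becomes repo1-s3-bucket), so the warning implies that a variable like PGBACKREST_BACKUP_ENABLED is set. That the chart derives it from backup.enabled is an assumption here, not verified against the chart templates. The helpers below (hypothetical names) apply that mapping and list the option name for every PGBACKREST_* variable the current shell sees, which also shows whether the S3 settings from the secret actually reached the container.

```shell
# Hypothetical helper: map a PGBACKREST_* variable name to the pgBackRest
# option it sets (strip the prefix, lowercase, "_" -> "-").
env_to_option() {
  local name="${1#PGBACKREST_}"
  printf '%s\n' "$name" | tr 'A-Z_' 'a-z-'
}

# Hypothetical helper: show the option name for every PGBACKREST_* variable
# in the environment. Prints variable names only, never their (secret) values.
list_env_options() {
  env | awk -F= '/^PGBACKREST_/ { print $1 }' | while read -r var; do
    printf '%s -> %s\n' "$var" "$(env_to_option "$var")"
  done
}
```

For example, env_to_option PGBACKREST_BACKUP_ENABLED prints backup-enabled, which is not an option pgBackRest accepts.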

Environment

The chart is a fork of 0.33.1 with this emptyDir PR merged.

  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.17-eks-be83b3c", GitCommit:"d6adb24671245f68ce3cd985f6b68f124953968d", GitTreeState:"clean", BuildDate:"2023-09-27T17:22:23Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/amd64"}

The cluster is EKS in AWS GovCloud, created with Terraform. I'm working on a fix, so I may have a PR ready in the coming week.

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)


    Development

    No branches or pull requests