I recently had a replica that was restarted but was having trouble getting caught up enough to enable streaming replication. The WAL archive restore & recovery was taking just a little too long. The pgBackRest team offered these very helpful tips and I wanted to share them with all of you.
These three pgBackRest parameters play a key role in speeding up WAL archive restore & recovery times:
During recovery, PostgreSQL will, via the restore_command
parameter, ask pgBackRest to retrieve the next WAL file and place it in the pg_wal
directory. This is the process we want to speed up.
archive-async
Enabling archive-async
tells pgBackRest to pre-fetch WAL files so they will be locally available when PostgreSQL asks for them (which it does serially, only when the previous WAL is recovered). This is a huge win when WAL archives are stored remotely, especially in the cloud like in S3 or Azure. The WAL archives are pre-fetched and stored in the location specified by spool-path
, and it will store up to the amount (in bytes) specified by archive-get-queue-max
). Let’s discuss these now.
archive-async=y
archive-get-queue-max
By default, archive-get-queue-max
is set to 128MB
, which is pretty small. Since a WAL file is usually 16MB, this means that only 8 WAL files could be pre-fetched and stored on disk here. I would suggest going to 1GB
right away and maybe higher (e.g. 4GB
) depending on the volume of activity involved.
Keep in mind that this space will be used in the spool-path
, so make sure you have enough free space on that volume. Speaking of which …
archive-get-queue-max=4GB
spool-path
Last but not least, we come to spool-path
. As mentioned earlier, this is where pgBackRest will stored the pre-fetched WAL files. Then, when PostgreSQL asks for the next WAL, pgBackRest moves the file from the spool-path
location to the pg_wal
directory under the PostgreSQL data directory. This is much better than waiting for pgBackRest to fetch the file from the cloud every time you ask for it, but we can make things a little better still.
spool-path=/var/lib/pg_wal/spool
Location, location, location
When pgBackRest moves a file from one directory to another on the same volume, that is a quick mv
operation. Basically a pointer change. However if the source and target locations are on different volumes, then we have to copy the entire file to the new location. As you may be guessing, the key here is to ensure that spool-path
is on the same volume as the PostgreSQL pg_wal
directory. Doing this can shave quite a few milliseconds off of the restore time (in my tests, times regularly went from ~24ms to 2ms, a 12x reduction).
It’s important to note that if your pg_wal
is symlinked to a separate volume from your data directory, then spool-path
should be on that same separate volume. You want to be where the actual WAL files will be going, not necessarily the data directory location.
Check Your Permissions
Normally, pgBackRest will create spool-path
if needed (if it can as postgres). However, if spool-path
cannot be created (or otherwise written to), then you’ll get archive-get
failures on recovery:
2021-05-23 21:56:52.095 UTC [21656] LOG: starting archive recovery
2021-05-23 21:56:52.102 P00 INFO: archive-get command begin 2.33: [000000010000000000000002, pg_wal/RECOVERYXLOG] --archive-async --exec-id=21660-fdd3647d --log-level-console=info --pg1-path=/var/lib/postgresql/12/main --process-max=2 --repo1-azure-account=<redacted> --repo1-azure-container=pgbackrest --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/spool --stanza=postgres-foo
ERROR: [047]: unable to create path '/var/lib/pg_wal/spool': [13] Permission denied
2021-05-23 21:56:52.103 P00 INFO: archive-get command end: aborted with exception [047]
One thing to watch out for: when you try to set up a spool-path
on a running instance, if you can’t write to the path then you’ll get an error on archive-push
but the reason may not be immediately clear because pgBackRest (as of 2.33) does not write the Permission denied
message in the PostgreSQL log:
ERROR: [082]: unable to push WAL file '00000002000185E70000004B' to the archive asynchronously after 60 second(s)
2021-05-19 15:46:19.372 P00 INFO: archive-push command end: aborted with exception [082]
2021-05-19 15:46:19.374 UTC [22000] LOG: archive command failed with exit code 82
2021-05-19 15:46:19.374 UTC [22000] DETAIL: The failed archive command was: /usr/bin/pgbackrest --stanza=postgres-foo archive-push pg_wal/000000020
00185E70000004B
2021-05-19 15:46:20.386 P00 INFO: archive-push command begin 2.32: [pg_wal/00000002000185E70000004B] --archive-async --compress-type=lz4 --exec-id=22071-481bde16 --log-level-console=info --log-level-file=info --pg1-path=/var/lib/postgresql/12/main --process-max=3 --repo1-azure-account=<redacted> --repo1-azure-container=pgbackrest --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/spool --stanza=postgres-foo
It does, however, write it in the pgBackRest archive-push-async log (usually found under /var/log/pgbackrest):
-------------------PROCESS START-------------------
2021-05-19 15:46:20.399 P00 INFO: archive-push:async command begin 2.32: [/var/lib/postgresql/12/main/pg_wal] --archive-async --compress-type=lz4 --
exec-id=22071-481bde16 --log-level-console=off --log-level-file=info --log-level-stderr=off --pg1-path=/var/lib/postgresql/12/main --process-max=3 --r
epo1-azure-account=<redacted> --repo1-azure-container=pgbackrest --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-2
56-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/spool --stanza=postgres1-test
2021-05-19 15:46:20.399 P00 ERROR: [047]: unable to create path '/var/lib/pg_wal/spool': [13] Permission denied
2021-05-19 15:46:20.399 P00 INFO: archive-push:async command end: aborted with exception [047]
So be sure to check that the postgres
user (or whatever user pgbackrest
runs as, but typically it’s postgres
) has privileges to write to (and create, if necessary) the location specified by spool-path
.