Reference — pi2s3

FPM Monitor

PHP-FPM saturation monitor

When all PHP-FPM workers are exhausted, WordPress serves 504 errors — and WP-Cron goes silent at the worst moment. This monitor runs as a host cron (not inside Docker), so it fires regardless of PHP-FPM state.

host cron · every minute

Check 1: HTTP probe — curl to FPM_PROBE_URL (5xx or timeout?)

Check 2: DB queries — >15 s queries from wordpress user?

Check 3: Orphaned backup lock — pi2s3-lock SLEEP in PROCESSLIST?

Saturated for N checks → ntfy alert fires

FPM_AUTO_RESTART=true → docker restart pi_wordpress automatically

Orphaned lock → killed immediately + ntfy (backup must not be running)
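
The HTTP probe and the "saturated for N checks" rule above could be sketched roughly as follows. This is a minimal illustration, not the contents of fpm-saturation-monitor.sh: the function names, the 10-second timeout, and the counter-file path are assumptions; FPM_PROBE_URL and FPM_SATURATION_THRESHOLD come from the text.

```shell
# Sketch only. Function names and the counter-file path are hypothetical,
# not taken from fpm-saturation-monitor.sh.

# Return 0 (saturated) for a 5xx status, or for 000/empty, which is what
# curl's %{http_code} yields when the request times out or fails.
is_saturated_code() {
    case "$1" in
        ''|000|5[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print only the HTTP status code for URL $1, giving up after 10 seconds.
probe_http_code() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1" 2>/dev/null || true
}

# Update the consecutive-saturation counter stored in file $2 using the
# latest status code $1, then print the new count.
update_counter() {
    count=0
    if [ -f "$2" ]; then count=$(cat "$2"); fi
    if is_saturated_code "$1"; then
        count=$((count + 1))
    else
        count=0
    fi
    echo "$count" > "$2"
    echo "$count"
}

# Example wiring (commented out so the sketch has no side effects):
# code=$(probe_http_code "$FPM_PROBE_URL")
# n=$(update_counter "$code" /tmp/fpm-monitor.count)
# [ "$n" -ge "${FPM_SATURATION_THRESHOLD:-3}" ] && echo "saturated: send alert"
```

Resetting the counter on any healthy response means a single slow minute never triggers an alert on its own.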

Auto-restart

Set FPM_AUTO_RESTART=true to automatically restart the WordPress container when saturation is detected. An ntfy notification confirms the restart. A cooldown (FPM_RESTART_COOLDOWN, default 20 min) prevents restart loops. Configure both in the Devtools plugin (Debug AI → PHP-FPM Monitor) or directly in config.env.
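
A restart cooldown of this kind usually comes down to comparing the current time with a stored timestamp. The sketch below assumes a hypothetical restart_allowed helper and stamp file; only the variable names FPM_AUTO_RESTART, FPM_RESTART_COOLDOWN and FPM_WP_CONTAINER come from this page.

```shell
# Return 0 when a restart is allowed: no previous restart is recorded, or
# the last one is at least the cooldown ago.
# $1 = now (epoch s), $2 = last restart (epoch s, empty if none), $3 = cooldown (s)
restart_allowed() {
    if [ -z "$2" ]; then return 0; fi
    [ $(( $1 - $2 )) -ge "$3" ]
}

# Example wiring (commented out; the stamp path is hypothetical):
# stamp=/tmp/fpm-restart.stamp
# last=$(cat "$stamp" 2>/dev/null || true)
# if [ "$FPM_AUTO_RESTART" = "true" ] &&
#    restart_allowed "$(date +%s)" "$last" "${FPM_RESTART_COOLDOWN:-1200}"; then
#     docker restart "${FPM_WP_CONTAINER:-pi_wordpress}" && date +%s > "$stamp"
# fi
```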

Orphaned lock detection

After a backup, the DB lock process is killed from inside the container so no orphaned SLEEP(86400) connections survive. The monitor also detects and kills any that slip through, with a 30-min alert cooldown.

Plugin integration

The CloudScale Cyber & Devtools plugin (Debug AI tab → PHP-FPM Monitor) shows the last saturation event, its reason, and a pre-filled config.env snippet. Set FPM_CALLBACK_URL and FPM_CALLBACK_TOKEN to enable this integration.

# config.env
FPM_SATURATION_THRESHOLD=3       # checks before alerting
FPM_WP_CONTAINER=pi_wordpress
FPM_DB_CONTAINER=pi_mariadb
FPM_AUTO_RESTART=true            # restart container automatically
FPM_RESTART_COOLDOWN=1200        # seconds between restarts (20 min)
FPM_ALERT_COOLDOWN=1800          # seconds between repeat alerts

Install: crontab -e → add: * * * * * /home/pi/pi2s3/fpm-saturation-monitor.sh 2>/dev/null
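
If you would rather script the install than edit the crontab by hand, one possible approach is to append the entry only when it is missing. The add_cron_line helper below is a hypothetical name for illustration, not part of pi2s3.

```shell
# Print the crontab read from stdin, appending line $1 only if that exact
# line is not already present.
add_cron_line() {
    awk -v line="$1" '
        { print; if ($0 == line) found = 1 }
        END { if (!found) print line }
    '
}

# Example wiring (commented out because it rewrites your crontab):
# entry='* * * * * /home/pi/pi2s3/fpm-saturation-monitor.sh 2>/dev/null'
# crontab -l 2>/dev/null | add_cron_line "$entry" | crontab -
```

Running it twice leaves a single entry, so it is safe to call from a setup script.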

Security & Reliability

Backups you can trust

Five layers that catch failures before they become disasters.

Client-side encryption

Each partition image is encrypted with GPG AES-256 before it leaves your Pi. Even full S3 bucket access is useless without the passphrase, which lives only in config.env, never in S3.

# config.env
BACKUP_ENCRYPTION_PASSPHRASE="my-strong-passphrase"

Requires: sudo apt install gpg. The restore script detects encryption automatically.

Bandwidth throttle

Limit upload speed so nightly backups don't saturate your home or office connection. Uses pv in the upload pipeline. No AWS config changes needed.

# config.env
AWS_TRANSFER_RATE_LIMIT="2m"   # 2 MB/s

Requires: sudo apt install pv. Use 500k, 2m, or 1g format.
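
As a rough illustration of how pv can sit in an upload pipeline, the sketch below wraps it in a hypothetical throttle function that passes data through unchanged when no limit is set. The surrounding pipeline stages are assumptions; this page does not show pi2s3's actual pipeline.

```shell
# Pass stdin to stdout, rate-limited by pv when a limit is configured.
# pv -q suppresses the progress display; -L caps the transfer rate.
throttle() {
    if [ -n "${AWS_TRANSFER_RATE_LIMIT:-}" ]; then
        pv -q -L "$AWS_TRANSFER_RATE_LIMIT"
    else
        cat
    fi
}

# Hypothetical usage (commented out; stage names are placeholders):
# some_image_stream | throttle | aws s3 cp - "s3://your-bucket/path/image.gz"
```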

Auto-verify after backup

After every upload, the script re-lists S3 to confirm every uploaded file is non-zero bytes. Silent upload failures are caught immediately and reported via push notification.

# config.env (default: true)
BACKUP_AUTO_VERIFY=true

Verification result is included in the ntfy success notification.
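
A zero-byte check of this kind can be approximated by filtering an S3 listing. The sketch assumes the usual `aws s3 ls --recursive` line layout (date, time, size, key); find_empty_objects is a hypothetical helper, not the function pi2s3 uses.

```shell
# Read `aws s3 ls --recursive` output on stdin and print the key of every
# object whose size is 0. Exits with status 1 if any were found.
find_empty_objects() {
    awk '
        $3 == 0 { sub(/^[^ ]+ +[^ ]+ +[^ ]+ /, ""); print; bad = 1 }
        END { exit bad }
    '
}

# Example wiring (commented out; bucket and prefix are placeholders):
# aws s3 ls "s3://your-bucket/your-prefix/" --recursive | find_empty_objects \
#     || echo "verification failed: zero-byte objects listed above"
```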

Preflight health checks

Before stopping Docker, the script checks that all containers are healthy, that free disk space exceeds PREFLIGHT_MIN_FREE_MB, and that dmesg shows no recent I/O errors. It warns (or aborts, if PREFLIGHT_ABORT_ON_WARN=true) before touching your running stack.

# config.env
PREFLIGHT_MIN_FREE_MB=500
PREFLIGHT_ABORT_ON_WARN=false
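
The free-space part of such a preflight can be sketched with POSIX df. This covers only the disk check, not the container or dmesg checks, and the helper names are assumptions for illustration.

```shell
# Print free space in MB for the filesystem holding path $1.
# df -P gives one line per filesystem; -k reports 1 KB blocks.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Return 0 when the filesystem holding $1 has at least $2 MB free.
enough_free_space() {
    [ "$(free_mb "$1")" -ge "$2" ]
}

# Example wiring (commented out):
# if ! enough_free_space / "${PREFLIGHT_MIN_FREE_MB:-500}"; then
#     echo "WARN: low disk space"
#     if [ "${PREFLIGHT_ABORT_ON_WARN:-false}" = "true" ]; then exit 1; fi
# fi
```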

Stale backup alert

A separate daily cron job checks S3 for a recent backup and sends a push alert if none is found within STALE_BACKUP_HOURS. Catches silent cron failures, including the Pi being offline.

# config.env (default: true)
STALE_CHECK_ENABLED=true
STALE_BACKUP_HOURS=25
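
The staleness test itself is simple arithmetic on timestamps. In the sketch below, is_stale is a hypothetical helper; how the newest backup time is read from S3 is not shown on this page and is left out.

```shell
# Return 0 (stale) when the newest backup is older than the limit.
# $1 = now (epoch s), $2 = newest backup time (epoch s), $3 = limit in hours
is_stale() {
    [ $(( $1 - $2 )) -gt $(( $3 * 3600 )) ]
}

# Example wiring (commented out; newest_backup_epoch is a placeholder):
# if is_stale "$(date +%s)" "$newest_backup_epoch" "${STALE_BACKUP_HOURS:-25}"; then
#     echo "no backup within ${STALE_BACKUP_HOURS:-25} hours: send alert"
# fi
```

With a nightly schedule, a 25-hour limit leaves about an hour of slack before the alert fires.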

Crash-safe Docker restart

An on_exit trap guarantees containers are restarted even if the backup script crashes mid-imaging. A separate post-backup cron job (30 min after backup) runs a second check to confirm containers came back up.

# config.env (default: true)
POST_BACKUP_CHECK_ENABLED=true
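
The general shape of an exit trap that always restarts the stack might look like this. The handler is named on_exit after the trap mentioned above, but its body and the stop_stack/start_stack stand-ins are assumptions, not pi2s3's code.

```shell
# Hypothetical stand-ins for the real stop/start logic.
stop_stack()  { echo "stack stopped"; }
start_stack() { echo "stack started"; }

# Handler named after the on_exit trap mentioned above; its body is assumed.
on_exit() { start_stack; }

# The parentheses make the function body a subshell, so the EXIT trap fires
# when that subshell ends, whether the imaging command succeeds or fails.
run_backup() (
    trap on_exit EXIT
    stop_stack
    "$@"        # the imaging command
    exit $?     # propagate its status; the trap still runs
)

# Example: run_backup false   prints both lines and returns status 1.
```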

Pre/post backup hooks

Native databases get the zero-downtime path too — DB_CONTAINER="auto" detects a native mariadbd/mysqld/postgres process, not just Docker. Use these hooks for other services (nginx, php-fpm) or databases pi2s3 can't quiesce. The crash trap also runs POST_BACKUP_CMD on failure, so services always come back up.

# native WordPress: zero-downtime DB, web tier briefly stopped
DB_ROOT_PASSWORD="your-root-pw"   # native MySQL/MariaDB lock
PRE_BACKUP_CMD="systemctl stop nginx php8.2-fpm"
POST_BACKUP_CMD="systemctl start php8.2-fpm nginx"

Zero-downtime DB quiesce

For MariaDB/MySQL, pi2s3 issues SET GLOBAL read_only=ON instead of stopping Docker — gentler than a global read lock and held only while sync + drop_caches flush InnoDB dirty pages (typically under 10 seconds), then cleared. A replica that is already read-only is left untouched. For PostgreSQL it issues a CHECKPOINT and never blocks writes; WAL crash-recovery makes the image consistent. Works for Docker and native installs alike. A background probe pings your site every 60 seconds to confirm it stays up.

# config.env
DB_CONTAINER="auto"      # container or native
DB_ENGINE="auto"         # mysql | mariadb | postgres
DB_PG_USER="postgres"   # PostgreSQL CHECKPOINT user

Falls back to STOP_DOCKER automatically if no supported DB is found or quiesce fails.
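
To make the ordering concrete, here is a dry-run sketch that only prints the steps for each engine. It reflects one reading of the description above (a server that is already read-only keeps its flag and is not switched back); quiesce_plan and its output labels are hypothetical.

```shell
# Dry run: print the quiesce steps in order without executing anything.
# $1 = engine (mysql | mariadb | postgres)
# $2 = current value of @@GLOBAL.read_only (0 or 1; MySQL/MariaDB only)
quiesce_plan() {
    case "$1" in
        mysql|mariadb)
            if [ "${2:-0}" = "0" ]; then echo "SQL: SET GLOBAL read_only=ON"; fi
            echo "host: sync, then drop_caches"
            if [ "${2:-0}" = "0" ]; then echo "SQL: SET GLOBAL read_only=OFF"; fi
            ;;
        postgres)
            echo "SQL: CHECKPOINT"
            ;;
        *)
            # No supported engine: signal the caller to use STOP_DOCKER.
            echo "fallback: STOP_DOCKER"
            return 1
            ;;
    esac
}
```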

Phase	Trigger	Action
Phase 1 (0–20 min)	First failure detected	Restart cloudflared + start stopped containers
Phase 2 (20–40 min)	Still down after 4 attempts	Full `docker compose down/up` + cloudflared restart
Phase 3 (40+ min)	Still down after 8 attempts	Reboot Pi (rate-limited: max once per 6 hours)
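
The escalation in the table can be expressed as a mapping from attempts made to phase, plus a rate limit on reboots. This is a sketch under assumptions: the function names are hypothetical, and the thresholds are read directly from the table (4 and 8 attempts, 6 hours).

```shell
# Map recovery attempts already made (while still down) to a phase.
recovery_phase() {
    if [ "$1" -ge 8 ]; then echo 3
    elif [ "$1" -ge 4 ]; then echo 2
    else echo 1
    fi
}

# Return 0 when a reboot is allowed: none recorded yet, or the last one
# ($2, epoch seconds) is at least 6 hours before now ($1).
reboot_allowed() {
    if [ -z "$2" ]; then return 0; fi
    [ $(( $1 - $2 )) -ge 21600 ]
}
```

Four attempts per 20-minute phase implies a check roughly every 5 minutes, though the interval is not stated here.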

Troubleshooting

Common errors

The three errors that account for most first-run failures.

pigz not found

The installer tries to install pigz automatically. If it fails, install manually:

sudo apt install pigz

The backup falls back to single-threaded gzip if pigz is absent. It will still work, just slower.

AWS credential failure

The preflight check will print: Cannot reach s3://…. Verify credentials on the Pi:

aws s3 ls s3://your-bucket --region af-south-1

Confirm ~/.aws/credentials exists, or that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in the environment. Check that the IAM policy grants s3:ListBucket on the bucket ARN itself (not just on the /* object path).

partclone device mismatch

Error: partclone: device size mismatch or filesystem not found. Check the detected device:

lsblk   # confirm nvme0n1 or mmcblk0

Set BOOT_DEV=/dev/mmcblk0 explicitly in config.env if auto-detection picks the wrong device.

Undervoltage / restore crash

Pi 5 requires a 27W (5.1V / 5A) USB-C power supply. A marginal PSU or long USB-C cable causes the restore to crash 30–90 seconds in. The restore script warns at startup if voltage is low:

vcgencmd get_throttled   # 0x0 = clean

Use the official Pi 5 PSU (SC1159) or any USB-C PD charger that negotiates 5V/5A. A shorter, thicker USB-C cable helps if the PSU is marginal.
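
The value from vcgencmd get_throttled is a bitmask. The sketch below decodes the four bits most relevant to power problems (bit 0 and bit 2 for the current state, bit 16 and bit 18 for events since boot); decode_throttled is a hypothetical helper and the other documented bits are omitted for brevity.

```shell
# Decode a get_throttled value such as "throttled=0x50005" or "0x50005".
decode_throttled() {
    value=$(( ${1#throttled=} ))    # shell arithmetic accepts the 0x prefix
    for entry in "0:under-voltage now" "2:throttled now" \
                 "16:under-voltage has occurred" "18:throttling has occurred"; do
        bit=${entry%%:*}
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "${entry#*:}"
        fi
    done
    if [ "$value" -eq 0 ]; then echo "clean"; fi
    return 0
}

# Example wiring (commented out; needs a Raspberry Pi):
# decode_throttled "$(vcgencmd get_throttled)"
```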

Restore diagnostic script

Something went wrong? One command tells you exactly what broke and exactly how to fix it. Every failure prints a colour-coded verdict and a copy-paste fix command. A summary at the end lists every issue — no scrolling required.

sudo bash extras/diagnose-restore.sh

Saves full output to /var/log/pi2s3-diagnose-TIMESTAMP.log. Share that file when reporting an issue on GitHub.

What it checks

Voltage & CPU throttle (live + historical)
CPU temperature & rail voltages
NVMe SMART health & EEPROM boot order
WiFi SSID — connected vs config.env
Saved password special-char encoding
Corporate proxy & outbound firewall
Packet loss to gateway, 8.8.8.8, 1.1.1.1
AWS S3 + STS reachability & credentials
cmdline.txt PARTUUID cross-check + rootdelay
S3 manifest JSON (catches dd-vs-partclone bug)
cloud-init status, sentinel file, runcmd output
Cloudflare Tunnel — service, config, journal

[FAIL] blocks restore, fix immediately · [WARN] may cause issues · [OK] healthy

Pi won't boot? Solid red LED?

If the Pi shows a solid red LED (no green ACT blink) after a restore, the SD card's cmdline.txt likely has a missing or wrong PARTUUID. Run this on your Mac with the SD card inserted:

bash extras/recover-sd-boot.sh

Auto-detects the SD card at /Volumes/bootfs, shows the broken cmdline.txt, restores from the automatic backup if present, and prints step-by-step recovery instructions.
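
If you want to inspect the boot configuration by hand, a small helper can pull the PARTUUID out of cmdline.txt. This is an illustrative sketch, not what recover-sd-boot.sh does internally; cmdline_partuuid is a hypothetical name.

```shell
# Print the PARTUUID named by root= in the cmdline.txt file at $1,
# or nothing if the file has no root=PARTUUID=... entry.
cmdline_partuuid() {
    tr ' ' '\n' < "$1" | sed -n 's/^root=PARTUUID=//p' | head -n 1
}

# Example wiring (commented out):
# cmdline_partuuid /Volumes/bootfs/cmdline.txt
```

An empty result points to the missing-PARTUUID case described above. On the Pi itself, the value can be compared with the output of lsblk -no PARTUUID for the root partition.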

Example output — healthy Pi (all OK)

pi2s3 restore diagnostic
Generated: Tue Apr 28 08:00:00 SAST 2026
Host:      andrewninja-pi-5
Uptime:    up 2 hours, 14 minutes

────────────────────────────────────────────────────────────────
  1. POWER & VOLTAGE
────────────────────────────────────────────────────────────────
  get_throttled = 0x0
  [OK]    No under-voltage
  [OK]    CPU not throttled
  [OK]    Temperature OK
  [OK]    No historical under-voltage
    core         volt=0.8688V
    temp=48.3'C

────────────────────────────────────────────────────────────────
  3. WIFI & NETWORK
────────────────────────────────────────────────────────────────
  [OK]    Connected to 'Baker' — matches config.env
  [OK]    'Baker' (SSID=Baker) — password OK (len=12)

────────────────────────────────────────────────────────────────
  4. INTERNET & AWS REACHABILITY
────────────────────────────────────────────────────────────────
  [OK]    Gateway (192.168.0.1) — 0% packet loss
  [OK]    Internet (8.8.8.8) — 0% packet loss
  [OK]    AWS S3 (af-south-1) (s3.af-south-1.amazonaws.com:443)
  [OK]    s3://mybucket/ accessible (profile=personal)

────────────────────────────────────────────────────────────────
  8. BOOT CONFIG
────────────────────────────────────────────────────────────────
  [OK]    root=PARTUUID=21b5924d-0d80-4332-a32c-3b24a6bde370 → /dev/nvme0n1p2
  [OK]    rootdelay present in cmdline.txt

────────────────────────────────────────────────────────────────
  12. CLOUDFLARE TUNNEL
────────────────────────────────────────────────────────────────
  [OK]    cloudflared 2026.3.0
  [OK]    cloudflared.service  active=active  enabled=enabled
  [OK]    Credentials file present: /etc/cloudflared/84c2c36c-....json

────────────────────────────────────────────────────────────────
  SUMMARY
────────────────────────────────────────────────────────────────
  Ran in 38s.  Report: /var/log/pi2s3-diagnose-20260428_080000.log

  All checks passed — no issues found.

Example output — under-voltage + wrong SSID (2 issues)

────────────────────────────────────────────────────────────────
  1. POWER & VOLTAGE
────────────────────────────────────────────────────────────────
  get_throttled = 0x50005
  [FAIL]  UNDER-VOLTAGE RIGHT NOW — restore WILL crash
         fix: Use official Pi 5 PSU (5.1V / 5A / 27W, SC1159). Short/thin cable also drops voltage.
  [WARN]  Under-voltage event(s) since boot — PSU is marginal
         fix: Upgrade to official Pi 5 PSU or 27W+ USB-C PD charger

────────────────────────────────────────────────────────────────
  3. WIFI & NETWORK
────────────────────────────────────────────────────────────────
  [FAIL]  Connected to 'C_WIFI' but config.env WIFI_SSID='Baker'
    Visible networks:
      Baker                            signal=72
      C_WIFI                           signal=68
         fix: Update WIFI_SSID in config.env to 'C_WIFI', or connect to the right network:
         fix: sudo nmcli dev wifi connect "Baker" password "<password>"

────────────────────────────────────────────────────────────────
  SUMMARY
────────────────────────────────────────────────────────────────
  Ran in 41s.  Report: /var/log/pi2s3-diagnose-20260428_081200.log

  2 issue(s) found:

  ✗ UNDER-VOLTAGE RIGHT NOW — restore WILL crash
  △ Under-voltage event(s) since boot — PSU is marginal
  ✗ Connected to 'C_WIFI' but config.env WIFI_SSID='Baker'

Configuration, monitoring & troubleshooting

Cloudflare tunnel watchdog

Enable

Push notifications

Rate-limited reboots