Disaster Recovery
This page is designed to be printed
As much as we don't want it to happen, sometimes things die. Sometimes power runs out.
What is a 'Disaster'?
Follow this documentation in any of the scenarios below:
Power outage
Server hardware failure
Procedure
Physical Hardware Checklist
Check all infrastructure is powered on (look for power lights)
Refer to the Physical Hardware section on the left
Remove faceplate from NTD and confirm powered on
Confirm networking equipment is powered on
Software Checklist
Network Test
Ping 8.8.8.8 to confirm internet is working
Ping google.com to confirm external DNS is working
Ping setup.ui.com to confirm internal DNS is working
Confirm Proxmox VE is accessible
Internal link loads login page (use Linux credentials)
All storage pools are online
VMs are listed and booting (boots are staggered, so some may be on while others are still off)
Confirm Proxmox Backup Server is accessible
Internal link loads login page (use Linux credentials)
All storage pools are online
Confirm the NAS is accessible (creds in vault)
Open Storage Manager and confirm 'system is healthy'
Log in to UptimeKuma and confirm that all services are green. It may take 15 minutes for them all to report as online.
Confirm servers are reporting data back to NetData and check for any alerts. Alerts related to disk backlog, IO delay or disk usage can be ignored for now. Backlog and IO delay can be caused by multiple VMs starting up at once.
An excessive but very thorough way to check that all services are online is to go through each page in this doco and try every "link to app" link.
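The Network Test steps in the checklist above can be run in one go from any machine on the network. The following is a minimal sketch, not a tested part of this runbook: the function names are made up for illustration, and it assumes a system `ping` command is available (`-c` on Linux/macOS, `-n` on Windows). Only the three targets come from the checklist.

```python
import platform
import subprocess

# Targets and what each one proves, taken from the Network Test checklist.
CHECKS = [
    ("8.8.8.8", "internet"),
    ("google.com", "external DNS"),
    ("setup.ui.com", "internal DNS"),
]


def ping_host(host, timeout=5):
    """Send a single ping and return True if the host replied."""
    # Assumption: the OS ping uses -n (Windows) or -c (Linux/macOS) for count.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the ping command is not available.
        return False
    return result.returncode == 0


def run_network_test(checks=CHECKS, ping=ping_host):
    """Run each check in order and return a list of (host, label, ok)."""
    return [(host, label, ping(host)) for host, label in checks]


def format_report(results):
    """Turn the results into one printable line per check."""
    lines = []
    for host, label, ok in results:
        status = "OK" if ok else "FAIL"
        lines.append(f"[{status}] {host} ({label})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(run_network_test()))
```

Read the report top to bottom, the same way as the manual checklist: if 8.8.8.8 fails, the DNS results below it are not meaningful yet.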
Services failing to start
Unfortunately I'm unable to write specific doco here as there is too much to capture. Please refer to the troubleshooting section in the left panel and/or the hints below.
Compare the down services against the Cloudflare tunnels: are they all on the one server?
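If the list of down services is long, the comparison in the hint above can be done by grouping the down services by the server their tunnel points at. This is a rough sketch only: the service and server names are placeholders, and the mapping has to be filled in by hand from the Cloudflare tunnel configuration, since nothing here reads it automatically.

```python
from collections import defaultdict


def group_down_by_server(service_to_server, down_services):
    """Group the down services by the server that hosts them."""
    grouped = defaultdict(list)
    for service in down_services:
        # Services missing from the mapping are collected under "unknown".
        grouped[service_to_server.get(service, "unknown")].append(service)
    return dict(grouped)


def single_failing_server(service_to_server, down_services):
    """Return the server name if every down service is on one known server."""
    grouped = group_down_by_server(service_to_server, down_services)
    if len(grouped) == 1 and "unknown" not in grouped:
        return next(iter(grouped))
    return None


if __name__ == "__main__":
    # Hypothetical example data; replace with the real tunnel mapping.
    tunnels = {"app-a": "server-1", "app-b": "server-1", "app-c": "server-2"}
    down = ["app-a", "app-b"]
    print(group_down_by_server(tunnels, down))
    print(single_failing_server(tunnels, down))
```

If a single server comes back, start troubleshooting on that server before looking at the individual services.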