Disaster Recovery

Server hardware failure

Press the display button on the UPS and confirm:

It is powered on
It has power from mains

Check all infrastructure is powered on (look for power lights)

Refer to Physical Hardware section on the left

Remove faceplate from NTD and confirm powered on

Confirm networking equipment is powered on

Network Test

Ping 8.8.8.8 to confirm internet is working
Ping google.com to confirm external DNS is working
Ping setup.ui.com to confirm internal DNS is working
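The three pings above can be run in one pass. The following is a minimal sketch, not part of the original procedure: the function names `ping` and `run_network_test` are hypothetical, and it assumes a `ping` binary is available on the machine you are testing from.

```python
import platform
import subprocess

# Targets taken from the checklist above; the labels are descriptive only.
CHECKS = [
    ("internet", "8.8.8.8"),
    ("external DNS", "google.com"),
    ("internal DNS", "setup.ui.com"),
]


def ping(host, count=2, runner=subprocess.run):
    """Return True if `host` answers a ping, False otherwise."""
    # Windows uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = runner(
            ["ping", count_flag, str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Covers a hung ping and a missing ping binary.
        return False
    return result.returncode == 0


def run_network_test(ping_fn=ping):
    """Run each check in checklist order and return {label: passed}."""
    return {label: ping_fn(host) for label, host in CHECKS}
```

Calling `run_network_test()` returns a dictionary such as `{"internet": True, "external DNS": False, ...}`. If 8.8.8.8 answers but google.com does not, the link itself is up and the fault is more likely in name resolution.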

Confirm Proxmox VE is accessible

Internal link loads login page (use Linux credentials)
All storage pools are online
VMs are listed and are booting (boots are staggered, so some may be on while others are still off)

Confirm Proxmox Backup Server is accessible

Internal link loads login page (use Linux credentials)
All storage pools are online
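The "login page loads" checks for both Proxmox hosts can be sketched as a small script. This is an illustration under assumptions: the hostnames are hypothetical placeholders for your internal links, the ports are the Proxmox defaults (8006 for Proxmox VE, 8007 for Proxmox Backup Server), and certificate verification is skipped on the assumption that the hosts use self-signed certificates. It only confirms the web UI responds; storage pool status still needs to be checked after logging in.

```python
import ssl
import urllib.error
import urllib.request

# Hypothetical addresses: replace with the internal links for your own hosts.
ENDPOINTS = {
    "Proxmox VE": "https://pve.example.internal:8006/",
    "Proxmox Backup Server": "https://pbs.example.internal:8007/",
}


def login_page_loads(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200."""
    # Assumption: self-signed certificates, so verification is disabled.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with opener(url, timeout=timeout, context=context) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, or an HTTP error status.
        return False


def check_endpoints(endpoints=ENDPOINTS, check=login_page_loads):
    """Return {name: reachable} for every configured web UI."""
    return {name: check(url) for name, url in endpoints.items()}
```

`check_endpoints()` returns one boolean per host, which makes it easy to see whether only one of the two web interfaces is unreachable.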

Confirm the NAS is accessible (creds in vault)

Open Storage Manager and confirm 'system is healthy'

Log in to UptimeKuma and confirm that all services are green. It may take 15 minutes for them all to report as online.

Confirm servers are reporting data back to NetData and check for any alerts. Alerts related to disk backlog, IO delay or disk usage can be ignored for now. Backlog and IO delay can be caused by multiple VMs starting up at once.

Compare the down services against the Cloudflare tunnels: are they all on the same server?
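The comparison can be done by hand, but a short sketch shows the idea: group the services that are down by the server that hosts them and see whether a single server accounts for all of them. The function names and the service-to-server mapping below are hypothetical; in practice the mapping would come from your own Cloudflare tunnel configuration.

```python
from collections import defaultdict


def group_down_by_server(down_services, service_to_server):
    """Group down services by hosting server; unmapped services go under 'unknown'."""
    grouped = defaultdict(list)
    for service in down_services:
        grouped[service_to_server.get(service, "unknown")].append(service)
    return dict(grouped)


def single_failing_server(down_services, service_to_server):
    """Return the server name if every down service lives on one known server, else None."""
    grouped = group_down_by_server(down_services, service_to_server)
    if len(grouped) == 1 and "unknown" not in grouped:
        return next(iter(grouped))
    return None


# Hypothetical example data, for illustration only.
EXAMPLE_MAP = {"wiki": "server-a", "git": "server-a", "photos": "server-b"}
```

With the example mapping, `single_failing_server(["wiki", "git"], EXAMPLE_MAP)` points at `server-a`, which suggests investigating that one machine first. A result of `None` means the outage spans several servers or includes a service missing from the mapping.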

Last updated 1 year ago

