I had a bit of a scare this week. My setup is HAOS running on Proxmox, with a Sonoff USB Zigbee gateway (plus a Coral for Frigate and a USB SSD attached to Proxmox).
Friday night, the server stops for no reason. I dig it out from the cupboard and I can hear the fan short cycling. I disconnect everything and take it to a screen so I can see what's happening - it boots fine, WTH?
Must be a USB thing. I add the devices back one by one, and when I reconnect the gateway the problem comes back. Now I get worried. I switch USB port, remap the gateway to HAOS and boom! Back up and running. Panic over; cold house (radiators are Zigbee) and angry wife and children avoided.
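For anyone following along: if the passthrough is defined by vendor:product ID rather than by physical port, the mapping should survive moving the dongle to another socket. Below is a rough sketch of how that ID could be pulled out of `lsusb` output and turned into a Proxmox `qm set` command. The VM ID 100, the sample `lsusb` text, the "CP210x" keyword and both helper names are made up for illustration, so check your own `lsusb` output before using any of it.

```python
import re

# Matches one line of `lsusb` output, e.g.
# "Bus 001 Device 004: ID 10c4:ea60 Silicon Labs CP210x UART Bridge"
LSUSB_LINE = re.compile(
    r"^Bus (?P<bus>\d{3}) Device (?P<dev>\d{3}): ID "
    r"(?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})\s*(?P<name>.*)$"
)


def find_usb_id(lsusb_output, keyword):
    """Return 'vendor:product' for the first device whose name contains keyword, else None."""
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m and keyword.lower() in m.group("name").lower():
            return f"{m.group('vendor')}:{m.group('product')}"
    return None


def passthrough_command(vmid, usb_id, slot=0):
    """Build the `qm set` arguments that map a USB device by ID instead of by port."""
    return ["qm", "set", str(vmid), f"-usb{slot}", f"host={usb_id}"]


# Illustrative sample only (assumption), not captured from the server in this post.
SAMPLE = """\
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 004: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
"""

if __name__ == "__main__":
    usb_id = find_usb_id(SAMPLE, "CP210x")
    # Prints the command to run on the Proxmox host; it does not execute it.
    print(" ".join(passthrough_command(100, usb_id)))
```

The script only prints the command, so nothing changes on the host until you run it yourself. Note that two identical dongles would share the same vendor:product ID, in which case mapping by port may still be the safer choice.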
All of which has led me to conclude that my HA setup is really 'Mission Critical' and I need some recovery strategies beyond a daily backup. I think the gateway can be swapped, but I'm not sure whether the key to the Zigbee mesh is stored in the hardware or in software.
This is the question: what are your recovery strategies? Do they include hardware or just software? I'm thinking maybe I need a second dongle and a couple of low-powered machines in the Proxmox cluster. I won't be able to get my homelab back up immediately, but if I can get HA running again on a different node with a backup dongle I'd be OK.
Matter comms are pretty light and use an old 2.4 GHz router as their backbone, with a VLAN bridge. Haven't noticed any problems. Most of my stuff is Zigbee as well.