Incident 2022-08-26


Hello everyone,

This is a brief description of what caused the Echopedia to be down for the last few days and resulted in a lot of lost data. I am very sorry for the contributors' work that was lost in this event, and I have implemented measures to prevent this from happening again.

The Echopedia is hosted using Oracle Cloud's VPS service. On Aug. 26, Oracle (probably accidentally) shut down this instance and also terminated its boot volume, which can be terminated independently of the instance. This was done without warning beforehand or explanation afterward, and it resulted in the loss of all data on that server, along with the automated and manual backups that were associated with that instance through the web console.

It was reported to me relatively quickly that the Echopedia was down, so I was able to view the logs in the Oracle web console, which showed that some internal Oracle IP address shut the instance down earlier that day. It was fortunate that I found out quickly, because any reference to that instance having existed was removed later that day (which I think is standard practice).

Luckily, I had offsite backups on my home PC, so I planned to restore them to a new instance when I got home that day and get the Echopedia back up fairly quickly (albeit without the data added since the backup was created). Unfortunately, my computer was unable to boot: the SSD holding the backup chose that exact day to fail.

I was able to recover some pages from archive.org, and parts of other pages from an RSS feed of more recent changes (with a lot of manual labor), but many pages are still missing or incomplete. The hundreds of player and team pages were lost, but they can be recreated from the source data, since they were autogenerated in the first place. The custom block names on the Arena Block Map were not lost, since the submissions were stored in a database on a different server and the page itself was in a GitHub repo.

For the future, I have implemented automatic daily offsite backups, so any future incident should be much easier to recover from.
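
For anyone curious what a daily backup job of this kind can look like, here is a minimal sketch in Python. It is not the actual script used for the Echopedia, and every name in it (`create_backup`, `sha256_of`, the `wiki-` file prefix, the `keep` parameter) is made up for illustration. It assumes the data to protect lives in a single directory, so a real wiki would probably also need a database dump placed in that directory first. It only writes the archive to a destination folder; getting that folder onto a different machine and a different provider is a separate step that is not shown.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_backup(source_dir: Path, dest_dir: Path, keep: int = 7,
                  now: Optional[datetime] = None) -> Path:
    """Archive source_dir into dest_dir and keep only the newest `keep` archives.

    All names here are hypothetical; `now` can be passed in for testing.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    now = now or datetime.now(timezone.utc)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamped names sort chronologically, which makes pruning simple.
    archive = dest_dir / f"wiki-{now:%Y%m%d-%H%M%S}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)

    # Read the archive back so a corrupt backup is noticed today,
    # not on the day it is needed.
    with tarfile.open(archive, "r:gz") as tar:
        if not tar.getnames():
            raise RuntimeError(f"backup {archive} is empty")

    # Store a checksum next to the archive so copies can be verified later.
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(f"{sha256_of(archive)}  {archive.name}\n")

    # Delete everything except the newest `keep` archives.
    archives = sorted(dest_dir.glob("wiki-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
        old.with_name(old.name + ".sha256").unlink(missing_ok=True)

    return archive
```

A job like this could be run once a day by cron or a systemd timer. The two details that matter most, given what happened here, are that the archive is read back after it is written and that the copies end up somewhere that does not fail together with the server or with a single disk.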

- NtsFranz