Well, the answer is pretty simple really: it’s one that’s been tested. This is not the first time someone claiming to have backups has come to me asking to restore a system, and I have been unable to help.
In this case they nearly did everything right. This is an online business with a very attractive and popular site in a niche market. They started retailing on eBay nearly a decade ago and had since expanded quite a bit, needing their own website as well as connectivity to a Sage accounts package in their office. Sage managed (past tense) everything for them: stock control, fulfillment, invoicing, supplier orders, credit control, contact management and so on. So much so that the website became really only a front end.
The owners did take the initiative to implement a backup strategy. Firstly, they pay the website’s hosting company to back up the website each night, or at least that is what they understood the service to be. I did check this, and in fact it’s a lot more: the hosting company backs up the whole hosted virtual server instance, configuration and all. This comes with a suitable SLA that transfers the risk to the hosting company. But really all the valuable business data is held in Sage.
Now here’s where the problem starts. Sage has its own built-in backup application, which is basically a one-click, choose-a-destination-and-click-save type of solution, but scheduling regular backups has never been simple enough for accounts staff. As they knew that best practice said the backup needed to go either to other media or, even better, off site, they opted to sign up for one of the many services where you install some software on the server or workstation and can just forget about it.
So for the last couple of years the server has been backing up to this off-site service every night. They have even recovered a couple of Word files and spreadsheets that got corrupted or overwritten over that time, which has been very comforting for them. It’s so easy: browse the backup server file list, then drag and drop the file to be recovered. That’s exactly what they said to me: “surely it’s that easy to just drag and drop like we have always done”.
I guess no-one noticed when the first hard drive died some time back (a RAID 5 array keeps running with one failed drive, but not with two), because they told me that 2 of the 3 drives in the RAID 5 array died at the same time two weeks ago. I don’t know if that’s even possible, and it is certainly highly unlikely, but it’s safe to say the drives are beyond recovery. So they spoke to HP, who rapidly sent them 2 replacement drives to rebuild their RAID. They then spoke to the off-site backup service, who told them they would need to rebuild the operating system and re-install the applications, preferably to their original state. It sounds like this was quite a shock to the company, who thought they had a one-click solution; they didn’t even have the original installation media handy, or know the server license numbers. It took them a couple of days to gather the prerequisites and re-provision the server, remaking user accounts with the help of a local “IT guy”.
This is where I came in. I have worked with the “IT guy” in the past, and he suspected the worst, but either wasn’t confident in his own understanding or, more likely, was unsure how to explain the dire situation, so he asked me to speak to the client. I hope that one day the client revisits this page and that it helps him understand what went wrong and why I was unable to help him. He dismissed my negativity and, last I heard, is looking elsewhere “for someone that knows their job and what they are talking about”.
The reason all the data files bar one were recovered is quite simple: that one file is the entire database. The Sage database could never have been backed up by a plain file copy while it was in use; it needed specialist intervention, using either the built-in backup application or some other “open file” solution. The off-site backup solution clearly warns you that open files may not back up correctly. I haven’t seen the historical logs, but I am sure you got regular warnings similar to “one or more files were not successfully backed up”, which you ignored. After all, what’s one file amongst the 20,000+ that are saved? Your whole business revolved around the availability of that one file.
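To make the point about ignored warnings concrete, here is a minimal sketch in Python of a check that treats a skipped file as an emergency when it is one the business depends on. Everything specific in it is an assumption for illustration: the log wording in SKIP_MARKER, the sample paths and the file name accounts.db are all made up, and a real backup service will phrase its warnings differently.

```python
from pathlib import PureWindowsPath

# Hypothetical wording: real backup products phrase this warning differently.
SKIP_MARKER = "skipped (file in use): "


def skipped_critical_files(log_lines, critical_names):
    """Return the paths of skipped files whose file name is in critical_names."""
    wanted = {name.lower() for name in critical_names}
    hits = []
    for line in log_lines:
        _, marker, path = line.partition(SKIP_MARKER)
        if not marker:
            continue  # not a skipped-file warning
        path = path.strip()
        # Compare on the file name only, ignoring case as Windows does.
        if PureWindowsPath(path).name.lower() in wanted:
            hits.append(path)
    return hits


# Made-up sample log and a made-up database file name.
sample_log = [
    "INFO  backed up: D:\\Docs\\quote.doc",
    "WARN  skipped (file in use): D:\\Accounts\\accounts.db",
    "WARN  skipped (file in use): D:\\Docs\\~lock.tmp",
]
alerts = skipped_critical_files(sample_log, ["accounts.db"])
for path in alerts:
    print("CRITICAL FILE NOT BACKED UP:", path)
```

The design choice is simply that not all skipped files are equal: a skipped temporary file is noise, while a skipped database should reach a human the same day.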
The solution would have been simple: if Sage had been scheduled to make a local backup of the database automatically, that backup would have been copied off site every night. Testing recovery of the backed-up file is very simple in Sage, as you can open the archived copy side by side with the live instance. This should have been done occasionally, and you would have noticed the massive hole in your contingency plan. I’m sorry, I just cannot see a solution to this disaster, just a lot of “if only…”.
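For what it is worth, even a very small automated check would have flagged the hole. The sketch below, in Python, only confirms that a local backup file exists, is not empty and is recent. The function name, the thresholds and the path in the usage comment are my own assumptions, not anything Sage provides, and a check like this is no substitute for occasionally opening the archived copy in Sage itself.

```python
import os
import time


def backup_problems(path, max_age_hours=24, min_bytes=1, now=None):
    """Return a list of reasons the backup at `path` should not be trusted."""
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return ["backup file is missing"]
    problems = []
    info = os.stat(path)
    if info.st_size < min_bytes:
        problems.append("backup file is smaller than expected")
    # st_mtime is the last modification time in seconds since the epoch.
    if now - info.st_mtime > max_age_hours * 3600:
        problems.append("backup file is older than expected")
    return problems


# Usage (the path is hypothetical):
#   for reason in backup_problems(r"D:\Backups\accounts-backup.zip"):
#       print("BACKUP CHECK FAILED:", reason)
```

An empty list means the file passed these basic checks, nothing more; it does not prove the contents can be restored.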