Recovering Data From A Failed Council SAN Server

As a contractor in the computer support industry I come into a lot of contact with servers and RAID arrays. In fact, my main job is looking after the data held on SAN servers and other forms of network-attached storage. I work for companies and government institutions as a sort of freelance computer troubleshooter and mostly use IBM, Dell and HP server equipment. The Dell servers are typically from the PowerEdge series and the HP kit is mainly ProLiant. In most cases the equipment is hooked up to a SAN data network.

Ageing hardware is a big problem of mine. It's what happens when I inherit old legacy systems that really should have been decommissioned years ago but, because of budgetary constraints, are still in use. I work on several HP ProLiant and Dell SAN servers that I'd love to switch off, migrating the data onto something far more up to date like a Dell Blade or IBM X Server system. Unfortunately, I don't really have any say in buying new equipment.

Older servers and computer equipment fail more often; they just do. Parts wear out: hard drives fail, memory goes bad and UPSs die. What greeted me when I came into work last Monday was a failed SAN server array: 12 disks running in a RAID 5 configuration with a hot spare. Analysis of the server logs showed that one of the hard drives had dropped out of the array on Saturday, causing the hot spare to kick in. This had seemingly worked fine at first. The hot spare should simply have been rebuilt into the array in place of the failed drive, but instead the whole array had fallen over.

In the server room the SAN's RAID BIOS reported that three of the hard drives had now dropped out of the array. Well, that would explain why the SAN server was no longer booting the array: RAID 5 can only survive the loss of one drive at a time. What had caused the three drives to fail was at this point a mystery. The server in question ran part of the council payroll, so it was important to get the SAN back up and running as soon as possible, but it had to be done in a way that followed best practice. It became my task, and no data could be lost in recovering the SAN either.
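For anyone wondering why one lost drive is survivable but three are not, here is a rough Python sketch of the XOR parity idea behind RAID 5. It is a simplified illustration only, not how the controller in this story works: the function names (`xor_blocks`, `make_stripe`, `rebuild`) and the sample data are made up for the example, and a real array rotates the parity block across the drives rather than keeping it in a fixed position.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (a simplified RAID 5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which lost blocks are marked as None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # One parity block can only stand in for one missing block.
        raise ValueError("cannot rebuild more than one lost block per stripe")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the missing block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Hypothetical stripe: three data blocks plus parity, one block per drive.
stripe = make_stripe([b"PAY1", b"PAY2", b"PAY3"])

# One drive lost: the missing block comes back from the survivors.
assert rebuild([stripe[0], None, stripe[2], stripe[3]]) == stripe

# Two or more drives lost: parity alone is not enough.
try:
    rebuild([None, None, stripe[2], stripe[3]])
except ValueError as error:
    print("Rebuild failed:", error)
```

This is why a hot spare helps only if the rebuild finishes before another drive drops out. Once a second drive fails mid-rebuild, the array has nothing left to reconstruct from and the drives themselves have to be repaired.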

Now, I'm good at IT and SAN server support, I'll admit, but when I discovered that 2 of the 3 drives that had dropped from the array had mechanical faults, the problem was beyond my abilities. I had used a data recovery company a few years back but they were no more. Searching online pointed me to a specialist SAN recovery company called RAID and Server Data Recovery. An online review or two told me they could be trusted and that they came recommended, so I called them.

I spoke to RAID and Server Data Recovery’s specialist SAN recovery team who confirmed what I thought already. Some of the drives had mechanical damage and would need clean room attention in order to progress the data recovery attempt. I got clearance for the costs from finance and loaded the SAN server into the car and drove it down to the recovery company.

Analysis showed that one drive had suffered a head crash while the other two had firmware issues. Firmware is the low-level code that runs on the drive's own controller, in effect the hard drive's operating system. It can become corrupted, and when it does the hard drive fails. It seemed that this firmware problem was the cause of the SAN crashing and that all that needed fixing was the firmware on the two failed drives. This was indeed the case: after the repairs to the hard disks were completed and the drives were re-integrated into the array through the SAN's RAID BIOS, the SAN came back online and the data was accessible again. Panic over. The data was fully restored, which was the outcome everyone had wanted.
