Posts Tagged ‘RAID’

*Official* – Most Manufacturers’ RAID Technical Support Is Rubbish

Wednesday, April 2nd, 2014

I’ve been supporting RAID and servers for longer than I care to remember… I think the earliest server (excluding ICL mainframes) that I worked on was an ICL Intel 486 system, back in something like 1992.

This was before Windows and, as we know, things have moved on significantly since then. Servers were complicated beasts back then; now they’re even more complex. One thing that really annoys me is 1st/2nd line technical support staff who, rather than admit they don’t know what they’re talking about, will recommend the wrong course of action because they don’t know any different.

Here’s my response to an email from a customer of ours with a 7-disk RAID 5 server that has significant bad sector problems across several of the disks in the volume. The customer has been told by his tech support to rebuild the array and everything will work fine… WRONG, the rebuild will fail because of the bad sectors across multiple drives. This will cause a huge amount of irreversible data loss for the customer, who is a professional video editor.

Hi <X>,

My colleague <Y> has just informed me you’ve been in touch after speaking to tech support regarding your RAID.

The rebuild procedure they suggest will unfortunately not complete successfully because several of your hard drives have bad sector issues. Rebuilding is an automated software task that relies on all the drives involved being free from bad sectors. Rebuilding is unable to cope with bad sectors, which are a physical problem. This is why drives with bad sectors have to be recovered using hardware rather than software. I wrote a blog post about this some time ago, titled something like ‘5 things you mustn’t do if your RAID fails’. Take a look at all of it, especially the last point: http://www.dataclinic.co.uk/raid-or-server-failure-the-top-5-things-to-avoid/

As you know, we’ve been doing this long enough to know what we’re talking about, so may I suggest two possible courses of action –

1. We complete the recovery as planned.

Or

2. We first clone all your hard drives (effectively copying them) before returning them to you so you can then try the rebuild. Because we will have cloned your hard drives, we can go back to the data and perform the recovery when the rebuild doesn’t work.

Unsuccessful rebuilds result in massive data loss across the entire RAID and are irreversible because the old (good) data is overwritten by the new (corrupt) data. It’s one of the largest causes of data loss on any type of RAID 5 system, and we wish tech support companies would stop recommending it. They are assuming all the hard drives in the array are free from bad sectors, when bad sectors are the reason your RAID fell over in the first place.

Please let us know how you’d like to proceed.

I’ve just emailed this to the customer and await their response.
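
For anyone wondering why a rebuild can’t cope with bad sectors, here is a rough Python sketch of the principle. It is a simplified model of my own, not how any particular RAID controller works: each stripe is a list of equal-sized blocks (one per drive), and `None` stands in for a block that can’t be read, either because its drive has dropped out or because it sits on a bad sector. The function names `xor_blocks` and `rebuild_stripe` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Return the stripe with its single missing block filled in.

    `stripe` holds one block per drive (data blocks plus one parity block);
    None marks a block that could not be read. RAID 5 parity can recover
    exactly one missing block per stripe, so two or more is a hard failure.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"stripe unrecoverable: {len(missing)} blocks unreadable")
    if not missing:
        return list(stripe)
    readable = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(readable)
    return repaired


# A 7-drive stripe: six data blocks plus one parity block (toy 4-byte blocks).
data = [bytes([i] * 4) for i in range(1, 7)]
stripe = data + [xor_blocks(data)]

# One drive dropped from the array: the stripe can still be rebuilt.
degraded = list(stripe)
degraded[2] = None
assert rebuild_stripe(degraded) == stripe

# The same dropped drive plus a bad sector on another drive: no way back.
damaged = list(degraded)
damaged[5] = None
try:
    rebuild_stripe(damaged)
except ValueError as err:
    print(err)
```

The second case is the one that matters here. With one drive already out of the array, a single unreadable sector on any other drive leaves that stripe with nothing to rebuild from, which is why cloning the drives with hardware tools comes first.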

Our New NAS File Server

Thursday, January 30th, 2014

I don’t know about you, but these days the family environment is a busy one when it comes to IT and computers. My eldest daughter has her own laptop and my youngest daughter has an iPad. They both have smartphones too. As well as texting and all the other things teenagers use their mobile phones for, they also take a lot of photographs that they want to save.

Saving this data on the family iMac was fine – there was plenty of space and it was an easy thing to achieve, but as time went by there was more and more data to store – more photos, more videos and now music too. Using the internet to search for an answer to my problem, my attention was drawn to Network Attached Storage, otherwise known as NAS. Basically these are devices that connect to the router in your house and allow anyone connected to that router to use them. Great, I thought – I’ll get one of those!

So I did, and everyone was happy. It came in a nice box with a link to Datlabs NAS Data Recovery Services, who I could call if I needed any technical assistance setting the NAS up and getting it working correctly. I just plugged it in, typed our password and it installed itself on our network. The first thing to do was to transfer all our photos, videos and music data from our family Mac onto the NAS. That was easy – a simple drag and drop operation saw that completed without any problems. There was a lot of data – some 50GB or so… How do teenagers make so much data?? All of which was, of course, essential to them.

Anyway, with that done I set about cleaning the Mac up and deleting files and folders. Another 30 minutes or so and this was completed. The first thing I noticed was that the machine began to run a lot quicker – which was a result I was very pleased with.

Next I took a look at the configuration of our new NAS device. It’s a 4-disk Linux-based storage device that runs RAID 5. This means that the data it holds is spread across all 4 disks, together with parity information, instead of sitting on one. That’s a bit odd, I thought, but after closer investigation I learned that this was in fact a good thing. It allows one drive to fail and my data to still survive without being lost. RAID 5 also provides enhanced data read speeds – something that was evident from the moment we began using our NAS file server concurrently. My daughters could watch their movies while I was able to stream music from it. This all worked fine – something that we could never do before on the Mac as it was just not quick enough.
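
Being curious, I wanted to see how one drive can fail without the data going with it. The Python below is a toy sketch under my own assumptions, not what the NAS actually runs: it uses tiny 4-byte blocks, keeps the parity on the last disk (real RAID 5 rotates it around the disks), and the 2 TB disk size is a made-up figure. The names `parity` and `usable_capacity_tb` are mine.

```python
def parity(blocks):
    """XOR equal-length blocks together byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def usable_capacity_tb(disks, disk_size_tb):
    """RAID 5 gives up one disk's worth of space to parity."""
    return (disks - 1) * disk_size_tb


# Three data blocks plus one parity block: one stripe across 4 disks.
disks = [b"phot", b"os+v", b"ideo"]
disks.append(parity(disks))

# Disk 1 dies. XOR the three survivors to get its contents back.
survivors = disks[:1] + disks[2:]
recovered = parity(survivors)
assert recovered == b"os+v"

print(usable_capacity_tb(4, 2))  # four hypothetical 2 TB disks -> 6
```

So the price of surviving a single drive failure is one disk’s worth of space. Lose a second drive before the first is replaced and the sums no longer work.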

So introducing a NAS RAID file server into our home environment has been a great success. Installing it was easy and I didn’t need to contact Datlabs for help in setting it up. I do think I’ll keep their link though, just in case anything happens to the NAS that I can’t sort out myself.

Recovering Data From A Failed Council SAN Server

Monday, January 27th, 2014

As a contractor in the computer support industry I come into a lot of contact with servers and RAID arrays. In fact, my main job is looking after the data held on SAN servers and other forms of Network Attached Storage. I work for companies and government institutions as a sort of freelance computer troubleshooter and mostly use IBM, Dell and HP server equipment. The Dell servers are typically from the PowerEdge series and the HP kit is mainly ProLiant. All of this equipment is hooked up to a SAN data network.

Redundant equipment is a big problem of mine. It’s what happens when I inherit old legacy systems that really should have been decommissioned years ago but, because of budgetary constraints, have continued to be used. I work on several HP ProLiant and Dell SAN servers that I’d love to switch off and migrate the data onto something far more up to date like a Dell Blade or IBM X Server system. Unfortunately, I don’t really have any say in buying new equipment.

Older servers and computer equipment fail more regularly; they just do. They wear out: hard drives fail, memory goes bad and UPSs fail. What greeted me when I came into work last Monday was a failed SAN server array – 12 disks running in a RAID 5 configuration with a hot spare. Analysis of the server logs showed that one of the hard drives had dropped out of the array on Saturday, causing the hot spare to kick in. At first this seemed to have worked. The hot spare should simply have been ‘rebuilt’ into the array, but instead the whole array had fallen over.

In the server room, the SAN’s RAID BIOS reported that three of the hard drives had now dropped out of the array. Well, that would explain why the SAN server was no longer booting the array. What had caused the three drives to fail was at this point a mystery. The server in question ran part of the council payroll, so it was obviously important to get the SAN back up and running as soon as possible, but it had to be done in a way that followed best practice. The task fell to me, and no data could be lost in recovering the SAN either.

Now, I’m good at IT and SAN server support, I’ll admit, but when I discovered that 2 of the 3 drives that had dropped from the array had mechanical faults, the problem was beyond my abilities. I had used a data recovery company a few years back but they were no more. Searching online pointed me to a specialist SAN recovery company called RAID and Server Data Recovery. An online review or two told me they could be trusted and that they were recommended, so I called them.

I spoke to RAID and Server Data Recovery’s specialist SAN recovery team who confirmed what I thought already. Some of the drives had mechanical damage and would need clean room attention in order to progress the data recovery attempt. I got clearance for the costs from finance and loaded the SAN server into the car and drove it down to the recovery company.

Analysis showed 1 drive had a head crash while the other two had firmware issues. Firmware is the low-level code that acts as the hard drive’s own operating system. It can become corrupted, and when it does the hard drive fails. It seemed that this firmware problem was the cause of the SAN crashing and all that needed fixing was the firmware on the two failed drives. This was indeed the case: after the repairs to the hard disks were completed and the drives were re-integrated into the array through the SAN’s RAID BIOS, the SAN came back online and the data was accessible again. Panic over. The data was fully restored, which was the outcome everyone had wanted.