*Official* – Most Manufacturers’ RAID Technical Support Is Rubbish

I’ve been supporting RAID and servers for longer than I care to remember… I think the earliest server (excluding ICL mainframes) that I worked on was an ICL Intel 486 system, back in something like 1992.

This was before Windows and, as we know, things have moved on significantly since then. Servers were complicated beasts back then; now they’re even more complex. One thing that really annoys me is 1st/2nd line technical support staff who, rather than admit they don’t know what they’re talking about, will recommend the wrong course of action because they don’t know any different.

Here’s my response to an email from a customer of ours with a 7-disk RAID 5 server that has significant bad sector problems across several of the disks in the volume. The customer has been told by his tech support to rebuild the array and everything will work fine… WRONG. The rebuild will fail because of the bad sectors across multiple drives. This will cause a huge amount of irreversible data loss for the customer, who is a professional video editor.

Hi <X>,

My colleague <Y> has just informed me you’ve been in touch after speaking to tech support regarding your RAID.

The rebuild procedure they suggest will unfortunately not complete successfully because several of your hard drives have bad sectors. Rebuilding is an automated software task that relies on all the drives involved being free from bad sectors. It is unable to cope with bad sectors, which are a physical problem. This is why drives with bad sectors have to be recovered using hardware rather than software. I wrote a blog post about this some time ago, titled something like ‘5 things you mustn’t do if your RAID fails’. Take a look at all of it, especially the last point: http://www.dataclinic.co.uk/raid-or-server-failure-the-top-5-things-to-avoid/

As you know, we’ve been doing this long enough to know what we’re talking about, so may I suggest two possible courses of action –

1. We complete the recovery as planned.

Or

2. We first clone all your hard drives (effectively copying them) before returning them to you so you can then try the rebuild. Because we will hold clones of your drives, we can go back to the data and perform the recovery when the rebuild doesn’t work.

Rebuilds that are unsuccessful result in massive data loss across the entire RAID and are irreversible, because the old (good) data is overwritten by the new (corrupt) data. It’s one of the largest causes of data loss on any type of RAID 5 system, and we wish tech support companies would stop recommending it. They are assuming all the hard drives in the array are free from bad sectors, when bad sectors are the reason your RAID fell over in the first place.

Please let us know how you’d like to proceed.

I’ve just emailed this to the customer and await their response.
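
For anyone wondering why a rebuild can’t cope with bad sectors on more than one drive, here’s a minimal Python sketch of the XOR parity idea behind RAID 5. It is a simplified illustration, not how any particular controller is implemented: the function names are my own, the parity block always sits last in the stripe (real RAID 5 rotates it across the drives), and an unreadable sector is modelled as None.

```python
from functools import reduce
from typing import List, Optional

# Assumption for illustration: None stands for a block that cannot be read
# (a missing drive or a bad sector on a surviving drive).
Block = Optional[bytes]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks plus one parity block (kept last for simplicity)."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild_block(stripe: List[Block], missing: int) -> bytes:
    """Recompute the block at index `missing` from every other block in the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != missing]
    if any(b is None for b in survivors):
        # A second unreadable block in the same stripe leaves nothing to XOR against.
        raise IOError("stripe unrecoverable: bad sector on a surviving drive")
    return xor_blocks(survivors)


# Seven "drives": six data blocks plus one parity block.
stripe = make_stripe([bytes([i] * 4) for i in range(1, 7)])

one_failure = list(stripe)
one_failure[2] = None                      # one drive lost
recovered = rebuild_block(one_failure, 2)  # succeeds: matches the original block

two_failures = list(one_failure)
two_failures[5] = None                     # bad sector on a second drive, same stripe
try:
    rebuild_block(two_failures, 2)
    second_rebuild_failed = False
except IOError:
    second_rebuild_failed = True
```

With a single unreadable block per stripe, the XOR of the survivors gives back the lost data. Once a second drive has a bad sector in the same stripe, there is no longer enough information to recompute either block, which is the situation this customer’s array is in.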
