Tuesday, 10 May 2011

The Network Server has a bad Night.

Ever had one of those days where things seemed like they were just sailing along.. your mental to-do list full of little things like 'fix why the media PC is confused over which channels are high def' and 'think about adding more storage to the home server'.. and then you wake up to discover a little red house in your systray?

If you know what I'm talking about, you'll have an inkling of where this is going. If you're wondering what a little red house is doing in my systray, then we should probably start there...




The little house in the systray is the Windows Home Server status monitor, which gets put in the systray of every machine the server is looking after. If the server is off, or asleep, it's a grey house. If the server is happy, and there's no need to think about it, the house is green. If the server needs a cuddle, the house will be yellow. Lastly, if the server has accidentally amputated 3 of its fingers in a drunken all-night data storage orgy, the house will be red.

Yes. Red. The same color as my status. My Windows Home Server was screaming for help in the only color it knew how.

So, I bring up the console, a funky remote desktop doodad application served up by the server. My server has a habit of getting blindingly drunk and thinking it has amputated half its fingers, when usually it's just forgotten that it ever had any. In tech terms, every so often 5 of the drives behind a SATA expander go AWOL; this happens maybe once every couple of months, and a reboot fixes it. I go to the drive status panel, and sure enough there's a block of disks showing up as 'red'. The chance of the server actually having lost that whole block simultaneously is tiny, unthinkable.. so I tell it to reboot.

While it reboots, I start thinking: did the server fail 5 drives, or was it 6.. did I feel lucky, well, did I? And slowly the conclusion dawns in my head that it may have been 6. 6 would be Bad; 6 is one more finger than the server usually drunkenly hallucinates amputating.

The server comes up.. status is green! Yay.. status goes red! Uh-oh.

I bring up the console. The server is clearly struggling: 5 disks offline already, and 1 showing as 'unhealthy'. After the amount of vodka the server drank, this isn't too surprising. I ask the server for a diagnosis of the unhealthy disk.. the disk promptly falls into a coma, and updates its status to offline.

So I power down the server.. pull every disk from its hotswap bay.. power up the server, and start reinserting the disks one by one.. first 4 disks, no problem.. disks 5 & 6 spin up, but refuse to add to the pool. No problem, I pull them back out and continue on.. 7, 8, 9, 10, 11, all fine.. 12, no go.

3 disk failure? Really? If this were running RAID I'd be a very very unhappy bunny (think Watership Down, rather than Playboy). Thankfully, this is running Windows Home Server, with full duplication turned on.. with 3.5TB now 'missing' I know I'm going to lose some data, but I don't know which ;-) I do know the majority will survive..

So now, I'm working through the 'drive removal' process, letting the server know that it's lost those fingers forever, and that all the piano pieces it used to play with them now need refiguring using its remaining fingers. The server understands, though each disk removal takes it over 8 hours. You can almost hear it practicing and repracticing those tricky sections with the fingers it has left.

Once the process is complete, I'll go grab some more disks (probably have to be WD20EARS with a jumper on them to work around their Advanced Format sector alignment), and feed it back the space that it lost.

The other good news is that the 'failed' drives still spin up, although clearly they are taking a fair while to do so.. and since Windows Home Server stores its data on each disk as plain NTFS, the disks are mountable from another PC, where I'll be able to get at much of any actually 'lost' data and send it back to the server.
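For the curious, here's roughly how I'd expect to work out what needs sending back once a 'failed' disk is mounted on another PC. This is only a sketch: `find_missing` and both folder arguments are names I've made up for illustration, and it assumes the folder layout on the rescued disk mirrors the layout of the share, which may well not hold in practice.

```python
from pathlib import Path


def find_missing(recovered_root: Path, share_root: Path) -> list[Path]:
    """List files found under recovered_root that have no counterpart under share_root.

    Hypothetical helper: both roots are assumed to use the same relative layout.
    Paths are returned relative to recovered_root, in sorted order.
    """
    missing = []
    for path in sorted(recovered_root.rglob("*")):
        if not path.is_file():
            continue  # only files matter; folders get recreated on copy
        relative = path.relative_to(recovered_root)
        if not (share_root / relative).exists():
            missing.append(relative)
    return missing
```

Anything it returns is a candidate for copying back by hand; anything it skips already survived on the server thanks to duplication.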

With duplication enabled for every share, I'll only lose files where both the file & its duplicate were stored within the set of disks I'm removing. With 3.5TB removed in total, that's a fairly large chunk of possible loss, but relatively small against the remaining 15TB or so.
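To put a rough number on that, here's a back-of-envelope sketch. It assumes each file's two copies land on two different disks picked uniformly at random, which is my simplification and not how Windows Home Server actually decides placement, and it ignores the disks being different sizes. The 12 and 3 in the example are just the disk counts from the story above.

```python
from math import comb


def both_copies_lost_probability(total_disks: int, removed_disks: int) -> float:
    """Chance that a duplicated file had BOTH copies on the removed disks.

    Assumption (not from WHS documentation): the two copies sit on two
    distinct disks chosen uniformly at random from the pool.
    """
    if total_disks < 2:
        raise ValueError("duplication needs at least two disks")
    if not 0 <= removed_disks <= total_disks:
        raise ValueError("removed_disks must be between 0 and total_disks")
    # Number of disk pairs entirely inside the removed set, over all disk pairs.
    return comb(removed_disks, 2) / comb(total_disks, 2)


if __name__ == "__main__":
    # 3 removed disks out of 12: 3 bad pairs out of 66 possible pairs.
    print(f"{both_copies_lost_probability(12, 3):.1%}")  # prints 4.5%
```

So under those (generous) assumptions, something in the region of one file in twenty is at risk, and those are exactly the files worth hunting for on the pulled disks.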
