Server drama and blogus interruptus
Murphy does more than live. Murphy thrives.
I’d been experiencing hardware problems with my server for a couple of months. Many readers noted that a visit here was often greeted with some kind of database error. The evident cause was that the server would suddenly reboot for no reason, intermittently at first, then as frequently as eight or 10 times per hour. Then it would work fine for a week or two before beginning the whole cycle again. The reboots resulted in corrupt database tables, which were easy enough to repair once I knew the corruption had occurred.
So (I can hear you ask), why didn’t I just fix the hardware problem?
My server was at a colocation facility that was, essentially, abandoned by its owner. Sending about 40 emails had no effect. I couldn’t get to the server. In fact, when I started making inquiries with the property manager, I found out that the colo owner hadn’t paid his rent (he claims otherwise) and even he couldn’t get into the facility.
Finally, out of the blue, the colo owner called and told me he was on site and if I wanted to get my server out of his facility, I’d better do it now. I broke a few speed limits getting there and retrieved both my server and another one I manage as a volunteer for a group I’m involved with. It took only a day to find a new colo service, but my server wouldn’t boot up at all; it didn’t get past the BIOS before it started rebooting.
Of course, this was now the day before Thanksgiving, and my IT guy, Mike Vincenty, was off for a family gathering in Phoenix while my family was meeting up with Michele’s brother and his family in Las Vegas. By the time Mike was able to get back to the server, it was clear that something serious was amiss. I was on the road, but Mike took it into the shop where all my computer work is done, where it was determined that the box had overheated and a capacitor on the mainboard had blown. The solution: a new server featuring the hard drives from the dead box.
That took another couple of days, then more time to install new drivers, rebuild the RAID array, and make a host of other adjustments. With everything finished, Mike let it burn in overnight, then installed it at the colo facility this morning. I was back up and running by about 2 p.m. PST today.
In the end, my blog, website, and email were offline for nearly two weeks. Knowing that emails were bouncing (and that I would never recover them), and that I had few means of getting the word out that my site and blog were offline, produced a degree of anxiety I’ve rarely experienced before. Some of my colleagues have suggested that I should scrap my own server and maintain my properties in the cloud (using a hosting service), but that has about as many downsides as owning your own box. Besides, with a brand new server, I should be good for a few years.
In any case, I’m back. I’ve missed you all! And I have a backlog of topics I’ve been waiting to blog about. Of course, I’m traveling this week (three cities: Montreal, Chicago, and New Orleans), but I’ll find the time to get some posts up. When the travel subsides, you can expect a healthy dose of posts to make up for the fortnight’s absence.
12/03/06