17 December 2004
2:56 PM - Well that went better than expected! The web server is back online.
17 December 2004
1:30 PM - We are currently working to replace a web server that has been giving us some problems for the past couple of days. This will affect approximately 20% of the web sites that we host. We currently expect the machine to be back up at approximately 5 PM.
20 September 2004
The mail server that was having problems is back online. We will now monitor it while it delivers all of the queued messages which should take a few hours. Again, no data or emails were lost during the fix.
20 September 2004
One of the email servers was having some problems this afternoon. Our original maintenance attempts seemed to have worked, but email just refused to deliver.
We are currently in the process of replacing a hard drive which may take an additional hour to complete.
No emails or data have been lost on the server. Any emails that attempt to deliver while the server is down will queue and redeliver as soon as the server is online again.
Thank you for your patience.
27 July 2004
One web server has been experiencing problems this morning. We performed some temporary maintenance that appeared to fix the problem, but alas it did not. We are now in the midst of doing some major maintenance on that one machine.
The outage is affecting approximately 125 customers. Only the web server is currently unavailable. All other services including email are not affected.
13 April 2004
One of our mail servers is completely hosed. I am working on the problem, but it will be many hours before it is fixed. No mail is getting lost, and all of it will be delivered this afternoon.
13 April 2004
The worst of the email problems is over.
We were hit with a massive dictionary attack today beginning at about 10 am EST. A dictionary attack is when some moron combines tens of thousands of user names with a single domain name then hammers the email server to see which email addresses are valid. Once the valid ones are discovered, they take all those legitimate email addresses, put them on a CD, and sell them to spammers.
This attack has been like nothing I have ever seen, though.
The requests were coming in so fast that legitimate emails were not getting delivered. The email queue backed up almost six hours before we could track down the source of the attack and kill it. And it affected thousands of email servers worldwide.
Everything is back to normal, but the server is still running about three hours behind in terms of when mail actually arrived. It will take perhaps another hour to fully deliver all that mail.
Nothing was lost in the process, just slowed down badly.
8 February 2004
At least we are sure of what happened now. And you're not gonna believe this.
NT has a built-in software RAID program. The purpose of that program is to allow the user to mirror the boot drive so that in the event of a drive failure, the computer comes back up on the backup drive and nothing is lost. Unfortunately that scenario only works when it is the physical hardware that fails. If the RAID driver fails, then very bad things happen.
You see, Microsoft in their infinite wisdom has also built some very strong security protection into their RAID system. Apparently the primary drive has a passcode system that communicates with the backup drive. If the primary drive fails, then the backup takes over. But if the software itself fails, then the backup drive becomes completely inaccessible because of the security protections.
The purpose of that system (according to someone who once worked for Microsoft) is to prevent someone from coming into a computer, installing another boot drive, and extracting the data from the backup drive. The ability to do so would allow anyone to circumvent the password protection of the computer itself by bringing in their own hard drive with a bootable system that they can log into. But it also does not allow for recovery by replacing the boot drive with one that is not corrupted and extracting the data from the backup.
I hate Microsoft.
8 February 2004
On 7 February 2004 at 5:36 pm EST, we suffered a catastrophic failure of one of our web servers. Fortunately for most of our clients, there were only seven accounts on that particular machine. Unfortunately for those seven accounts...well...we done gots ourselves a problem.
After working on it for almost 14 hours now, we have pretty much determined that the failure was caused by a corruption in the software RAID drive controller that also corrupted the registry. As such, none of the drives that contain data are accessible. Now, the operative word here is "registry."
Only Microsoft products have registries. And the machine that hosed is our only NT box; all the rest of the web servers are Macs and have been running without failure since 1996.
We are attempting to recover data and will be moving those seven sites to a Mac system as soon as we do. At that point, we will completely rid ourselves of all Microsoft-based hardware. It is simply not worth the problems that their ignorant programmers cause.
This will be the only update on the matter since we are able to work with each of the seven accounts individually.
Now, go out and get a Mac. :)
8 February 2004
OK. So I am going to add another word or two to this saga.
At this point, we can see the backup hard drives on the machine, but we cannot yet access the data on them. So why is this so difficult?
Because we are dealing with Microsoft products. Let me explain.
With a Macintosh server, I can take a computer with absolutely nothing on it and install the operating system and server software, completely configure it, and put it into production within 30 to 45 minutes. Once in place, the Macintosh server will run for years without software or hardware failure, virus intrusion, or operating system corruption.
If the Macintosh primary hard drive catches fire and explodes, I can simply reboot the machine and am up and running on the backup drive. If push comes to shove, I can remove the backup drive, place it in another machine, and without a single bit of reconfiguration be up and running in ten minutes.
On an NT system none of the above will happen.
We have narrowed down the problem to a RAID software controller that glitched. Mind you that this RAID controller is what came with the Microsoft operating system. It is a Microsoft product, not some third-party hack that we installed. When the controller hosed, it took out the registry. (That is the rough equivalent of destroying the Finder in Mac language.) But with the NT registry gone, nothing on the computer is available.
We have gone out and procured another hard drive and installed it. What should have taken five minutes at most turned into a two-hour ordeal. Then we reinstalled the operating system on that drive. Fifteen minutes for the Mac; over six hours for NT.
So we bring up the computer on the new operating system, plug in the old drives to recover information, and though we can see the drives, no information is yet available to be extracted. I could almost expect that with the RAIDed partitions, but there is one drive in there that contains a manual backup of all the files. It is completely independent of the boot drive. Yet it is still unreadable. I should be able to take that drive, plug it into another machine -- any machine -- and read its contents. But no...
Microsoft apparently does not feel that such is an appropriate method of running a computer. And what we are experiencing is not even the result of a virus. I can't even imagine what you folks have to deal with on an hourly basis as the result of viruses, trojan horses, and worms. We folks on the Mac side don't even know what those things are. I can open MyDoom or SoBig on my personal computer all day long and all I will get is a harmless file of garbage. You open it and your machine goes bye-bye. And why is that?
Because Microsoft does not care one bit about its users. They had the means of fixing their problems a decade ago and chose not to. It was more expedient for them to break anti-trust law at every turn in order to maximize market share without caring one bit about the end users or corporations whose livelihood depended on their product integrity.
Well, it is clear to me now from first-hand experience that the folks at Microsoft couldn't care less about their users. They have no integrity.
I thought I would never say this, but it does my heart good to see Microsoft products being constantly attacked by viruses. If enough of their garbage machines get wiped out, perhaps people will start realizing that they are spending far too much of their money on trash. Perhaps they will then switch to a real operating system running on quality hardware that works for them instead of against them.
Anyway, we still do not know if we will be able to recover any data off those drives, but this time we got lucky. It seems that all of the clients on that machine kept accurate and current backups of their web sites. In the future, though, they will not have to worry about this problem again. Once this crisis is over, I am going to take our sole NT box out into the front yard and smash it with a sledgehammer. Never again will I allow any Microsoft product to infest our company.
5 February 2004
Some complete idiot is hammering our mail servers with tens of thousands of spam mails advertising prescriptions for things that no one needs. Things have slowed down so badly that it will be hours before the queue is cleared out.
We have pretty much blocked the delivery of the spam, but the process of blocking it is slowing things down as well. Just be patient and everything will get back to normal as soon as I hunt down the spammer and kill him and his family.