PagePlop 1998 Web Hosting Server Status

1998 Reports

24 December 1998

It's update time...

We have now been without power for 23 hours and Carolina Power and Light does not anticipate bringing us back on line for at least two more days. The ice is pretty, though, so there is an upside to all this. In the meantime, the generator is working fine and the only things not working are the 220-volt appliances like the dryer and the hot water heater; and though Val is getting somewhat rank, I still love her.

We had a bit of a problem just as the power died yesterday at 4:20 PM eastern time -- it took them about one hour to deliver the generator instead of the 20 minutes it was supposed to take. Unfortunately, the outage coincided not only with icy roads, but also with rush hour. As a result, we lost power for about 25 minutes after the UPS units drained. (Note to self -- get more UPS units so I can hold for three hours instead of 90 minutes.)

Now the problem I have at this moment is how to count that downtime against our server status numbers. We had scheduled a planned outage of 20 minutes last night anyway and since we were able to take care of what we had planned to do during the outage, do I count it as the planned outage or do I count it against our percentage downtime figures? Given that I am dog tired from working my butt off over the past 24 hours, I think I'll give myself the benefit of the doubt and not count it.
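For the curious, here is a rough sketch of what is at stake in that decision. It assumes the figures are kept per calendar month (the reporting period is my assumption here, not something stated above) and that the only outage in question is the 25 minutes after the UPS units drained.

```python
# Rough sketch: how much a 25 minute outage moves a monthly uptime figure.
# ASSUMPTION: the reporting period is the calendar month of December.

def uptime_percent(period_minutes, downtime_minutes):
    """Return the percentage of the period during which service was up."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

MINUTES_IN_DECEMBER = 31 * 24 * 60  # 44,640 minutes

counted = uptime_percent(MINUTES_IN_DECEMBER, 25)     # outage counted
not_counted = uptime_percent(MINUTES_IN_DECEMBER, 0)  # treated as planned

print(f"counted:     {counted:.3f}%")
print(f"not counted: {not_counted:.3f}%")
```

Either way, the difference is a few hundredths of a percentage point.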

Merry Christmas.

23 December 1998

We are taking down the entire system at midnight tonight in order to upgrade the automated restart measures. In the event of a major power failure, we will be better able to remain on-line for an indefinite period of time. The outage will last approximately twenty minutes.

1 December 1998

9:30 AM

OK...let's see just how good everyone is. I'm currently looking out my door at a fountain of water exploding from the street. A major water main has broken. The backhoes are ready to go and the folks are marking out the location of the buried telco lines. The digging will begin shortly. Let's hope that the backhoe does not take a liking to the telco lines.

In the meantime, I have BellSouth on the way out to look at the situation. They are prepared to string aerial lines through the trees if necessary to maintain the circuits.

Keep your fingers crossed...

Film at 11.

1:10 PM

Fine...so it's not 11 yet, but I thought you'd like an update. The hole is dug in the street and a big hole it is. A very cool hole at that. They are about two feet from the telco cables and the guy running the backhoe is really good. The pipe is mended; now they have to fix the sewer line that was cracked when the underlying ground got wet and it sagged. Surprisingly, it didn't even smell, though you know what they say...I don't think I'll even go there... More to come...

3:15 PM

Well, what was presumed to be a busted sewer pipe turned out to be a major sewer line with a nick in it. When they got it uncovered, the pipe broke. Normally, that would not have been a significant deal, but that was no ordinary sewer pipe. Apparently, it was the main sewer feed for an entire section of Raleigh and we are near the bottom of the gravity well that keeps the juice moving downhill. It was kinda like the Fourth of July, except wetter and it didn't smell as good.

Anyway, the hole is getting very big and they will be bringing in the portable lights with the night crew shortly. This is going to be a long evening of fun and merriment. More later...

11:00 PM

It's 11 o'clock and there is no film. But there is still a lot of sewage in the neighborhood. We've tried to get them to move out...no wait...that's another story entirely...

The pipes are fixed and the hole is being filled back in. Of course, it will be sometime in March before the city actually gets around to re-paving the road, so we will have to dodge orange and white barrels until then. Small price to pay, though. They managed to get through an entire day with a backhoe mere inches from the T1 lines without cutting any of them. I am happy. On the other hand, everyone in the neighborhood is probably going to come down with cholera, dysentery, and hepatitis shortly, and dogs will be rooting around for months to come.

I also learned a lot about how municipal sewer and water systems work, so today wasn't a total loss.

20 November 1998

I hate it when that happens. The graphic counter on one of the web servers crashed and took down the server with it. Fortunately, the auto-rebooting system kicked in just fine and the total outage lasted a bit under three minutes. I really need to fix that thing even though it happens only about every five or six months.

8 November 1998

There are major problems on the internet backbone today. A huge peering point router in Santa Clara, California went south about 12:30 PM EST and the problems have cascaded throughout the entire system. The result is that just about everyone in the world, though transmitting data internally without problems, is unable to talk to anyone else. In other words, our system is fine and your system is fine, but my system is not recognizing the existence of your system; kinda like when we track mud into the house and our wives give us the silent treatment for our stupidity even while we are cleaning up the problems. Unfortunately, I cannot just buy the internet some flowers and take it out to dinner to make good.

24 September 1998

Hadn't written anything in a while since I haven't needed to, so I thought I'd just mention a few things. We now offer NT/Front Page/ASP service. See our account pricing page for more details, but don't dig too far into the web site since we are in the process of completely updating it. We should be done in about a week (though we have been saying that for a month now.)

We are going to add a new switch this weekend, which will result in about a 60 second network outage, unless Chad drops his end of the ethernet cable, in which case we could be out for weeks. He promises to be careful and to pick up any packets that happen to fall on the floor.

The LAN has now been completely upgraded with all new and faster machines. There is nothing left to do so I am going to sleep for a while. Do not call me before noon tomorrow.

7 September 1998

We have put our new back-up policy and plans into effect over this past weekend. We have looked at many different methods of back-up strategy for over a year now and managed to eliminate what are the most popular solutions -- tape drive and RAID -- as rather problematic.

Though tape is efficient and low maintenance, it is very difficult and time consuming to restore a failed hard drive from a tape system. But even beyond that aspect, tape manages to not only back up the good files, but also the corrupted ones. Tape is also magnetic and capable of losing data from any number of problems. The software that runs tape drives is also not very good regardless of what the manufacturers of that software claim in their advertisements. Remember, we process over three million electronic transactions per day across nineteen different computers; that's a bit different than what an individual or small business would experience in normal use. And the more robust tape systems run in excess of $100,000.

We have also experimented with hard drive RAID systems and they are fraught with their own problems. For instance, the two drives in a RAID system are mirrors of each other. That means that if the primary drive fails, the secondary drive contains a mirror image of what was on the primary and is able to take over immediately. But RAIDs suffer from the same problems as tape drives; if the file is corrupted on the primary, it is also corrupted on the RAID. The only thing RAID really gives you is an immediate hot swap of drives. Presumably, though, if the primary is failing, then it will be taking significant errors in the data written to it and pass those errors over to the secondary. And every reasonably priced software RAID solution we have tested has a huge problem with stability. Given that RAID software failure occurs about every 10 days and that the average hard drive will last for at least three years before failure, it is clear that RAID is not exactly the best solution. Full hardware RAID solutions that would serve our needs start at $60,000. Better than tape, but still no banana.
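To put those two failure figures side by side, here is a back-of-the-envelope calculation. It uses only the numbers quoted above (a software RAID failure about every 10 days, a drive lasting at least three years) and assumes, purely for illustration, that failures arrive at a steady average rate.

```python
# Back-of-the-envelope comparison of the two failure rates quoted above.
# ASSUMPTION: failures arrive at a steady average rate.

DAYS_PER_YEAR = 365

raid_software_interval_days = 10          # one software RAID failure per ~10 days
drive_lifetime_days = 3 * DAYS_PER_YEAR   # a drive lasts at least three years

# Expected software RAID failures during the life of a single drive.
software_failures_per_drive_life = drive_lifetime_days / raid_software_interval_days

print(software_failures_per_drive_life)
```

On those numbers, the software meant to protect against a drive failure would itself fail roughly a hundred times before the drive did once.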

So in order to provide a back-up solution that will keep our hosting prices under two billion dollars per month per account, we have decided to go with a multiple hard drive and an automated back-up system as a solution that eliminates many of the problems. This is how it works:

Each machine, whether it be a web server, email server, listserv server, DNS, or whatever, has at least two external hard drives attached to it. One of the externals is the primary drive; that is where the system software, the applications, and the client files reside. It is the live drive; (internal drives do nothing but act as storage for "stuff".) The other drive in the SCSI chain is the back-up drive. It is at least twice as large as the primary drive. On early Sunday morning, early Wednesday morning, and early Friday morning, we take the entire contents of the primary drives and copy them unaltered to the respective secondary drives. Then every Sunday, we take the latest contents of the secondary drives and compress the contents. Those files are then moved to a separate file server. So this is what happens:

Sunday: Primary to back-up, with the latest day's contents of secondary stuffed and placed in the archive.
Wednesday: Primary to back-up, placing a full and updated copy on the secondary drive while stuffing the Wednesday copy and placing it on the archive. Sunday's live copy gets erased.
Friday: Primary to back-up, erasing the unstuffed copy made to the secondary on Wednesday, and stuffing the Friday copy and placing it on the archive.
Sunday: Primary to back-up, erasing the full copy made on Friday. The new contents are then stuffed and placed in the archive. We will keep four weeks' worth of stuffed files in the archive on-site for each drive backed up. On the fifth week, we erase the oldest weekly archived files.

On the last Monday of each month, the stuffed contents of the archive are placed on a CD-ROM that is brought to the safe deposit box at our bank. As time goes by, we will have complete back-ups of all files going back to today.
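The schedule above can be sketched as a small function that, given a date, lists what is supposed to happen that day. This is only an illustration: the function names and task descriptions are made up for the example, and it follows the day-by-day list above (stuffing on every back-up day) rather than any actual script we run.

```python
# Illustrative sketch of the back-up rotation described above.
# ASSUMPTION: names and task wording are hypothetical, not the real scripts.
from datetime import date, timedelta

# datetime.date.weekday() numbering: Monday is 0, Sunday is 6.
MONDAY, WEDNESDAY, FRIDAY, SUNDAY = 0, 2, 4, 6
BACKUP_WEEKDAYS = {SUNDAY, WEDNESDAY, FRIDAY}

def is_last_monday(day):
    """True if 'day' is a Monday and the following Monday is in another month."""
    return day.weekday() == MONDAY and (day + timedelta(days=7)).month != day.month

def tasks_for(day):
    """Return the list of back-up tasks scheduled for the given date."""
    tasks = []
    if day.weekday() in BACKUP_WEEKDAYS:
        tasks.append("copy primary to secondary, erasing the previous full copy")
        tasks.append("stuff the new copy and place it in the archive")
    if day.weekday() == SUNDAY:
        tasks.append("erase archived files older than four weeks")
    if is_last_monday(day):
        tasks.append("burn the archive to CD-ROM for the safe deposit box")
    return tasks

# Walk one week starting Sunday, 6 September 1998.
for offset in range(7):
    day = date(1998, 9, 6) + timedelta(days=offset)
    print(day.isoformat(), tasks_for(day))
```

Walking the last week of a month instead would show the CD-ROM task turning up on the final Monday.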

Now, here's what this gains us. For starters, we are using only simple programs that are tried and fully tested for stability and reliability (such as Stuffit.) Second, we always have a full back-up on an independent, in-house hard drive which includes a fully working system folder. All we have to do is open a control panel, change the start-up drive, and restart the computer. Third, in the event of a catastrophe, we have at least four full weeks of files located off-site. Fourth, in the event that we lose a drive, we can restore files quickly from uncompressed back-ups rather than tape. Finally, the whole system is in place when we finish the new FTP server software so that we can fully RAIC the machines (but that is another story for another time.)

The downside is that in the event of a drive failure, it will take about three minutes to restore the specific service and those files can be up to 48 hours old. Both those problems disappear when we go to full RAIC (hopefully in January, but don't hold me to that.)

The one thing that should be apparent from the above discussion is that you, the client, are ultimately responsible for maintaining complete back-ups of all files -- including email. With 4 gig external drives currently costing less than $300 and high-capacity floppy media like Zip drives around $100, there is no reason to ever lose files again.

But since we have already bought a Travan tape system, we are going to put that on the email system only so that every email sent to a POP account gets archived as well; there will be no latency on that service as soon as we hook it up (probably this coming weekend.) In all cases of recovery or restoration needed which is not a result of a problem on our side (i.e., your hard drive exploded and you didn't keep your own back-ups,) we will charge $75 per hour with a one hour minimum for that service.

And as an aside to all that, we are pricing with contractors for the installation of our own natural gas-fired generator. That way, in the event of a major power outage lasting longer than about four hours, we do not have to have one hauled in. We will be conducting a random drawing to determine which one of our clients will be paying for it. :) :) :)

1 September 1998

What a week. We are almost done with the complete reconfiguration of the entire LAN. We would have been done sometime last week had it not been for hurricane emergency mode, but it is going well so far. All web servers are now on G3 machines and they are screaming fast. Given the quickness of the servers, the gross overcapacity of the internal LAN and the telco, and the absolute stability of BellSouth, any slowness you experience can now be squarely blamed on either the Internet congestion as a whole or your dial-up connection flaking out.

hehehe

Anyway, we have also put email on an upgraded machine -- well, all services have been upgraded except the secure server. That little puppy has proved to be a bit problematic to move, so we are going to rest a few days (or maybe a week or so) and try it again. During the whole process, no service was down for more than about two minutes with the exception of FTP which we had to shut down completely for about five hours to ensure a clean copy of all software for backup purposes; fortunately, we do not have to do that when we back up only your files.

I'm going to sleep now...

27 August 1998 -- 11:00 am EDT

The sun just came out for a bit and it is clearing up. Bonnie has just been downgraded to a tropical storm and continues to move away from us at 6 mph. It looks like it is over, at least as far as we are concerned. This will be my last update on Hurricane Bonnie. Now, let me go tend to those 90 emails that have backed up on me over the past two or three days.

27 August 1998 -- 9:40 am EDT

The generators are still sitting in and on trucks at the State Fairgrounds and I am still torqued about the whole thing. But at least we are out of danger. Bonnie is still sitting on the coast and has not made any significant movement in the past six hours. Rainfall totals a mere 60 miles from here are already upwards of 12 to 15 inches with at least 8 inches more expected today.

In Raleigh, the wind has died down to a steady 20 mph with gusts to 35. The rain has tapered off considerably after the storm dumped a bit under three inches on us. Conditions here are expected to improve throughout the day as Bonnie makes a slow turn to the northeast and pulls farther away from us. On the other hand, the dew point is expected to be in the mid-70s today with a temperature of 95. Combine that with unstable air and we may have severe thunderstorms with tornadoes later this afternoon. We see that all the time, though, and it doesn't faze us at all, but if I have to watch that Lowes commercial on the Weather Channel one more time, I think I'm gonna kill someone.

Then there is the issue of Danielle, but that hurricane is still out in the Atlantic and at least four days from landfall. I need to move to Iowa...

27 August 1998 -- 3:00 am EDT

Sorry for no updates lately, but it has been rather hectic here. If you have been following the progress of Hurricane Bonnie today, you would have noted that there is no progress. The storm has stalled just north of Wilmington, North Carolina and is sitting there spinning as a strong category two storm. While it is doing so, the coast of North Carolina is beginning to wash into the ocean. We have had a series of rain bands all day long with winds up to about 45 mph. Later today the forecast is for continued winds and very heavy rain (some areas of the coast and inland coastal counties have already had 10 inches of rain with another 10 to 20 inches predicted before ending Friday morning.)

So far, the power has held which is very good since we do not have a generator. And I am not amused. Let me explain. When there is a potential need for a generator, we simply make arrangements to rent one for $375 per week. That is a lot less expensive than paying $16,000 or so for one of our own (which, by the way, we would not have had to crank up a single time since Hurricane Fran in September 1996.) Well, we again made arrangements to have the generator delivered on Wednesday afternoon -- then the politicians stepped in.

This being an election year in North Carolina, every two-bit piece of political slime decided that they were going to suck as much pork for their individual counties as possible. So they had the State literally commandeer generators from all over North Carolina to use in the aftermath of Bonnie. Those generators are presently sitting on trucks at the North Carolina State Fairgrounds unable to be moved eastward until at least Friday morning when the storm abates. Of course, that means that there are no generators in the entire state to be had because the State, in active cooperation with the morons from FEMA, has them all in its possession.

The situation reminds me somewhat of when Keith Richards of the Rolling Stones comes to town and all the drugs within 300 miles of the concert site disappear.

The upside is that Carolina Power and Light does not anticipate the situation to get bad enough in our area that power outages would result. The downside is that if the power does go out, we have enough back-up reserve to last about three hours. On the other hand, as soon as things open in the morning, I am going to start scrounging again for a generator. By the way, the generator we do have (5000 watts) has a burned stator that can not be fixed for about two weeks. No matter, though, since that thing is not designed to last us more than about 10 hours anyway; it is designed to carry us through till the generator arrives -- do I have to go into that situation again?

26 August 1998 -- 1:50 pm EDT

Everything is fully backed up and we're waiting on the delivery of the generator. The hard drives are at the bank, tucked away safe and sound, and the sump pump is checked. Actually, the sump pump has nothing to do with our ability to serve files, but if it goes, Val goes ballistic and starts throwing corn puffs at me. It is not a very pretty sight.

Hurricane Bonnie is now located just off the North Carolina coastline with winds still howling at 115 plus mph. The storm is now predicted to turn slightly to the NNE which will move it away from Raleigh, though we are still expected to get winds in excess of 50 mph and very heavy rain. There is another official National Weather Service report due out at 4 pm. When it arrives, I will post it.

25 August 1998 -- 11:45 pm EDT

Well, this is the latest. By the way, I have decided to start back-ups now rather than in the morning since Hurricane Bonnie may pick up forward speed tonight, leaving me without time to back-up and get the drives to the safe deposit box. Until the full back-up is done, which should take about four hours, the server will be a tad slow to respond. You may get latency times of three to five seconds for a download to begin. I have also turned off FTP services for the duration of the archiving so that there is no possibility of file corruption.

WTNT32 KNHC 260246 TCPAT2 ADVISORY

HURRICANE BONNIE ADVISORY NUMBER 28

NATIONAL WEATHER SERVICE MIAMI FL 11 PM EDT TUE AUG 25 1998

...RAINBANDS OF HURRICANE BONNIE BEGINNING TO SPREAD ACROSS PORTIONS OF THE COAST AND OUTER BANKS...

A HURRICANE WARNING IS IN EFFECT FROM CAPE ROMAIN SOUTH CAROLINA TO CHINCOTEAGUE VIRGINIA...INCLUDING PAMLICO AND ALBEMARLE SOUNDS...AND CHESAPEAKE BAY FROM SMITH POINT SOUTHWARD.

A HURRICANE WATCH IS IN EFFECT FROM SAVANNAH GEORGIA TO CAPE ROMAIN SOUTH CAROLINA AND FROM CHINCOTEAGUE VIRGINIA TO CAPE HENLOPEN DELAWARE.

PREPARATIONS TO PROTECT LIFE AND PROPERTY SHOULD BE RUSHED TO COMPLETION IN THE HURRICANE WARNING AREA. INTERESTS IN THE WATCH AND WARNING AREAS SHOULD FOLLOW RECOMMENDATIONS FROM THEIR LOCAL EMERGENCY MANAGEMENT OFFICIALS.

AT 11 PM EDT...0300Z...THE CENTER OF HURRICANE BONNIE WAS LOCATED NEAR LATITUDE 31.6 NORTH...LONGITUDE 76.8 WEST OR ABOUT 215 MILES SOUTH OF CAPE LOOKOUT NORTH CAROLINA.

BONNIE IS MOVING TOWARD THE NORTH-NORTHWEST NEAR 14 MPH...AND A GRADUAL TURN TOWARD THE NORTH IS EXPECTED BY MORNING. THE FORECAST TRACK BRINGS THE CENTER TO NEAR THE OUTER BANKS OF NORTH CAROLINA AROUND MIDDAY WEDNESDAY. TROPICAL STORM FORCE WINDS ARE LIKELY TO ARRIVE AT THE COAST OF SOUTH CAROLINA AND NORTH CAROLINA WITHIN THE NEXT FEW HOURS AND HURRICANE FORCE WINDS ARE LIKELY AROUND DAYBREAK ON WEDNESDAY.

MAXIMUM SUSTAINED WINDS ARE NEAR 115 MPH...WITH HIGHER GUSTS. SOME FLUCTUATIONS COULD OCCUR BUT BONNIE IS EXPECTED TO REMAIN A POWERFUL HURRICANE FOR THE NEXT 24 HOURS. BONNIE IS A LARGE HURRICANE. HURRICANE FORCE WINDS EXTEND OUTWARD UP TO 145 MILES FROM THE CENTER...AND TROPICAL STORM FORCE WINDS EXTEND OUTWARD UP TO 230 MILES.

AN AIR FORCE HURRICANE HUNTER AIRCRAFT RECENTLY ESTIMATED A MINIMUM CENTRAL PRESSURE OF 965 MB...28.50 INCHES.

STORM SURGE FLOODING IS EXPECTED NEAR AND TO THE NORTH OF WHERE THE HURRICANE REACHES THE COAST...INCLUDING IN PAMLICO SOUND AND ALBEMARLE SOUND...WITH WATER LEVELS INCREASING UP TO 9 TO 11 FEET ABOVE NORMAL ASTRONOMICAL TIDAL LEVELS.

LARGE SWELLS ARE PROPAGATING WELL AHEAD OF THE HURRICANE AND ARE IMPACTING PORTIONS OF THE U.S. EAST COAST. SEE STATEMENTS FROM LOCAL NATIONAL WEATHER SERVICE OFFICES.

RAINFALL TOTALS OF 5 TO 10 INCHES ARE POSSIBLE IN ASSOCIATION WITH BONNIE.

ISOLATED TORNADOES ARE POSSIBLE IN THE VICINITY OF THE NORTH CAROLINA OUTER BANKS WITHIN THE NEXT 12 TO 24 HOURS.

REPEATING THE 11 PM EDT POSITION...31.6 N... 76.8 W. MOVEMENT TOWARD...NORTH NORTHWEST NEAR 14 MPH. MAXIMUM SUSTAINED WINDS...115 MPH.

MINIMUM CENTRAL PRESSURE... 965 MB.

AN INTERMEDIATE ADVISORY WILL BE ISSUED BY THE NATIONAL HURRICANE CENTER AT 2 AM EDT FOLLOWED BY THE NEXT COMPLETE ADVISORY AT 5 AM EDT...WEDNESDAY.

(from National Weather Service at 23:06 ET)

25 August 1998

OK folks...this is the deal...

We are under a hurricane warning for Bonnie as of noon today. Hurricane Bonnie is a strong category 3 storm located (at this moment - 8 PM EDT) about 500 miles to our SSW (250 miles south of Cape Lookout, North Carolina) and moving to the NNW at 16 mph. We are 130 miles inland and they have the eye passing us to our east by about 50 miles. Of course, that's what they said about Hurricane Fran in 1996 when Raleigh took a direct hit from a category two storm. What a mess...anyway...

We are as ready as we can be. The big generator is on order for delivery tomorrow. The sump pumps are checked and working. The portable generators are tested and purring like kittens. I will be boarding up the east-facing windows tomorrow afternoon. There is propane in the gas grill and the freezer is full. So now we wait.

The current forecast for Raleigh calls for high winds in excess of 50 mph and heavy rain. And if that's all there is, we are just fine. But if that eye gets closer to us, we could see winds in excess of 100 to 120 mph and up to 20 inches of rain. That would be bad.

I am going to do a complete backup of all files around noon tomorrow on an external hard drive. That drive will be placed in our safe deposit box at our bank in the event of a complete catastrophe. That does not mean that you can sit back and relax; as always, we encourage you to keep complete back-ups of all your files at all times. If push comes to shove, we are prepared to move the entire operation another 200 miles inland. And if Armageddon strikes, we can move to Florida or Arizona (which are contingency plans if the building is literally destroyed.)

Then again, nothing may happen at all. We will see.

I'll post another advisory late tonight after the midnight update from the National Hurricane Center.

23 August 1998

We are planning a monstrous system upgrade in the next week. Every machine we use for all services will be upgraded to faster machines. We are also installing an NT network so that we can open up Front Page, ASP, and Cold Fusion services. All work will be done between midnight and 6 am starting on Tuesday night and continuing through at least Sunday. If we play our cards right, we will have not more than a two to five minute outage on any given service as we physically swap machines and drives. We'll keep you posted as we go along.

1 August 1998

At about 6:30 am, we took down each of the servers to install a new piece of software that accelerates web requests. We have been testing it for about three weeks now and it seems to be a piece of killer software. Its name is Nitro by Clearway Technologies for those who want to know.

2 July 1998 through 6 July 1998

Starting on the evening of Thursday, 2 July and continuing through the early morning of Monday, 6 July, we are planning a whole series of hardware and software upgrades and reconfigurations -- a completion of what we started last week. Throughout this time, the entire system or parts thereof may be bouncing up and down for up to one-half hour at a time, though we don't anticipate any single service to actually go down for more than a couple of minutes at a time. Some of this work will have to be performed outside our maintenance window, but we will keep it to a minimum.

When we are done, you perhaps will see no outward effect on any service, but man...it's gonna make our job a whole lot easier. We will also be loading the completed MGI database component to the server and will be putting that software through live testing over the next two weeks or so (as well as writing the tutorials for it.) When that is done, the entire MGI database-shopping basket complex will be completed. It's been a long time coming, but it will be worth the wait; you're not gonna believe what that thing can do. hehehe

23 June 1998

Today is not a good day. Come back tomorrow.

Actually, we are making some major changes in our LAN configuration, adding several machines, and working with BellSouth to do some routing changes. The whole process is estimated to take about 14 hours. Rather than starting during the night and interrupting the peak time tomorrow, we have waited until peak is over today and will be working throughout the night until sun-up. As a result, some services will be bouncing up and down this afternoon and evening as we physically move stuff. It also means that if anyone even thinks about calling me between 6 am and noon tomorrow, I will hunt them down and hurt them. Email will not be affected, nor will any listserv or DNS.

12 June 1998

One of our users tried to violate PagePlop anti-spam policy this afternoon. Not only did he try to use our email servers to spam, but he screwed it up. Rather than using his program to sort out and send thousands of emails, the list of email addresses got added as an attachment resulting in his trying to send out thousands of emails with a 5.5 meg attachment. Needless to say, the email servers were not very happy.

After about two hours, we finally got the whole thing sorted out and the servers calmed down. During that time, you may have experienced some difficulty in downloading email, especially if the email you were trying to download was of any significant size which required a connection to remain open for more than a couple of minutes. No client email was lost as a result of the spam attempt.

The offending user has been removed from our system permanently. His web site is down. His email is gone. His DNS entry is history. And we are in the process of orphaning his domain name so it no longer even points to PagePlop.

5 June 1998

Sometimes you just have a bad day. We lost a drive today -- big time. Since it was the first one we had lost, we had the opportunity to put into effect our recovery plans which up till now had been theoretical. Well, sometimes the best laid plans...you get the rest.

What should have taken about five minutes actually took 27. We kinda forgot about the time it takes to remount the drive and its files. Anyway, we have learned our lesson and will be looking into serious hardware RAID solutions which, by the way, run about $8,000 for the base models so if anyone wants to send us money at random for the "Help Buy PagePlop A Hardware RAID System" fund, we would greatly appreciate it.

29 May 1998

This is just too good...

The secure server crashed this morning. No big deal, though; you just reboot it and the process is so fast that you don't even get a timeout of incoming requests. Well, we ran into a little problem.

I hurt my ankle so Val had to go into the server room to reboot the machine. But Val's got a bit of a problem -- she has no concept of geography. Not that it takes a world explorer to find the server room, but all our machines are named for countries and the numerous hard drives are cities within those countries. Unfortunately for Val, only the city icons show up on the monitors and not the country names. And unfortunately for one of the mail servers and the listserv server, Val has no concept of what city is in which country.

A lot of machines got rebooted this morning, I tell you... :)

Labeling will begin shortly.

8 May 1998

We had a six minute outage of one of the web servers shortly after noon. Here's how it went...

Steve: Hey Mikey, the server crashed.

Mikey: Yeah, I know.

Steve: Hey Mikey, I rebooted the server, but it didn't come back up properly.

Mikey: Yeah, I know.

Steve: Hey Mikey. How about checking out the server?

Mikey: OK

Steve: See anything?

Mikey: Yeah.

Steve: What?

Mikey: I forgot to remove the old log files and the drive filled up.

Steve: Oh. That was fairly ignorant, though I should have caught it.

Mikey: Nah. It was my fault. I'm an idiot.

Steve: No Mikey. You're not an idiot. I'm an idiot.

Mikey: No Steve. You're not an idiot. I'm an idiot.

Steve: No. I'm the idiot.

Mikey: I'm the idiot.

Steve: No, I'm...wait...let's blame it on Val.

Mikey: Great

Steve: Val's an idiot.

Mikey: Yes, Val's an idiot.

Steve: hehehe...Val is an idiot...Val is an idiot...

Mikey: Val is an...uh, Hi Val.

Val: What are you two discussing?

Steve and Mikey: Uh...nothing...

28 April 1998

One of the web servers hung this morning for about four minutes. There was no apparent cause and it has been stable for the rest of the day.

8 April 1998

Sometime this morning, a customer of one of our clients sent him an email. That email went to our client's POP mailbox and also to an autoresponder that our client has set up to send out an email when someone writes him. Well, that autoresponder email was sent out. Unfortunately, our client has the return address of the autoresponder pointing to the same POP account that feeds the autoresponder.

Now, under normal circumstances, that would have caused no problems at all, but in this instance, the customer of our client had his own autoresponder set to respond to any incoming emails. And his autoresponder return address was also the address of his autoresponder.

Some 4800 emails later, the email server came crashing to the ground, screaming in pain. We got it all sorted out and things got back to normal in about 15 minutes. No mail was lost, but it was somewhat afraid to enter the email server for a short time thereafter in fear that it would get yelled at.
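For anyone wondering how two autoresponders get to 4800 emails, here is a toy model of the ping-pong. It is a simulation only, not how our mail server works: the names are made up, and the "reply only once per sender" rule is just one common safeguard, shown here as an assumption about what would have stopped the loop.

```python
# Toy simulation of two autoresponders answering each other.
# ASSUMPTION: "A" and "B" and the once-per-sender rule are illustrative only.

def simulate(max_messages, reply_once_per_sender=False):
    """Count messages exchanged between autoresponders A and B.

    Stops at max_messages (standing in for the server falling over), or
    earlier if the once-per-sender rule suppresses a reply.
    """
    already_answered = {"A": set(), "B": set()}
    sent = 0
    sender, receiver = "B", "A"  # B's original email arrives at A
    while sent < max_messages:
        sent += 1  # a message from sender lands at receiver
        if reply_once_per_sender:
            if sender in already_answered[receiver]:
                break  # receiver has already answered this sender; stay quiet
            already_answered[receiver].add(sender)
        # receiver's autoresponder replies, so the roles swap
        sender, receiver = receiver, sender
    return sent

print(simulate(4800))                              # unguarded loop
print(simulate(4800, reply_once_per_sender=True))  # guarded loop
```

Without the guard, the exchange only ends when something gives out; with it, each side answers the other once and then goes quiet.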

5 April 1998

It doesn't get any cooler than that. Right after 1 AM, the power slammed off. And it wasn't one of those gradual things either. It was the type where you could actually hear the noise from the wires as the electricity stopped in its tracks. We were out for a bit over four minutes and every system kicked in properly. But you knew that. So why am I posting this? Because we finally tested out new toys in a real life situation.

We now have emergency lights. The whole neighborhood was dark and we were lit up like a Christmas tree on the White House lawn. The whole thing went off with almost no hitches...(remind me, though, that *every* room in the house needs them.)

31 March 1998

Hey, it happens. The primary server went south (actually got hung) this morning. We could have had it back up in about three minutes, but we kept it down for a total of 17 to do some diagnostics. Fortunately, we discovered the little conflict between two programs running on the computer and fixed it, so that extra 14 minutes was not in vain.

20 March 1998

That was interesting. Beginning at a bit before 6 PM eastern time, severe thunderstorms hit the Raleigh area, spawning large hail and tornadoes. We were relatively lucky. We had several tornadoes pass just to our east and to our west. The ones to the east missed us by about 1.5 miles and took out a church, a shopping center, a hospital parking deck, and a whole bunch of homes. The ones to our west were about 5 to 8 miles away and hit the airport and points north of there. Directly overhead we saw a number of cyclonic rotations in the clouds, but nothing ever dropped out of them over our location. We had severe weather lasting until about 9 PM or so.

As of about 9 PM, some 23,000 homes were without power. We had intermittent power outages throughout the evening, but nothing lasting more than a few seconds at a time. It is now pushing 11 PM and all has calmed down; in fact, the stars are out. I'm going to eat some crab legs now. Val is sleeping. Mikey's at a movie. The servers are throwing a party and the UPS units are already drunk after celebrating a hard night's work well done.

17 March 1998

It appeared to have worked...bwhahahahahahaha...

We finally got finished about 8 am this morning, after about 16 hours of work with no food in freezing rain and with vicious gargoyles nipping at our toes...OK, perhaps no gargoyles, but the fact remains that the server appears to be fully stable once again.

We will keep a close eye on things for the next several days and take steps as necessary.

13-16 March 1998

Not a very pretty three days. The primary web server has been bouncing up and down for the past 90 hours or so. On the morning of the 13th, and after over three months of testing, we upgraded to the latest software version of the web server and something is not going right. In all, the server has crashed dozens of times for a total of 135 minutes of outage (including one outage lasting 87 minutes which was primarily for diagnostic purposes.) When the server goes down, in most cases the recovery time is less than 2 minutes or so and in many cases so fast that the end user does not even time out on the connection. Indeed, not counting that 87 minute diagnostic session, our percentage uptime over the 70 hours of this situation has still exceeded 99.99 percent; even counting the diagnostics we have maintained 99.97 percent uptime during this issue.
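
Uptime figures like these come down to simple arithmetic, sketched here under the assumption that uptime is (window - downtime) / window. The function name and the two candidate windows are illustrative. The length of the measurement window is the assumption that matters most, since the same outage minutes look very different over 70 hours than over a rolling year.

```python
def uptime_percent(downtime_minutes, window_hours):
    """Percentage of the window during which the server was up."""
    window_minutes = window_hours * 60
    if window_minutes <= 0:
        raise ValueError("window must be positive")
    return 100.0 * (1.0 - downtime_minutes / window_minutes)


total_outage = 135                     # minutes, from the report above
diagnostic = 87                        # minutes of the long diagnostic session
unplanned = total_outage - diagnostic  # 48 minutes

# Compare a window covering only the incident with a year-long window.
for label, hours in (("70-hour window", 70), ("365-day window", 365 * 24)):
    print(label,
          round(uptime_percent(unplanned, hours), 2),
          round(uptime_percent(total_outage, hours), 2))
```

Under these assumptions, the year-long window gives 99.99 and 99.97 percent, while the 70-hour window gives roughly 98.86 and 96.79 percent for the same minutes.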

But not wanting to put a full, rosy face on the picture, I still hate that it is happening.

We are continuing to work on the situation in order to discover the exact cause of the problem and will keep you posted. In the meantime, just bear with us. If we cannot get a satisfactory resolution to this situation by early Monday morning, we are going to revert to the old configuration. If that should happen, we will be able to do a hot swap that will not affect proper serving of your pages.

3 March 1998

GridNet lost a "hissy" card this morning in one of its circuits around 10:45 am est. A new card was installed and connectivity was restored around 11:49 am est.

2 March 1998

Well folks, we've reached the point where monthly statistics have gotten completely out of hand. We've got so many clients who want statistics and the logs themselves have gotten so big that it is taking us about 35 hours to run all the reports using a fully dedicated machine for the task. So for the time being, your stats will be delivered on the 2nd instead of the 1st of the month until we figure out a better way of doing this. Hopefully, that will be done by the April statistics mailing on 1 (or maybe 2) May.

2 March 1998

The servers were down for a bit under three minutes this morning when Mikey, the Wonder-Geek, decided to pull an "I wonder what this does" bone-head test. Fortunately, the momentary damage was limited to our own network and did not spread to the entire civilized world.

Mikey is presently being flogged.

21 February 1998

BellSouth performed routine maintenance this morning that resulted in an outage of approximately 22 minutes. Surgery was successful.

20 February 1998

OK...what's a hissy card? At least that's what BellSouth called the thing. I have no idea and I get the sneaking suspicion that no one at BellSouth knows either; in fact, I think they made it up. Nonetheless, that is what is being blamed for a 25 minute telco outage this evening. Perhaps it's time for a hissy fit.

20 February 1998

Talk about stupid. Last night we upgraded some software on the mail server. Now that server has two programs that actually deliver mail. The first one - and the target of the upgrade - is the one that accepts and sends email. The other one is designed to sort that mail internally and make sure it gets to the right mailbox on our system. I forgot to restart the second one.

At about 8:30 am this morning, I realized what had happened and launched that puppy. Thousands upon thousands of emails that had been stacked up in the hold queue started getting delivered. About ten minutes later, the server heaved a sigh of relief and went to take a quick nap; delivering all that mail was exhausting work.

Fortunately, we have built-in, anti-stupid protection and the email system is set up so that no mail gets lost. It just takes a vacation for a bit.
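
A forgotten restart is the kind of slip a post-upgrade checklist catches. Here is a minimal sketch, assuming the list of running process names is gathered some other way (from the operating system's process list, say). The process names are made up for illustration and are not the real names of the two mail programs.

```python
# Hypothetical names for the two programs that together deliver mail.
REQUIRED = ("mail-transport", "mail-sorter")


def missing_services(running, required=REQUIRED):
    """Return the required services that are not in the running list."""
    running = set(running)
    return [name for name in required if name not in running]
```

Run after every upgrade, an empty result means both halves of mail delivery are back up; anything else names what still needs launching.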

16 February 1998

Wow...long night. But step one is done. We spent a good seven hours moving physical machines and upgrading software, moving services to different machines, and upgrading telco stuff. For those who noticed, we bounced up and down all night long as we made the various moves. We are now in a very good LAN position to expand as intended with MGI.

Tomorrow night will see a few more changes, the most significant of which, as far as clients are concerned, is that anyone who is using Secure Services will get a different FTP login host name from their existing account on the regular web servers. In the event you are affected and should not read your email but every month or so, the host name for Secure Services will be changing to ftp.secure.domain.(com, net, or org...whatever is applicable to your account.) The login ID and passcodes will remain the same for both the secure and unsecure side. And just in case I need to mention it, the host name for the regular web servers will still be ftp.domain.(again...whatever...)

We're going to sleep now...

10 February 1998

One of the web servers decided to GO-NAC (Glitch Out - No Apparent Cause) today, taking about 400 clients down for some five minutes. Nothing major, just irritating. But it does give me a moment to explain something about how situations like that are handled here.

There is a single computer that checks every machine we have on-site and all our off-site locations. It also checks every single router between us and the Internet. That check of the entire system is performed every 30 seconds. If something does not respond, no matter what device or service it may be, this really nasty klaxon siren goes off.

Now that siren is sitting on my desk, but you can probably hear it in Ghana when it sounds. And if for some reason we are out of earshot, it is also hooked up to a paging system that sounds alarms on our pagers. So we go running. (It's also worth noting that some of the critical machines are connected to autorebooters that will automatically re-start the machines when alarms start sounding.)
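
The check-everything-every-30-seconds routine described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual monitoring software: it probes with a plain TCP connection (routers would more likely be pinged), and the function names, the target tuples, and the alarm callbacks are all hypothetical.

```python
import socket
import time


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(targets, probe=tcp_probe):
    """Probe every (name, host, port) target; return the names that failed."""
    return [name for name, host, port in targets if not probe(host, port)]


def monitor(targets, alarms, interval=30.0, probe=tcp_probe,
            sleep=time.sleep, rounds=None):
    """Check all targets every `interval` seconds.

    Every callable in `alarms` (siren, pager, auto-rebooter) is invoked with
    the list of failed names. `rounds=None` means run until interrupted.
    """
    done = 0
    while rounds is None or done < rounds:
        failed = check_all(targets, probe)
        if failed:
            for alarm in alarms:
                alarm(failed)
        sleep(interval)
        done += 1
```

Passing the probe and the alarms in as plain functions keeps the loop itself trivial, and means a new alarm (or a new kind of check) is one more entry in a list.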

Once we reach the server, it takes about three minutes for it to come back up. But some people may notice that their FTP does not work for about another eight minutes. The reason for that is file sharing. The FTP server is not located on the same machines as the web servers. It takes about eight minutes to re-establish that connection after the server comes back on line.

7 February 1998

As if yesterday wasn't bad enough...

Some cretin cut a major fiber optic cable at MAE-WEST shortly after 9 PM eastern time this evening. That essentially crashed MAE-WEST. So much of that traffic started routing through MAE-EAST, which promptly started crashing routers on the east coast. Things are a mess at this point with no indication when the problems will be corrected. Generally, it takes about four hours to splice a fiber line when it is cut; then the whole system has to stabilize, which puts us at about 1 or 2 AM eastern before things start coming back to normal. Time to eat since there is literally nothing that anyone can do at this point.

6 February 1998

MAE-EAST is taking an enormous amount of traffic this afternoon which is affecting any internet request that needs to pass through that point. Packet losses are hitting 100 percent with respect to many of the other major national providers such as MCI and Sprint. Until things calm down later this evening, there will be difficulty in reaching sites and sending email. Unfortunately, there is absolutely nothing we can do. Our system is up and stable and BellSouth is showing no problems on their system right up to MAE-EAST. We just need to ride it out.

4 February 1998

El Nino Strikes again! A BellSouth telco facility in RTP was depressed from the torrent of rain that has plagued North Carolina for two days straight and went quietly into that good night from loss of power at approximately 2:30 pm est taking all telco service in the greater Raleigh area with it. Upon noticing the attempted power outage, BellSouth rescue teams went straight to work and resuscitated the facility at 3 pm est. PagePlop would like to thank BellSouth rescue for their quick response. We wish the facility well.

2 February 1998

Someone mailed something to someone that our mail server did not like. We assigned it a file name beginning with the letter "A". That particular email then went to the top of the process bin, but the process bin did not know what to do with the email. So it sat there.

Other emails came in and piled up behind the file beginning with the letter "A" and sat there themselves since the email server was still trying to figure out what to do with the first email in the alphabetical list. And sat there they did.

About 400 emails later, someone noticed that they were not getting mail. We investigated. We fixed. And the floodgates opened. The poor email server looked down the pipe and saw all those emails flying at it. It started to cry for its mommy, but it handled the task admirably. We gave it a glass of Ovaltine and it is now resting.
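
The jam described above is head-of-line blocking: one message the server cannot handle sorts first and holds up everything behind it. Here is a minimal sketch of one way around it, assuming the queue can be modeled as file names mapped to messages and that the delivery step raises an error on a message it cannot handle. The function and its arguments are hypothetical, not how the mail server here works.

```python
def drain_queue(queue, deliver):
    """Deliver messages in alphabetical order of file name.

    `queue` maps file name -> message; `deliver` raises on a message it
    cannot handle. Returns (delivered, quarantined) lists of file names.
    """
    delivered, quarantined = [], []
    for name in sorted(queue):
        try:
            deliver(queue[name])
        except Exception:
            # Set the bad message aside for a human to look at instead of
            # retrying it forever at the head of the line.
            quarantined.append(name)
        else:
            delivered.append(name)
    return delivered, quarantined
```

With this approach the file beginning with "A" ends up in the quarantine list and the mail behind it keeps flowing.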

1 February 1998

One of our back-up power supply units decided it was time to retire this morning. It packed up the wife and kids, loaded the van, and headed to the Bermuda Triangle where it is sharing a condo with some old electrical equipment that a family of shrimp salvaged from a downed airplane. Luckily, it put its old house up for sale first so we noticed the sign in the yard and were able to make an offer which was accepted. We bought the land, tore down the old UPS unit, and erected our own (we always keep an extra one spare just in case we get the golden opportunity to take advantage of such a good UPS real estate deal on such short notice.)

It took us about four minutes to build the new UPS, but obviously it will take far longer to replant the trees and shrubbery; we are going for a tiered effect with bordering landscape timbers. APC is assisting us as well. Their emergency contact got back to our page within five minutes and a new UPS unit will be here tomorrow to replace what used to be our back-up which is now our new home.

26 January 1998

The FTP server went south on us and caught me by surprise (I was helping put in a hot water heater sans pager and out of range of the alarm system.) It was down for about 20 minutes and took another 10 or so to stabilize after reboot. Sorry if it caused any problems. I will be tying the FTP machine into the autoreboot system within the week.

We are also getting reports of sites hosted on PagePlop being difficult to reach today. There are a lot of problems on the net today, none of which involve us or BellSouth. Mindspring, for instance, has been disconnected from the rest of the world for at least 12 hours. And some other large networks are showing between 10 and 25 percent packet loss. If you can't reach us, it is because your provider is having problems either with their own system or Internet provider. It's gonna be another long day.

23 January 1998

We had a short outage of the primary web server that lasted about eight minutes. A preference file became corrupted and needed to be rebuilt. The people responsible for maintaining that preference file have been flogged.

21 January 1998

BellSouth performed routine maintenance on their system between 2:03 AM and 2:39 AM EST. The result of that maintenance is faster and more reliable service. We are happy.

12 January 1998

Well, that was a tough one involving two simultaneous errors. The first obviously affected the web server. It went down at 8:30 AM EST. The problem was tentatively traced back to a corrupted .gif file that really hosed the server software. And if that wasn't bad enough, there is a bug in the software we are using to check the integrity of the system. That bug has been reported to the developer, and we are in the process of fixing the corruption problem on the server.

We are now back on-line as of 12:10 EST. The next question is: Why do things always seem to fail in pairs?
