1998 Reports
24 December 1998
It's update time...
We have now been without power for 23 hours, and Carolina Power and Light
does not anticipate bringing us back on line for at least two more days.
The ice is pretty, though, so there is an upside to all this. In the meantime,
the generator is working fine, and the only things not working are
the 220-volt appliances like the dryer and the hot water heater; and though Val
is getting somewhat rank, I still love her.
We had a bit of a problem just as the power died yesterday at 4:20 PM
eastern time -- it took them about one hour to deliver the generator instead
of 20 minutes as they were supposed to. Unfortunately, the outage coincided
not only with icy roads but also with rush hour. As a result, we lost power
for about 25 minutes after the UPS units drained. (Note to self -- get more
UPS units so I can hold for three hours instead of 90 minutes.)
Now the problem I have at this moment is how to count that downtime against
our server status numbers. We had scheduled a planned outage of 20 minutes
last night anyway and since we were able to take care of what we had planned
to do during the outage, do I count it as the planned outage or do I count
it against our percentage downtime figures? Given that I am dog tired from
working my butt off over the past 24 hours, I think I'll give myself the
benefit of the doubt and not count it.
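For the curious, here is a rough sketch of the two ways to book last night's 25 minutes. It assumes one common convention (planned maintenance is carved out of the measured window entirely, and only unplanned minutes count as downtime) and uses December 1998 as the reporting period; the function name `availability_pct` is made up for illustration and is not necessarily how our status numbers are computed.

```python
# Minutes in the reporting period: December 1998 has 31 days.
PERIOD_MINUTES = 31 * 24 * 60  # 44,640

def availability_pct(period_minutes, outages):
    """Percent availability for one period.

    outages is a list of (minutes, planned) pairs. Assumed convention:
    planned minutes are removed from the measured window, and only
    unplanned minutes count as downtime.
    """
    planned = sum(minutes for minutes, is_planned in outages if is_planned)
    unplanned = sum(minutes for minutes, is_planned in outages if not is_planned)
    window = period_minutes - planned
    return 100.0 * (window - unplanned) / window

# Count the 25 minutes against us...
strict = availability_pct(PERIOD_MINUTES, [(25, False)])
# ...or book them as the planned outage.
lenient = availability_pct(PERIOD_MINUTES, [(25, True)])
print(f"{strict:.3f}% vs {lenient:.3f}%")
```

This prints 99.944% versus 100.000%, so either way the month stays above 99.9%.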
Merry Christmas.
23 December 1998
We are taking down the entire system at midnight tonight in order to
upgrade the automated restart measures. In the event of a major power failure,
we will be better able to remain on-line for an indefinite period of time.
The outage will last approximately twenty minutes.
1 December 1998
9:30 AM
OK...let's see just how good everyone is. I'm currently looking out my
door at a fountain of water exploding from the street. A major water main
has broken. The backhoes are ready to go and the folks are marking out the
location of the buried telco lines. The digging will begin shortly. Let's
hope that the backhoe does not take a liking to the telco lines.
In the meantime, I have BellSouth on the way out to look at the situation.
They are prepared to string aerial lines through the trees if necessary
to maintain the circuits.
Keep your fingers crossed...
Film at 11.
1:10 PM
Fine...so it's not 11 yet, but I thought you'd like an update. The hole
is dug in the street and a big hole it is. A very cool hole at that. They
are about two feet from the telco cables and the guy running the backhoe
is really good. The pipe is mended; now they have to fix the sewer line
that was cracked when the underlying ground got wet and it sagged. Surprisingly,
it didn't even smell, though you know what they say...I don't think I'll
even go there... More to come...
3:15 PM
Well, what was presumed to be a busted sewer pipe turned out to be a
major sewer line with a nick in it. When they got it uncovered, the pipe
broke. Normally, that would not have been a significant deal, but that was
no ordinary sewer pipe. Apparently, it was the main sewer feed for an entire
section of Raleigh and we are near the bottom of the gravity well that keeps
the juice moving downhill. It was kinda like the Fourth of July, except
wetter and it didn't smell as good.
Anyway, the hole is getting very big and they will be bringing in the
portable lights with the night crew shortly. This is going to be a long
evening of fun and merriment. More later...
11:00 PM
It's 11 o'clock and there is no film. But there is still a lot of sewage
in the neighborhood. We've tried to get them to move out...no wait...that's
another story entirely...
The pipes are fixed and the hole is being filled back in. Of course,
it will be sometime in March before the city actually gets around to re-paving
the road, so we will have to dodge orange and white barrels until then.
Small price to pay, though. They managed to get through an entire day with
a backhoe mere inches from the T1 lines without cutting any of them. I am
happy. On the other hand, everyone in the neighborhood is probably going
to come down with cholera, dysentery, and hepatitis shortly, and dogs will
be rooting around for months to come.
I also learned a lot about how municipal sewer and water systems work
so today wasn't a total loss.
20 November 1998
I hate it when that happens. The graphic counter on one of the web servers
crashed and took down the server with it. Fortunately, the auto-rebooting
system kicked in just fine and the total outage lasted a bit under three
minutes. I really need to fix that thing even though it happens only about
every five or six months.
8 November 1998
There are major problems on the internet backbone today. A huge peering
point router in Santa Clara, California went south about 12:30 PM EST this
afternoon and the problems have cascaded throughout the entire system. What
has resulted is that just about every network in the world, though it can move
data internally without problems, is unable to talk to the others. In other
words, our system is fine and your system is fine, but my system is not
recognizing the existence of your system; kinda like when we track mud into
the house and our wives give us the silent treatment for our stupidity even
while we are cleaning up the problems. Unfortunately, I cannot just buy
the internet some flowers and take it out to dinner to make good.
24 September 1998
Hadn't written anything in a while since I haven't needed to, so I thought
I'd just mention a few things. We now offer NT/Front Page/ASP service. See
our account pricing page for more details, but don't dig too far into the
web site since we are in the process of completely updating it. We should
be done in about a week (though we have been saying that for a month now.)
We are going to add a new switch this weekend which will result in about
a 60-second network outage, unless Chad drops his end of the Ethernet cable,
in which case we could be out for weeks. He promises to be careful and to pick up
any packets that happen to fall on the floor.
The LAN has now been completely upgraded with all new and faster machines.
There is nothing left to do so I am going to sleep for a while. Do not call
me before noon tomorrow.
7 September 1998
We have put our new back-up policy and plans into effect over this past
weekend. We have been evaluating back-up strategies for
over a year now and have ruled out the two most popular solutions,
tape drives and RAID, as rather problematic.
Though tape is efficient and low maintenance, it is very difficult and
time consuming to restore a failed hard drive from a tape system. But even
beyond that aspect, tape manages to not only back up the good files, but
also the corrupted ones. Tape is also magnetic and capable of losing data
from any number of problems. The software that runs tape drives is also
not very good, regardless of what the manufacturers of that software claim
in their advertisements. Remember, we process over three million electronic
transactions per day across nineteen different computers; that's a bit different
than what an individual or small business would experience in normal use.
And the more robust tape systems run in excess of $100,000.
We have also experimented with hard drive RAID systems and they are fraught
with their own problems. For instance, the two drives in a mirrored RAID system are
mirrors of each other. That means that if the primary drive fails, the secondary
drive contains a mirror image of what was on the primary and is able to
take over immediately. But RAIDs suffer from the same problems as tape drives;
if the file is corrupted on the primary, it is also corrupted on the RAID.
The only thing RAID really gives you is an immediate hot swap of drives.
Presumably, though, if the primary is failing, then it will be taking significant
errors in the data written to it and pass those errors over to the secondary.
And every reasonably priced software RAID solution we have tested has a
huge problem with stability. Given that the RAID software in our tests failed about
every 10 days, and that the average hard drive will last for at least three
years before failure, it is clear that RAID is not exactly the best solution.
Full hardware RAID solutions that would serve our needs start at $60,000.
Better than tape, but still no banana.
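To make the corruption argument concrete, here is a toy model in Python where each drive is just a dictionary of file names to contents. It is a simplification made up for illustration (real RAID works at the block level, and the function names are hypothetical), but it shows why a mirror faithfully copies a bad write while a copy taken on a schedule still holds the last good version.

```python
def mirrored_write(primary, mirror, path, data):
    """RAID 1-style mirroring: every write lands on both drives at once."""
    primary[path] = data
    mirror[path] = data

def scheduled_copy(primary):
    """Point-in-time copy of the whole drive, like a scheduled primary-to-back-up run."""
    return dict(primary)

primary, mirror = {}, {}
mirrored_write(primary, mirror, "index.html", b"<html>good</html>")
backup = scheduled_copy(primary)  # taken while the file was still good

# Later, a failing drive (or a buggy program) writes garbage.
mirrored_write(primary, mirror, "index.html", b"\x00\xff garbage")

print(mirror["index.html"])  # the mirror has the garbage too
print(backup["index.html"])  # the scheduled copy still has the good file
```

The price is freshness: a scheduled copy is only as current as the last run.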
So in order to provide a back-up solution that will keep our hosting
prices under two billion dollars per month per account, we have decided
to go with a combination of multiple hard drives and an automated back-up system as a solution
that eliminates many of the problems. This is how it works:
Each machine, whether it be a web server, email server, listserv server,
DNS, or whatever, has at least two external hard drives attached to it.
One of the externals is the primary drive; that is where the system software,
the applications, and the client files reside. It is the live drive (internal
drives do nothing but act as storage for "stuff"). The other drive
in the SCSI chain is the back-up drive. It is at least twice as large as
the primary drive. On early Sunday morning, early Wednesday morning, and
early Friday morning, we take the entire contents of the primary drives
and copy them unaltered to the respective secondary drives. Then every Sunday,
we take the latest contents of the secondary drives and compress the contents.
Those files are then moved to a separate file server. So this is what happens:
Sunday: Primary to back-up, with the latest day's contents of
secondary stuffed and placed in the archive.
Wednesday: Primary to back-up, placing a full and updated copy on
the secondary drive while stuffing the Wednesday copy and placing it on
the archive. Sunday's live copy gets erased.
Friday: Primary to back-up, erasing the unstuffed copy made to the
secondary on Wednesday, and stuffing the Friday copy and placing it on the
archive.
Sunday: Primary to back-up, erasing the full copy made on Friday.
The new contents are then stuffed and placed in the archive.
We will keep four weeks' worth of stuffed files in the archive on-site for each drive
backed-up. On the fifth week, we erase the oldest weekly archived files.
On the last Monday of each month, the stuffed contents of the archive
are placed on a CD-ROM that is taken to the safe deposit box at our bank.
As time goes by, we will have complete back-ups of all files going back
to today.
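Here is a rough Python sketch of the calendar side of that schedule, following the day-by-day list: a primary-to-back-up copy (and a stuffed archive) on Sunday, Wednesday, and Friday, four weeks of archives kept on-site, and a CD-ROM on the last Monday of the month. It only tracks dates, not files, it assumes "four weeks" means 28 days counted back from the current day, and the function names are made up for the illustration.

```python
import calendar
from datetime import date, timedelta

# Days of the primary-to-back-up copy (each run's copy is also stuffed into the archive).
BACKUP_DAYS = {calendar.SUNDAY, calendar.WEDNESDAY, calendar.FRIDAY}
RETENTION = timedelta(weeks=4)  # stuffed archives kept on-site

def is_backup_day(day):
    return day.weekday() in BACKUP_DAYS

def is_last_monday(day):
    # A Monday is the last one in its month if the next Monday falls in a new month.
    return day.weekday() == calendar.MONDAY and (day + timedelta(days=7)).month != day.month

def simulate(start, days):
    """Walk the calendar and return (on-site archive dates, CD-ROM burns)."""
    archive = []
    cd_burns = []
    for offset in range(days):
        today = start + timedelta(days=offset)
        if is_backup_day(today):
            archive.append(today)
        # Anything older than four weeks is erased from the on-site archive.
        archive = [d for d in archive if d > today - RETENTION]
        if is_last_monday(today):
            cd_burns.append((today, list(archive)))
    return archive, cd_burns

archive, cd_burns = simulate(date(1998, 9, 6), 35)  # 6 Sep 1998 was a Sunday
for burned_on, contents in cd_burns:
    print(burned_on, "CD-ROM gets", len(contents), "archives")
```

With three runs a week, the four-week window in this sketch holds twelve stuffed copies per drive at any given time.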
Now, here's what this gains us. For starters, we are using only simple
programs that are tried and fully tested for stability and reliability (such
as StuffIt.) Second, we always have a full back-up on an independent, in-house
hard drive which includes a fully working system folder. All we have to
do is open a control panel, change the start-up drive, and restart the computer.
Third, in the event of a catastrophe, we have at least four full weeks of
files located off-site. Fourth, in the event that we lose a drive, we can
restore files quickly from uncompressed back-ups rather than tape. Finally,
the whole system is in place when we finish the new FTP server software
so that we can fully RAIC the machines (but that is another story for another
time.)
The downside is that in the event of a drive failure, it will take about
three minutes to restore the specific service and those files can be up
to 48 hours old. Both those problems disappear when we go to full RAIC (hopefully
in January, but don't hold me to that.)
The one thing that should be apparent from the above discussion is that
you, the client, are ultimately responsible for maintaining complete back-ups
of all files -- including email. With 4 GB external drives currently costing
less than $300 and high-capacity floppy media like Zip drives around $100,
there is no reason to ever lose files again.
But since we have already bought a Travan tape system, we are going
to put that on the email system only so that every email sent to a POP account
gets archived as well; there will be no latency on that service as soon
as we hook it up (probably this coming weekend). For any recovery
or restoration that is not the result of a problem on our side (i.e.,
your hard drive exploded and you didn't keep your own back-ups), we will
charge $75 per hour, with a one-hour minimum, for that service.
And as an aside to all that, we are getting prices from contractors for the
installation of our own natural gas-fired generator. That way, in the event
of a major power outage lasting longer than about four hours, we do not
have to have one hauled in. We will be conducting a random drawing to determine
which one of our clients will be paying for it. :) :) :)
1 September 1998
What a week. We are almost done with the complete reconfiguration of
the entire LAN. We would have been done sometime last week had it not been
for hurricane emergency mode, but it is going well so far. All web servers
are now on G3 machines and they are screaming fast. Given the quickness
of the servers, the gross overcapacity of the internal LAN and the telco,
and the absolute stability of BellSouth, any slowness you experience can
now be squarely blamed on either congestion on the Internet as a whole or your
dial-up connection flaking out.
hehehe
Anyway, we have also put email on an upgraded machine -- well, all services
have been upgraded except the secure server. That little puppy has proved
to be a bit problematic to move, so we are going to rest a few days (or
maybe a week or so) and try it again. During the whole process, no service
was down for more than about two minutes with the exception of FTP which
we had to shut down completely for about five hours to ensure a clean copy
of all software for backup purposes; fortunately, we do not have to do that
when we back up only your files.
I'm going to sleep now...
27 August 1998 -- 11:00 am EDT
The Sun just came out for a bit and it is clearing up. Bonnie has just
been downgraded to a tropical storm and continues to move away from us at
6 mph. It looks like it is over, at least as far as we are concerned. This
will be my last update on Hurricane Bonnie. Now, let me go tend to those
90 emails that have backed up on me over the past two or three days.
27 August 1998 -- 9:40 am EDT
The generators are still sitting in and on trucks at the State Fairgrounds
and I am still torqued about the whole thing. But at least we are out of
danger. Bonnie is still sitting on the coast and has not made any significant
movement in the past six hours. Rainfall totals a mere 60 miles from here
are already upwards of 12 to 15 inches with at least 8 inches more expected
today.
In Raleigh, the wind has died down to a steady 20 mph with gusts to 35.
The rain has tapered off considerably after the storm dumped a bit under
three inches on us. Conditions here are expected to improve throughout the
day as Bonnie makes a slow turn to the northeast and pulls farther away
from us. On the other hand, the dew point is expected to be in the mid-70s
today with a temperature of 95. Combine that with unstable air and we may
have severe thunderstorms with tornadoes later this afternoon. We see that
all the time, though, and it doesn't faze us at all, but if I have to watch
that Lowes commercial on the Weather Channel one more time, I think I'm
gonna kill someone.
Then there is the issue of Danielle, but that hurricane is still out
in the Atlantic and at least four days from landfall. I need to move to
Iowa...
27 August 1998 -- 3:00 am EDT
Sorry for no updates lately, but it has been rather hectic here. If you
have been following the progress of Hurricane Bonnie today, you would have
noted that there is no progress. The storm has stalled just north of Wilmington,
North Carolina and is sitting there spinning as a strong category two storm.
While it is doing so, the coast of North Carolina is beginning to wash into
the ocean. We have had a series of rain bands all day long with winds up
to about 45 mph. Later today the forecast is for continued winds and very
heavy rain (some areas of the coast and inland coastal counties have already
had 10 inches of rain with another 10 to 20 inches predicted before ending
Friday morning.)
So far, the power has held, which is very good since we do not have a
generator. And I am not amused. Let me explain. When there is a potential
need for a generator, we simply make arrangements to rent one for $375 per
week. That is a lot less expensive than paying $16,000 or so for one of
our own (which, by the way, we would not have had to crank up a single time
since Hurricane Fran in September 1996.) Well, we again made arrangements
to have the generator delivered on Wednesday afternoon -- then the politicians
stepped in.
This being an election year in North Carolina, every two-bit piece of
political slime decided that they were going to suck as much pork for their
individual counties as possible. So they had the State literally commandeer
generators from all over North Carolina to use in the aftermath of Bonnie.
Those generators are presently sitting on trucks at the North Carolina State
Fairgrounds unable to be moved eastward until at least Friday morning when
the storm abates. Of course, that means that there are no generators in
the entire state to be had because the State, in active cooperation with
the morons from FEMA, has them all in its possession.
The situation reminds me somewhat of when Keith Richards of the Rolling
Stones comes to town and all the drugs within 300 miles of the concert site
disappear.
The upside is that Carolina Power and Light does not anticipate the situation
to get bad enough in our area that power outages would result. The downside
is that if the power does go out, we have enough back-up reserve to last
about three hours. On the other hand, as soon as things open in the morning,
I am going to start scrounging again for a generator. By the way, the generator
we do have (5000 watts) has a burned stator that cannot be fixed for about
two weeks. No matter, though, since that thing is not designed to last us
more than about 10 hours anyway; it is designed to carry us through till
the generator arrives -- do I have to go into that situation again?
26 August 1998 - 1:50 pm EDT
Everything is fully backed up and we're waiting on the delivery of the
generator. The hard drives are at the bank, tucked away safe and sound,
and the sump pump is checked. Actually, the sump pump has nothing to do
with our ability to serve files, but if it goes, Val goes ballistic and
starts throwing corn puffs at me. It is not a very pretty sight.
Hurricane Bonnie is now located just off the North Carolina coastline
with winds still howling at 115 plus mph. The storm is now predicted to
turn slightly to the NNE which will move it away from Raleigh, though we
are still expected to get winds in excess of 50 mph and very heavy rain.
There is another official National Weather Service report due out at 4 pm.
When it arrives, I will post it.
25 August 1998 -- 11:45 pm EDT
Well, this is the latest. By the way, I have decided to start back-ups
now rather than in the morning since Hurricane Bonnie may pick up forward
speed tonight, leaving me without time to back-up and get the drives to
the safe deposit box. Until the full back-up is done, which should take
about four hours, the server will be a tad slow to respond. You may get
latency times of three to five seconds for a download to begin. I have also
turned off FTP services for the duration of the archiving so that there
is no possibility of file corruption.
WTNT32 KNHC 260246 TCPAT2 ADVISORY
HURRICANE BONNIE ADVISORY NUMBER 28
NATIONAL WEATHER SERVICE MIAMI FL 11 PM EDT TUE AUG 25 1998
...RAINBANDS OF HURRICANE BONNIE BEGINNING TO SPREAD ACROSS PORTIONS OF
THE COAST AND OUTER BANKS...
A HURRICANE WARNING IS IN EFFECT FROM CAPE ROMAIN SOUTH CAROLINA TO CHINCOTEAGUE
VIRGINIA...INCLUDING PAMLICO AND ALBEMARLE SOUNDS...AND CHESAPEAKE BAY FROM
SMITH POINT SOUTHWARD.
A HURRICANE WATCH IS IN EFFECT FROM SAVANNAH GEORGIA TO CAPE ROMAIN SOUTH
CAROLINA AND FROM CHINCOTEAGUE VIRGINIA TO CAPE HENLOPEN DELAWARE.
PREPARATIONS TO PROTECT LIFE AND PROPERTY SHOULD BE RUSHED TO COMPLETION
IN THE HURRICANE WARNING AREA. INTERESTS IN THE WATCH AND WARNING AREAS
SHOULD FOLLOW RECOMMENDATIONS FROM THEIR LOCAL EMERGENCY MANAGEMENT OFFICIALS.
AT 11 PM EDT...0300Z...THE CENTER OF HURRICANE BONNIE WAS LOCATED NEAR
LATITUDE 31.6 NORTH...LONGITUDE 76.8 WEST OR ABOUT 215 MILES SOUTH OF CAPE
LOOKOUT NORTH CAROLINA.
BONNIE IS MOVING TOWARD THE NORTH-NORTHWEST NEAR 14 MPH...AND A GRADUAL
TURN TOWARD THE NORTH IS EXPECTED BY MORNING. THE FORECAST TRACK BRINGS
THE CENTER TO NEAR THE OUTER BANKS OF NORTH CAROLINA AROUND MIDDAY WEDNESDAY.
TROPICAL STORM FORCE WINDS ARE LIKELY TO ARRIVE AT THE COAST OF SOUTH CAROLINA
AND NORTH CAROLINA WITHIN THE NEXT FEW HOURS AND HURRICANE FORCE WINDS ARE
LIKELY AROUND DAYBREAK ON WEDNESDAY.
MAXIMUM SUSTAINED WINDS ARE NEAR 115 MPH...WITH HIGHER GUSTS. SOME FLUCTUATIONS
COULD OCCUR BUT BONNIE IS EXPECTED TO REMAIN A POWERFUL HURRICANE FOR THE
NEXT 24 HOURS. BONNIE IS A LARGE HURRICANE. HURRICANE FORCE WINDS EXTEND
OUTWARD UP TO 145 MILES FROM THE CENTER...AND TROPICAL STORM FORCE WINDS
EXTEND OUTWARD UP TO 230 MILES.
AN AIR FORCE HURRICANE HUNTER AIRCRAFT RECENTLY ESTIMATED A MINIMUM CENTRAL
PRESSURE OF 965 MB...28.50 INCHES.
STORM SURGE FLOODING IS EXPECTED NEAR AND TO THE NORTH OF WHERE THE HURRICANE
REACHES THE COAST...INCLUDING IN PAMLICO SOUND AND ALBEMARLE SOUND...WITH
WATER LEVELS INCREASING UP TO 9 TO 11 FEET ABOVE NORMAL ASTRONOMICAL TIDAL
LEVELS.
LARGE SWELLS ARE PROPAGATING WELL AHEAD OF THE HURRICANE AND ARE IMPACTING
PORTIONS OF THE U.S. EAST COAST. SEE STATEMENTS FROM LOCAL NATIONAL WEATHER
SERVICE OFFICES.
RAINFALL TOTALS OF 5 TO 10 INCHES ARE POSSIBLE IN ASSOCIATION WITH BONNIE.
ISOLATED TORNADOES ARE POSSIBLE IN THE VICINITY OF THE NORTH CAROLINA
OUTER BANKS WITHIN THE NEXT 12 TO 24 HOURS.
REPEATING THE 11 PM EDT POSITION...31.6 N... 76.8 W. MOVEMENT TOWARD...NORTH
NORTHWEST NEAR 14 MPH. MAXIMUM SUSTAINED WINDS...115 MPH.
MINIMUM CENTRAL PRESSURE... 965 MB.
AN INTERMEDIATE ADVISORY WILL BE ISSUED BY THE NATIONAL HURRICANE CENTER
AT 2 AM EDT FOLLOWED BY THE NEXT COMPLETE ADVISORY AT 5 AM EDT...WEDNESDAY.
(from National Weather Service at 23:06 ET)
25 August 1998
OK folks...this is the deal...
We are under a hurricane warning for Bonnie as of noon today. Hurricane
Bonnie is a strong category 3 storm located (at this moment - 8 PM EDT)
about 500 miles to our SSW (250 miles south of Cape Lookout, North Carolina)
and moving to the NNW at 16 mph. We are 130 miles inland and they have the
eye passing us to our east by about 50 miles. Of course, that's what they
said about Hurricane Fran in 1996 when Raleigh took a direct hit from a
category two storm. What a mess...anyway...
We are as ready as we can be. The big generator is on order for delivery
tomorrow. The sump pumps are checked and working. The portable generators
are tested and purring like kittens. I will be boarding up the east-facing
windows tomorrow afternoon. There is propane in the gas grill and the freezer
is full. So now we wait.
The current forecast for Raleigh calls for high winds in excess of 50
mph and heavy rain. And if that's all there is, we are just fine. But if
that eye gets closer to us, we could see winds in excess of 100 to 120 mph
and up to 20 inches of rain. That would be bad.
I am going to do a complete backup of all files around noon tomorrow
on an external hard drive. That drive will be placed in our safe deposit
box at our bank in the event of a complete catastrophe. That does not mean
that you can sit back and relax; as always, we encourage you to keep complete
back-ups of all your files at all times. If push comes to shove, we are
prepared to move the entire operation another 200 miles inland. And if Armageddon
strikes, we can move to Florida or Arizona (which are contingency plans
if the building is literally destroyed.)
Then again, nothing may happen at all. We will see.
I'll post another advisory late tonight after the midnight update from
the National Hurricane Center.
23 August 1998
We are planning a monstrous system upgrade in the next week. Every machine
we use for all services will be upgraded to faster machines. We are also
installing an NT network so that we can open up Front Page, ASP, and Cold
Fusion services. All work will be done between midnight and 6 am starting
on Tuesday night and continuing through at least Sunday. If we play our
cards right, we will have no more than a two- to five-minute outage on any
given service as we physically swap machines and drives. We'll keep you
posted as we go along.
1 August 1998
At about 6:30 am, we took down each of the servers to install a new piece
of software that accelerates web requests. We have been testing it for about
three weeks now and it seems to be a piece of killer software. Its name
is Nitro by Clearway Technologies, for those who want to know.
2 July 1998 through 6 July 1998
Starting on the evening of Thursday, 2 July and continuing through the
early morning of Monday, 6 July, we are planning a whole series of hardware
and software upgrades and reconfigurations -- a completion of what we started
last week. Throughout this time, the entire system or parts thereof may
be bouncing up and down for up to one-half hour at a time, though we don't
anticipate any single service to actually go down for more than a couple
of minutes at a time. Some of this work will have to be performed outside
our maintenance window, but we will keep it to a minimum.
When we are done, you perhaps will see no outward effect on any service,
but man...it's gonna make our job a whole lot easier. We will also be loading
the completed MGI database component to the server and will be putting that
software through live testing over the next two weeks or so (as well as
writing the tutorials for it.) When that is done, the entire MGI database-shopping
basket complex will be completed. It's been a long time coming, but it will
be worth the wait; you're not gonna believe what that thing can do. hehehe
23 June 1998
Today is not a good day. Come back tomorrow.
Actually, we are making some major changes in our LAN configuration,
adding several machines, and working with BellSouth to do some routing changes.
The whole process is estimated to take about 14 hours. Rather than starting
during the night and interrupting the peak time tomorrow, we have waited
until peak is over today and will be working throughout the night until
sun-up. As a result, some services will be bouncing up and down this afternoon
and evening as we physically move stuff. It also means that if anyone even
thinks about calling me between 6 am and noon tomorrow, I will hunt them
down and hurt them. Email will not be affected, nor will any listserv or
DNS.
12 June 1998
One of our users tried to violate PagePlop's anti-spam policy this afternoon.
Not only did he try to use our email servers to spam, but he screwed it
up. Rather than using his program to sort out and send thousands of emails,
the list of email addresses got added as an attachment resulting in his
trying to send out thousands of emails with a 5.5 meg attachment. Needless
to say, the email servers were not very happy.
After about two hours, we finally got the whole thing sorted out and
the servers calmed down. During that time, you may have experienced some
difficulty in downloading email especially if the email you were trying
to download was of any significant size which required a connection to remain
open for more than a couple of minutes. No client email was lost as a result
of the spam attempt.
The offending user has been removed from our system permanently. His
web site is down. His email is gone. His DNS entry is history. And we are
in the process of orphaning his domain name so it no longer even points
to PagePlop.
5 June 1998
Sometimes you just have a bad day. We lost a drive today -- big time.
Since it was the first one we had lost, we had the opportunity to put
into effect our recovery plans, which up till now had been theoretical. Well,
sometimes the best laid plans...you get the rest.
What should have taken about five minutes actually took 27. We kinda
forgot about the time it takes to remount the drive and its files. Anyway,
we have learned our lesson and will be looking into serious hardware RAID
solutions which, by the way, run about $8,000 for the base models so if
anyone wants to send us money at random for the "Help Buy PagePlop
A Hardware RAID System" fund, we would greatly appreciate it.
29 May 1998
This is just too good...
The secure server crashed this morning. No big deal, though; you just
reboot it and the process is so fast that you don't even get a timeout of
incoming requests. Well, we ran into a little problem.
I hurt my ankle, so Val had to go into the server room to reboot the machine.
But Val's got a bit of a problem -- she has no concept of geography. Not
that it takes a world explorer to find the server room, but all our machines
are named for countries and the numerous hard drives are cities within those
countries. Unfortunately for Val, only the city icons show up on the monitors
and not the country names. And unfortunately for one of the mail servers
and the listserv server, Val has no concept of what city is in which country.
A lot of machines got rebooted this morning, I tell you... :)
Labeling will begin shortly.
8 May 1998
We had a six minute outage of one of the web servers shortly after noon.
Here's how it went...
Steve: Hey Mikey, the server crashed.
Mikey: Yeah, I know.
Steve: Hey Mikey, I rebooted the server, but it didn't come back
up properly.
Mikey: Yeah, I know.
Steve: Hey Mikey. How about checking out the server?
Mikey: OK
Steve: See anything?
Mikey: Yeah.
Steve: What?
Mikey: I forgot to remove the old log files and the drive filled
up.
Steve: Oh. That was fairly ignorant, though I should have caught
it.
Mikey: Nah. It was my fault. I'm an idiot.
Steve: No Mikey. You're not an idiot. I'm an idiot.
Mikey: No Steve. You're not an idiot. I'm an idiot.
Steve: No. I'm the idiot.
Mikey: I'm the idiot.
Steve: No, I'm...wait...let's blame it on Val.
Mikey: Great
Steve: Val's an idiot.
Mikey: Yes, Val's an idiot.
Steve: hehehe...Val is an idiot...Val is an idiot...
Mikey: Val is an...uh, Hi Val.
Val: What are you two discussing?
Steve and Mikey: Uh...nothing...
28 April 1998
One of the web servers hung this morning for about four minutes. There
was no apparent cause, and the server has been stable for the rest of the day.
8 April 1998
Sometime this morning, a customer of one of our clients sent him an
email. That email went to our client's POP mailbox and also to an autoresponder
that our client has set up to send out an email when someone writes him.
Well, that autoresponder email was sent out. Unfortunately, our client has
the return address of the autoresponder pointing to the same POP account
that feeds the autoresponder.
Now, under normal circumstances, that would have caused no problems at
all, but in this instance, the customer of our client had his own autoresponder
set to respond to any incoming emails. And his autoresponder return address
was also the address of his autoresponder.
Some 4800 emails later, the email server came crashing to the ground,
screaming in pain. We got it all sorted out and things got back to normal
in about 15 minutes. No mail was lost, but we were somewhat afraid to enter
the email server for a short time thereafter in fear that it would get yelled
at.
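For anyone wondering how two autoresponders can produce 4800 emails, here is a toy Python simulation of the ping-pong. The addresses are hypothetical, both are treated as autoresponder addresses (a simplification of the setup described above), and the 4800 cap simply stands in for the point where our server fell over. The `guarded` option shows one common way to break such a loop, which is our assumption and not something either autoresponder here was doing: stamp every automatic reply with an `Auto-Submitted: auto-replied` header and never auto-reply to a message that carries one (the convention later written up in RFC 3834).

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    recipient: str
    headers: dict = field(default_factory=dict)

def auto_reply(msg, guarded):
    """Reply from the recipient's autoresponder, or None if it declines."""
    if guarded and msg.headers.get("Auto-Submitted", "no") != "no":
        return None  # never answer another machine's automatic reply
    headers = {"Auto-Submitted": "auto-replied"} if guarded else {}
    # The reply goes back to the sender, from the autoresponder's own address,
    # which is itself an address that triggers an autoresponder.
    return Message(sender=msg.recipient, recipient=msg.sender, headers=headers)

def count_deliveries(guarded, limit=4800):
    autoresponders = {"orders@client.example", "info@customer.example"}  # hypothetical
    queue = [Message("info@customer.example", "orders@client.example")]
    delivered = 0
    while queue and delivered < limit:
        msg = queue.pop()
        delivered += 1
        if msg.recipient in autoresponders:
            reply = auto_reply(msg, guarded)
            if reply is not None:
                queue.append(reply)
    return delivered

print(count_deliveries(guarded=False))  # 4800: runs until it hits the cap
print(count_deliveries(guarded=True))   # 2: original message plus one auto-reply
```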
5 April 1998
It doesn't get any cooler than that. Right after 1 AM, the power slammed
off. And it wasn't one of those gradual things either. It was the type where
you could actually hear the noise from the wires as the electricity stopped
in its tracks. We were out for a bit over four minutes and every system
kicked in properly. But you knew that. So why am I posting this? Because
we finally tested out our new toys in a real-life situation.
We now have emergency lights. The whole neighborhood was dark and we
were lit up like a Christmas tree on the White House lawn. The whole thing
went off with almost no hitches...(remind me, though, that *every* room
in the house needs them.)
31 March 1998
Hey, it happens. The primary server went south (actually, it hung) this
morning. We could have had it back up in about three minutes, but we kept
it down for a total of 17 minutes to do some diagnostics. Fortunately, we discovered
the little conflict between two programs running on the computer and fixed
it, so that extra 14 minutes was not in vain.
20 March 1998
That was interesting. Beginning at a bit before 6 PM eastern time, severe
thunderstorms hit the Raleigh area, spawning large hail and tornadoes. We
were relatively lucky. We had several tornadoes pass just to our east and
to our west. The ones to the east missed us by about 1.5 miles and took
out a church, a shopping center, a hospital parking deck, and a whole bunch
of homes. The ones to our west were about 5 to 8 miles away and hit the
airport and points north of there. Directly overhead we saw a number of
cyclonic rotations in the clouds, but nothing ever dropped out of them over
our location. We had severe weather lasting until about 9 PM or so.
As of about 9 PM, some 23,000 homes were without power. We had intermittent
power outages throughout the evening, but nothing lasting more than a few
seconds at a time. It is now pushing 11 PM and all has calmed down; in fact,
the stars are out. I'm going to eat some crab legs now. Val is sleeping.
Mikey's at a movie. The servers are throwing a party and the UPS units are
already drunk after celebrating a hard night's work well done.
17 March 1998
It appears to have worked...bwhahahahahahaha...
We finally got finished about 8 am this morning, after about 16 hours
of work with no food in freezing rain and with vicious gargoyles nipping
at our toes...OK, perhaps no gargoyles, but the fact remains that the server
appears to be fully stable once again.
We will keep a close eye on things for the next several days and take
further steps if need be.
13-16 March 1998
Not a very pretty three days. The primary web server has been bouncing
up and down for the past 90 hours or so. On the morning of the 13th, and
after over three months of testing, we upgraded to the latest software version
of the web server and something is not going right. In all, the server has
crashed dozens of times for a total of 135 minutes of outage (including
one outage lasting 87 minutes which was primarily for diagnostic purposes.)
When the server goes down, in most cases the recovery time is less than
2 minutes or so and in many cases so fast that the end user does not even
time out on the connection. Indeed, not counting that 87 minute diagnostic
session, our percentage uptime over the 70 hours of this situation has still
exceeded 99.99 percent; even counting the diagnostics we have maintained
99.97 percent uptime during this issue.
But not wanting to put a full, rosy face on the picture, I still hate
that it is happening.
We are continuing to work on the situation in order to discover the exact
cause of the problem and will keep you posted. In the meantime, just bear
with us. If we cannot get a satisfactory resolution to this situation by
early Monday morning, we are going to revert to the old configuration.
If that should happen, we will be able to do a hot swap that will not affect
proper serving of your pages.
3 March 1998
GridNet lost a "hissy" card this morning in one of its circuits
around 10:45 am est. A new card was installed and connectivity was restored
around 11:49 am est.
2 March 1998
Well folks, we've reached the point where monthly statistics have gotten
completely out of hand. We've got so many clients who want statistics and
the logs themselves have gotten so big that it is taking us about 35 hours
to run all the reports using a fully dedicated machine for the task. So
for the time being, your stats will be delivered on the 2nd instead of the
1st of the month until we figure out a better way of doing this. Hopefully,
that will be done by the April statistics mailing on 1 (or maybe 2) May.
2 March 1998
The servers were down for a bit under three minutes this morning when
Mikey, the Wonder-Geek decided to pull an "I wonder what this does"
bone-head test. Fortunately, the momentary damage was limited to our own
network and did not spread to the entire civilized world.
Mikey is presently being flogged.
21 February 1998
BellSouth performed routine maintenance this morning that resulted in
an outage of approximately 22 minutes. Surgery was successful.
20 February 1998
OK...what's a hissy card? At least that's what BellSouth called the thing.
I have no idea and I get the sneaking suspicion that no one at BellSouth
knows either; in fact, I think they made it up. Nonetheless, that is what
is being blamed for a 25-minute telco outage this evening. Perhaps it's time
for a hissy fit.
20 February 1998
Talk about stupid. Last night we upgraded some software on the mail server.
Now that server has two programs that actually deliver mail. The first one,
and the target of the upgrade, is the one that accepts and sends email.
The other one is designed to sort that mail internally and make sure it
gets to the right mailbox on our system. I forgot to restart the second
one.
At about 8:30 am this morning, I realized what had happened and launched
that puppy. Thousands upon thousands of emails that had been stacked up in
the hold queue started getting delivered. About ten minutes later, the server
heaved a sigh of relief and went to take a quick nap; delivering all that
mail was exhausting work.
Fortunately, we have built-in anti-stupid protection, and the email system
is set up so that no mail gets lost. It just takes a vacation for a bit.
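The two-stage arrangement described in this entry (one program that accepts mail, a second that sorts it into mailboxes, with a hold queue in between) is what keeps mail from being lost when the second program isn't running. Here is a minimal Python sketch of that idea; the class and method names are hypothetical, and this is not the actual mail server software.

```python
from collections import deque

class MailServer:
    """Hypothetical two-stage server: an accepting agent feeds a hold queue,
    and a separate sorter moves queued messages into local mailboxes."""

    def __init__(self) -> None:
        self.hold_queue: deque[tuple[str, str]] = deque()  # (recipient, body)
        self.mailboxes: dict[str, list[str]] = {}
        self.sorter_running = False  # the program we forgot to restart

    def accept(self, recipient: str, body: str) -> None:
        # Stage 1: always accept and queue, whether or not the sorter is up.
        self.hold_queue.append((recipient, body))
        if self.sorter_running:
            self.drain()

    def start_sorter(self) -> int:
        # Stage 2 (re)started: deliver the whole backlog; return how many.
        self.sorter_running = True
        return self.drain()

    def drain(self) -> int:
        delivered = 0
        while self.hold_queue:
            recipient, body = self.hold_queue.popleft()
            self.mailboxes.setdefault(recipient, []).append(body)
            delivered += 1
        return delivered

server = MailServer()
for i in range(3):
    server.accept("val@example.com", f"message {i}")  # sorter down: mail piles up
print(len(server.hold_queue))   # 3 messages waiting, none lost
print(server.start_sorter())    # 3 delivered from the backlog
```

Nothing is dropped while the sorter is down; messages simply wait in `hold_queue` until `start_sorter()` drains them in arrival order.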
16 February 1998
Wow...long night. But step one is done. We spent a good seven hours moving
physical machines and upgrading software, moving services to different machines,
and upgrading telco stuff. For those who noticed, we bounced up and down
all night long as we made the various moves. We are now in a very good LAN
position to expand as intended with MGI.
Tomorrow night will see a few more changes, the most significant of which, as far
as clients are concerned, is that anyone who is using Secure Services will
get a different FTP login host name from their existing account on the regular
web servers. In the event you are affected and read your email only once
every month or so, the host name for Secure Services will be changing
to ftp.secure.domain.(com, net, or org...whatever is applicable to your
account.) The login ID and passcodes will remain the same for both the secure
and unsecure side. And just in case I need to mention it, the host name
for the regular web servers will still be ftp.domain.(again...whatever...)
We're going to sleep now...
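The naming rule amounts to inserting `secure.` after `ftp.` for the Secure Services side. A tiny hypothetical Python helper (the function name is ours, purely for illustration) that derives both host names from an account's domain:

```python
def ftp_hosts(domain: str) -> dict[str, str]:
    """Return the FTP login host names for an account's domain, e.g. "example.com".

    Regular web servers: ftp.<domain>
    Secure Services:     ftp.secure.<domain>
    The login ID and passcode are the same for both.
    """
    domain = domain.strip().strip(".").lower()  # tolerate stray dots/whitespace
    return {
        "regular": f"ftp.{domain}",
        "secure": f"ftp.secure.{domain}",
    }

print(ftp_hosts("example.org"))
```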
10 February 1998
One of the web servers decided to GO-NAC (Glitch Out - No Apparent Cause)
today, taking about 400 clients down for some five minutes. Nothing major,
just irritating. But it does give me a moment to explain something about
how situations like that are handled here.
There is a single computer that checks every machine we have on-site
and all our off-site locations. It also checks every single router between
us and the Internet. That check of the entire system is performed every
30 seconds. If something does not respond, no matter what device or service
it may be, this really nasty klaxon siren goes off.
Now that siren is sitting on my desk, but you can probably hear it in
Ghana when it sounds. And if for some reason we are out of earshot, it is
also hooked up to a paging system that sounds alarms on our pagers. So we
go running. (It's also worth noting that some of the critical machines
are connected to autorebooters that will automatically restart the machines
when alarms start sounding.)
Once we reach the server, it takes about three minutes for it to come
back up. But some people may notice that their FTP does not work for about
another eight minutes. The reason for that is file sharing. The FTP
server is not located on the same machines as the web servers. It takes
about eight minutes to re-establish that connection after the server comes
back on line.
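The checking scheme described in this entry (probe everything every 30 seconds, sound the alarm on any miss) can be sketched in a few lines of Python. This is a hypothetical illustration, not the software we run: it assumes each device or service can be checked with a plain TCP connection to a host and port, and `alarm` stands in for whatever drives the klaxon and pagers.

```python
import socket
import time
from typing import Callable, Iterable, Optional

Target = tuple[str, int]  # (host, port); a hypothetical inventory format

def tcp_probe(target: Target, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

def poll_once(targets: Iterable[Target],
              probe: Callable[[Target], bool] = tcp_probe) -> list[Target]:
    """Check every target once and return the ones that did not respond."""
    return [t for t in targets if not probe(t)]

def monitor(targets: list[Target],
            alarm: Callable[[list[Target]], None],
            interval: float = 30.0,
            probe: Callable[[Target], bool] = tcp_probe,
            max_rounds: Optional[int] = None) -> None:
    """Poll all targets every `interval` seconds; call `alarm` whenever any fail.

    `max_rounds=None` runs forever; a number is handy for testing.
    """
    next_run = time.monotonic()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        failed = poll_once(targets, probe)
        if failed:
            alarm(failed)  # e.g. sound the klaxon and page whoever is on call
        rounds += 1
        next_run += interval  # keep a fixed schedule even if a round is slow
        if max_rounds is None or rounds < max_rounds:
            time.sleep(max(0.0, next_run - time.monotonic()))
```

Calling `monitor(targets, alarm=print)` with no `max_rounds` runs until interrupted. The probes run one after another, so with many unreachable devices a single round can take longer than 30 seconds; a larger checker would probe concurrently.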
7 February 1998
As if yesterday wasn't bad enough...
Some cretin cut a major fiber optic cable at MAE-WEST shortly after 9
PM eastern time this evening. That essentially crashed MAE-WEST. So much
of that traffic started routing through MAE-EAST which promptly started
crashing routers on the east coast. Things are a mess at this point with
no indication when the problems will be corrected. Generally, it takes about
four hours to splice a fiber line when it is cut, then the whole system has
to come back to stabilization, which puts us at about 1 or 2 AM eastern before
things start coming back to normal. Time to eat since there is literally
nothing that anyone can do at this point.
6 February 1998
MAE-EAST is taking an enormous amount of traffic this afternoon which
is affecting any internet request that needs to pass through that point.
Packet losses are hitting 100 percent with respect to many of the other
major national providers such as MCI and Sprint. Until things calm down
later this evening, there will be difficulty in reaching sites and sending
email. Unfortunately, there is absolutely nothing we can do. Our system
is up and stable and BellSouth is showing no problems on their system right
up to MAE-EAST. We just need to ride it out.
4 February 1998
El Nino Strikes again! A BellSouth telco facility in RTP was depressed
from the torrent of rain that has plagued North Carolina for two days straight
and went quietly into that good night from loss of power at approximately
2:30 pm est taking all telco service in the greater Raleigh area with it.
Upon noticing the attempted power outage, BellSouth rescue teams went straight
to work and resuscitated the facility at 3 pm est. PagePlop would like to
thank BellSouth rescue for their quick response. We wish the facility well.
2 February 1998
Someone mailed something to someone that our mail server did not like.
We assigned it a file name beginning with the letter "A". That
particular email then went to the top of the process bin, but the process
bin did not know what to do with the email. So it sat there.
Other emails came in and piled up behind the file beginning with the
letter "A" and sat there themselves since the email server was
still trying to figure out what to do with the first email in the alphabetical
list. And sat there they did.
About 400 emails later, someone noticed that they were not getting mail.
We investigated. We fixed. And the floodgates opened. The poor email server
looked down the pipe and saw all those emails flying at it. It started to
cry for its mommy, but it handled the task admirably. We gave it a glass
of Ovaltine and it is now resting.
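What went wrong on 2 February is a classic head-of-line blocking problem: the queue is worked strictly in filename order, so one message the server can't handle stalls everything sorted after it. The hypothetical Python sketch below contrasts that behavior with a common alternative, setting the bad message aside and carrying on. The names are illustrative, this is not the mail server's actual code, and the quarantine variant is a general technique rather than a description of our fix.

```python
from typing import Callable

Handler = Callable[[str], bool]  # returns True if the message body was processed

def process_in_order(queue: dict[str, str], handle: Handler) -> list[str]:
    """Work the queue in filename order, stopping at the first message that fails.

    Returns the names of the files that were delivered."""
    delivered = []
    for name in sorted(queue):
        if not handle(queue[name]):
            break  # the stuck file blocks everything sorted after it
        delivered.append(name)
    return delivered

def process_with_quarantine(queue: dict[str, str], handle: Handler,
                            quarantine: list[str]) -> list[str]:
    """Same ordering, but a failing message is set aside so the rest still flow."""
    delivered = []
    for name in sorted(queue):
        if handle(queue[name]):
            delivered.append(name)
        else:
            quarantine.append(name)  # park it for a human to look at later
    return delivered

def handle(body: str) -> bool:
    # Hypothetical handler that chokes on one particular message.
    return body != "garbled"

# Hypothetical queue: the file beginning with "A" holds the problem message.
queue = {"A0001": "garbled", "B0002": "hello", "C0003": "hi there"}

print(process_in_order(queue, handle))               # []
held: list[str] = []
print(process_with_quarantine(queue, handle, held))  # ['B0002', 'C0003']
print(held)                                          # ['A0001']
```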
1 February 1998
One of our back-up power supply units decided it was time to retire this
morning. It packed up the wife and kids, loaded the van, and headed to the
Bermuda Triangle where it is sharing a condo with some old electrical equipment
that a family of shrimp salvaged from a downed airplane. Luckily, it put
its old house up for sale first so we noticed the sign in the yard and were
able to make an offer which was accepted. We bought the land, tore down
the old UPS unit, and erected our own (we always keep an extra one spare
just in case we get the golden opportunity to take advantage of such a good
UPS real estate deal on such short notice.)
It took us about four minutes to build the new UPS, but obviously it
will take far longer to replant the trees and shrubbery; we are going for
a tiered effect with bordering landscape timbers. APC is assisting us as
well. Their emergency contact got back to our page within five minutes and
a new UPS unit will be here tomorrow to replace what used to be our back-up,
which is now our new home.
26 January 1998
The FTP server went south on us and caught me by surprise (I was helping
put in a hot water heater sans pager and out of range of the alarm system.)
It was down for about 20 minutes and took another 10 or so to stabilize
after reboot. Sorry if it caused any problems. I will be tying the FTP machine
into the autoreboot system within the week.
We are also getting reports of sites hosted on PagePlop being difficult
to reach today. There are a lot of problems on the net today, none of which
involve us or BellSouth. Mindspring, for instance, has been disconnected
from the rest of the world for at least 12 hours. And some other large networks
are showing between 10 and 25 percent packet loss. If you can't reach us,
it is because your provider is having problems either with their own system
or with their Internet provider. It's gonna be another long day.
23 January 1998
We had a short outage of the primary web server that lasted about eight
minutes. A preference file became corrupted and needed to be rebuilt. The
people responsible for maintaining that preference file have been flogged.
21 January 1998
BellSouth performed routine maintenance on their system between 2:03
AM and 2:39 AM EST. The result of that maintenance is faster and more reliable
service. We are happy.
12 January 1998
Well, that was a tough one involving two simultaneous errors. The first
obviously affected the web server. It went down at 8:30 AM EST. The problem
was tentatively traced back to a corrupted .gif file that really hosed the
server software. And if that wasn't bad enough, there is a bug in the software
we are using to check the integrity of the system. That bug has been reported
to the developer, and we are in the process of fixing the corruption problem
on the server.
We are now back on-line as of 12:10 EST. The next question is: Why do
things always seem to fail in pairs?