Oh what a week I’ve had!
Yes, the webserver is back. Not only was this website offline but so were a number of other websites,
including FreshPorts and
FreshSource. I’m very glad to have it back because I wasn’t
looking forward to transferring everything and setting up a new box.
I thank those of you who contacted me regarding hosting alternatives. I appreciate that greatly.
I will be contacting you via email to express my gratitude.
It all started some time ago….
It’s been an interesting week, which started last Friday. It was then we (myself and two
guys I work[ed] with) decided to
move our Ottawa colocated box to another location. Yes, we’re staying with
iStop but we are moving to a cheaper location within their domain.
That box was receiving much less traffic now that I had moved FreeBSDDiary and FreshPorts to this new
webserver out in Vancouver. With all three of us out of work from layoffs, we’re
trying to keep costs down. Hence the move.
The move necessitated an IP renumber. That was OK. We have a backup mail server, out in Vancouver (this
box actually). iStop were even willing to transport the box for us at no extra charge. Great!
I modified the DNS settings in advance and had them ready to go. I even altered /etc/rc.conf on the box
so all they had to do was plug in and power on. What could go wrong?
Well….
I got a call from iStop saying they were ready to power down the box. That was my cue to do the shutdown.
Down went the box at about 13:00. In went the DNS changes. All that was left to do was sit around and wait for
the box to be powered back up, in the new location, with the new IP address.
Dum… de dum…. de dummdy dummdy de dum….
I got a call saying the box would not power back up. Damn. So I grabbed an extra
box from the basement, stopped by OEM Express for an extra power
supply unit, and headed out to the west end of the city.
When I arrived, I first attempted to use the backup box and just transfer over the disk and try that. No go.
Then I remembered this was a large disk in an old box and suspected BIOS problems. So I put the disk back
into the original box and tried the replacement PSU. Yep, that worked. So I put that in and m20 was up
and running. Great. That’s just what I needed.
Time spent: about 3 hours from leaving home to finishing the job. I was hot, tired, and needed
a beer. And pizza.
The pizza
I was just walking into Mr Mozzarella to pick up the pizza when I got a phone call from my
provider in Vancouver, BC Hosting. They were moving location,
renumbering IPs, and were wondering if they could do that tonight. Sure, I said. No worries.
So I picked up the pizza, headed home, and modified more DNS settings in about 7 different domains.
Shortly thereafter, the box was moved, and everything was back up and running. Sure it would
take a while for people to get back to the website, because of DNS propagation. But I provided
alternative access methods using different website prefixes. These worked, but not everyone
was able to get to them. I apologise for that.
On to New Zealand
Come Monday, everything seemed to be working. The DNS was propagating. Traffic was back to normal
levels. But graphs show
that resumption was a gradual process.
Satisfied, I decided it was time to modify some things with my New Zealand servers.
My provider, CityLink, provides me with bandwidth for the
NZ ADSL mailing list as well as the
NZ FreeBSD website mirror
and cvsup mirror. They were also renumbering. So I added the new IP addresses and managed to block
one of the boxes off the net. Luckily, it had a
serial console available via the other box. That
was my savior that day. I didn’t want to risk changing the other box as it did not have a serial
console. So I relied upon Andrew Thompson to finalize those changes. While he was there, he strung
a second null-modem cable between the two boxes. If two boxes act as mutual serial consoles for each
other, then you need two cables. Proof to the contrary is quite welcome.
So there was that box up and running. Smoothly. Both of them.
DSL dies….
On Wednesday morning, I met with friends for a networking session. Before I left, I found that my DSL
was down. I left a message with iStop and they called me back later. It seems that one of Bell Canada’s
PPPoE servers went down. It was back by about noon and I was back online.
In the meantime, I pulled out a 33.6 modem and got online with that. I was pretty impressed with how easy
that was. Armed with my copy of The FreeBSD Handbook, I was able to figure
out how to get PPP running. First, I tried connecting with tip. That proved very useful. Here’s what I
saw. I typed what you see in bold.
[root@bast:/home/dan] # tip com1
connected
ATDT 5551212
CONNECT 9600/ARQ/V34/LAPM/V42BIS
login: myloginid
Password:
PPP session from (10.0.1.7) to 10.0.2.140 beginning....~}#@!}!}!} }0}"}&}
} } } }%}&VkC^DS~~}#@!}!}"} }0}"}&} } } } }%}&VkC^:}+~~}#@!}!}#} }0}"}&} } } } }
%}&VkC^}0C~~
[EOT]
I then added the following section to my /etc/ppp/ppp.conf:
istopdialup:
 set phone "555 1212"
 set authname myloginid
 set authkey mypassword
 set login "TIMEOUT 10 gin: \\U word: \\P"
 set ifaddr 0 0
 add default HISADDR
 disable DNS
 disable ipv6cp
 #
 # disable all compression
 #
 disable deflate
 disable pred1
 disable vjcomp
NOTE: you may not need all those disable statements. That \\U represents the user name. Similarly, the \\P
is where my password is injected.
To dial, I just issued this command:
[root@bast:/etc/mail] # ppp -ddial istopdialup
That got me back online and I could read my email. But hmmm, my websites were all offline…
Vancouver disappears….
It wasn’t until Wednesday about 1pm that I found out that my Vancouver box was offline.
It was responding to pings, but nothing else. Hmmm…
I called my provider, but no answer. I couldn’t ping them either. I did eventually leave them a
message. And here I was on a dial up!
I shut down the dial-up at about 13:30 and tried my DSL. It was back. Great. Now to figure out what’s
going on with that box.
Most of the afternoon was spent wondering where the box was and why my provider hadn’t called me back.
I was quite sure something big had happened. Perhaps a big cable cut. I asked friends out in BC if they’d
heard anything. Nothing.
On Thursday morning, I thought the first thing to do was contact everyone and let them know the box was
offline and there was no ETA for a return. I emailed all the FreeBSD Diary and FreshPorts subscribers.
It was then that I started to get queries regarding services. “How much bandwidth do you need?”, “How
much disk space?”. Within hours I had more than 15 queries and
two offers of fully dedicated boxes and a couple of free colo offers.
I appreciate that. I really do. Without such generosity I would
not be able to pay for this and other websites.
Thanks folks.
For those interested, here’s what the box is/does:
I think it does about 25-40GB a month in traffic. A fast box is good
as it will be running a pretty big database and processing is
sometimes intensive.
It was running fairly well on a PIII 800, 512MB ram and a 30GB HDD.
The box also ran The FreeBSD Diary, FreshSource, and some of my
personal websites. Actual disk space used I’m not sure but looking
at the daily reports, I think it was about 8 or 9 GB including the
base system. I need a full copy of the ports tree.
I need several other installed packages, some of which are not in
the ports tree. Some daemontools processes handle the incoming
email, but the monitoring process runs as root.
The bright idea
As mentioned, by about 14:30, I had two offers of a dedicated box and a few more offers of free colocation.
I wasn’t looking forward to installing everything and getting the website caught up.
But what choice did I have?
I then tried another traceroute to the box. It didn’t get there. DOH! Let’s call the last hop
in the chain! I phoned DataFortress.com. Ahhhh, that’s the solution.
I got an answer, explained who I was, and got through to the guy who did the IP renumbering on Friday
night. All the boxes had been moved from BCHosting into DataFortress. Ahhh…! And the phone wasn’t
answered because the owner was out of the country and the staff were over at DataFortress… DOH!
The tech guy went to the box and checked what was going on. There was no response other than ping
from anywhere. He got in and found out the problem was /etc/resolv.conf.
I was still using the old name servers from my previous IP address and those servers were no longer
available. So that’s why nothing was answering. DNS issues…..
He changed the settings and I was able to log in. Great. I have my box back.
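A check along these lines before a renumber might catch this sort of thing. It is only a sketch: it lists the nameserver entries from a resolv.conf-style file so they can be compared against what the new network actually provides. The helper name list_nameservers and the sample addresses (taken from the documentation ranges) are made up for illustration; they are not what was on the box.

```sh
#!/bin/sh
# Print the address from every "nameserver" line in a resolv.conf-style file.
# list_nameservers is a hypothetical helper; with no argument it reads
# the usual /etc/resolv.conf.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Try it on a throwaway file. The 192.0.2.x addresses are placeholders.
sample=$(mktemp)
printf 'search example.org\nnameserver 192.0.2.53\nnameserver 192.0.2.54\n' > "$sample"
list_nameservers "$sample"
rm -f "$sample"
```

If any address it prints belongs to the old provider's network, that is the cue to ask about name servers before the box is powered down.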
Recovery
The first thing I noticed was Apache wasn’t running. Fine by me. Then I blocked all incoming http
except from my home IP address. I wanted to check things out before everything else came back.
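Which firewall did the blocking isn't stated here, so the following is only a sketch that assumes ipfw. The rule numbers, the helper name http_lockdown_rules, and the home address (a documentation-range placeholder) are all invented for illustration. The function just prints the two rules so they can be reviewed before being run as root.

```sh
#!/bin/sh
# Print ipfw rules that allow HTTP from one address and deny it from all others.
# http_lockdown_rules, the rule numbers 1000/1010, and the address used below
# are hypothetical; nothing is changed on the system, the rules are only echoed.
http_lockdown_rules() {
    home_ip=$1
    echo "ipfw add 1000 allow tcp from ${home_ip} to me 80"
    echo "ipfw add 1010 deny tcp from any to me 80"
}

http_lockdown_rules 203.0.113.5
```

The allow rule has the lower number so it matches first; once things check out, removing both rules would reopen the website to everyone.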
Things were OK on FreshPorts, just out of date. I waited for the mail messages to start flowing in
from the FreeBSD mailing lists, and they did. After an hour or so, it was caught up with the previous 24
hours of email, and the database was up to date.
At some points, the load average spiked to over 25 while the messages flowed in, procmail struggled with
the messages, and I monitored things from my cold Ottawa location.
From my reconciliation, no commits have been missed. Everything is there. Phew.
It could have been done better
Anything can be done better. Including the processes described above.
I bear no ill will towards my provider.
They have provided good service. They continue to do so. This incident has been the biggest glitch in all
the time I’ve been with them.
I partly blame myself.
You see, I had thought about the name servers
earlier in the week. But I didn’t contact them to ask about it. Then it bit me. Grrr….
By the way, I thought I had a spare PSU here. I didn’t. So I bought an extra one while at OEM Express.
That’s now sitting here, ready to go. The failed PSU is waiting to be returned to the manufacturer.
It had a 24 month warranty and was purchased in June 2001.
Thanks
Everything is back. It’s all there.
And it’s very warming to know how many people are willing to help out.
Thank you.