Oh what a week I’ve had!
Yes, the webserver is back. Not only was this website offline but so were a number of other websites, including FreshPorts and FreshSource. I’m very glad to have it back because I wasn’t looking forward to transferring everything and setting up a new box.
I thank those of you who contacted me regarding hosting alternatives. I appreciate that greatly. I will be contacting you via email to express my gratitude.
It all started some time ago….
It’s been an interesting week, which started last Friday. It was then we (myself and two guys I work[ed] with) decided to move our Ottawa colocated box to another location. Yes, we’re staying with iStop but we are moving to a cheaper location within their domain. That box was receiving much less traffic now that I had moved FreeBSDDiary and FreshPorts to this new webserver out in Vancouver. With all three of us out of work from layoffs, we’re trying to keep costs down. Hence the move.
The move necessitated an IP renumber. That was OK. We have a backup mail server, out in Vancouver (this box actually). iStop were even willing to transport the box for us at no extra charge. Great!
I modified the DNS settings in advance and had them ready to go. I even altered /etc/rc.conf on the box so all they had to do was plug in and power on. What could go wrong?
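For anyone preparing a similar renumber, here is a minimal sketch of swapping the addresses in rc.conf-style text ahead of the move. It is not what was run on this box. The interface name fxp0, the function names, and every address shown are hypothetical (the addresses come from the ranges reserved for documentation). Only the ifconfig_ and defaultrouter lines are touched.

```python
import re


def swap_address(line, old, new):
    """Replace old with new only where old appears as a whole address."""
    # The lookarounds stop 192.0.2.1 from matching inside 192.0.2.10.
    pattern = r"(?<![\d.])" + re.escape(old) + r"(?![\d.])"
    return re.sub(pattern, lambda match: new, line)


def renumber_rc_conf(text, addresses):
    """Return rc.conf text with old addresses replaced by new ones.

    addresses maps each old address to its replacement. Only lines that
    configure an interface or the default router are rewritten.
    """
    result = []
    for line in text.splitlines():
        if line.startswith(("ifconfig_", "defaultrouter=")):
            for old, new in addresses.items():
                line = swap_address(line, old, new)
        result.append(line)
    return "\n".join(result) + "\n"


# Hypothetical settings, for illustration only.
sample = (
    'ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0"\n'
    'defaultrouter="192.0.2.1"\n'
    'sshd_enable="YES"\n'
)
updated = renumber_rc_conf(
    sample,
    {"192.0.2.10": "198.51.100.20", "192.0.2.1": "198.51.100.1"},
)
print(updated)
```

Keeping the rewrite limited to those two kinds of lines is a deliberate choice: anything else that mentions an address (firewall rules, name servers) deserves a look by hand.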
I got a call from iStop saying they were ready to power down the box. That was my cue to do the shutdown. Down went the box about 13:00. In went the DNS changes. All that was left to do was sit around and wait for the box to be powered back up, in the new location, with the new IP address.
Dum… de dum…. de dummdy dummdy de dum….
I got a call saying the box would not power back up. Damn. So I grabbed an extra box from the basement, stopped by OEM Express for an extra power supply unit, and headed out to the west end of the city.
When I arrived, I first attempted to use the backup box and just transfer over the disk and try that. No go. Then I remembered this was a large disk in an old box and suspected BIOS problems. So I put the disk back into the original box and tried the replacement PSU. Yep, that worked. So I put that in and m20 was up and running. Great. That’s just what I needed.
Time spent: about 3 hours from leaving home and finishing the job. I was hot, tired, and needed a beer. And pizza.
I was just walking into Mr Mozzarella to pick up the pizza when I got a phone call from my provider in Vancouver, BC Hosting. They were moving location, renumbering IPs, and were wondering if they could do that tonight. Sure, I said. No worries. So I picked up the pizza, headed home, and modified more DNS settings in about 7 different domains.
Shortly thereafter, the box was moved, and everything was back up and running. Sure, it would take a while for people to get back to the website because of DNS propagation. But I provided alternative access methods using different website prefixes. These worked, but not everyone was able to get to them. I apologise for that.
On to New Zealand
Come Monday, everything seemed to be working. The DNS was propagating. Traffic was back to normal levels. But graphs show that resumption was a gradual process.
Satisfied, I decided it was time to modify some things with my New Zealand servers. My provider, CityLink, provides me with bandwidth for the NZ ADSL mailing list as well as the NZ FreeBSD website mirror and cvsup mirror. They were also renumbering. So I added the new IP addresses and managed to block one of the boxes off the net. Luckily, it had a serial console available via the other box. That was my savior that day. I didn’t want to risk changing the other box as it did not have a serial console. So I relied upon Andrew Thompson to finalize those changes. While he was there, he strung a second null-modem cable between the two boxes. If two boxes act as mutual serial consoles for each other, then you need two cables. Proof to the contrary is quite welcome.
So there was that box up and running. Smoothly. Both of them.
On Wednesday morning, I met with friends for a networking session. Before I left, I found that my DSL was down. I left a message with iStop and they called me back later. It seems that one of Bell Canada’s PPPoE servers went down. It was back by about noon and I was back online.
In the meantime, I pulled out a 33.6 modem and got online with that. I was pretty impressed with how easy that was. Armed with my copy of The FreeBSD Handbook, I was able to figure out how to get PPP running. First, I tried connecting with tip. That proved very useful. Here’s what I saw. I typed what you see in bold.
[root@bast:/home/dan] # tip com1
PPP session from (10.0.1.7) to 10.0.2.140 beginning....~}#@!}!}!} }0}"}&}
} } } }%}&VkC^DS~~}#@!}!}"} }0}"}&} } } } }%}&VkC^:}+~~}#@!}!}#} }0}"}&} } } } }
I then added the following section to my PPP configuration:
set phone "555 1212"
set authname myloginid
set authkey mypassword
set login "TIMEOUT 10 gin: \\U word: \\P"
set ifaddr 0 0
add default HISADDR
# disable all compression
NOTE: you may not need all those disable statements. That \\U represents the user name. Similarly, the \\P is where my password is injected.
To dial, I just issued this command:
[root@bast:/etc/mail] # ppp -ddial istopdialup
That got me back online and I could read my email. But hmmm, my websites were all offline…
It wasn’t until Wednesday about 1pm that I found out that my Vancouver box was offline. It was responding to pings, but nothing else. Hmmm…
I called my provider, but no answer. I couldn’t ping them either. I did eventually leave them a message. And here I was on a dial up!
I shut down the dial up about 13:30 and tried my DSL. It was back. Great. Now to figure out what’s going on with that box.
Most of the afternoon was spent wondering where the box was and why my provider hadn’t called me back. I was quite sure something big had happened. Perhaps a big cable cut. I asked friends out in BC if they’d heard anything. Nothing.
On Thursday morning, I thought the first thing to do was contact everyone and let them know the box was offline and there was no ETA for a return. I emailed all the FreeBSD Diary and FreshPorts subscribers. It was then that I started to get queries regarding services. “How much bandwidth do you need?”, “How much disk space?”. Within hours I had more than 15 queries and two offers of fully dedicated boxes and a couple of free colo offers.
I appreciate that. I really do. Without such generosity I would not be able to pay for this and other websites.
For those interested, here’s what the box is/does:
I think it does about 25-40GB a month in traffic. A fast box is good as it will be running a pretty big database and processing is sometimes intensive.
It was running fairly well on a PIII 800, 512MB ram and a 30GB HDD. The box also ran The FreeBSD Diary, FreshSource, and some of my personal websites. Actual disk space used I’m not sure but looking at the daily reports, I think it was about 8 or 9 GB including the base system. I need a full copy of the ports tree.
I need several other installed packages, some of which are not in the ports tree. Some daemontools processes work the incoming email, but the monitoring process runs as root.
The bright idea
As mentioned, by about 14:30, I had two offers of a dedicated box and a few more offers of free colocation. I wasn’t looking forward to installing everything and getting the website caught up. But what choice did I have?
I then tried another traceroute to the box. It didn’t get there. DOH! Let’s call the last hop in the chain! I phoned DataFortress.com. Ahhhh, that’s the solution.
I got an answer, explained who I was, and got through to the guy who did the IP renumbering on Friday night. All the boxes had been moved from BCHosting into DataFortress. Ahhh…! And the phone wasn’t answered because the owner was out of the country and the staff were over at DataFortress… DOH!
The tech guy went to the box and checked what was going on. There was no response other than ping from anywhere. He got in and found out the problem was I was still using the old name servers from my previous IP address and those servers were no longer available. So that’s why nothing was answering. DNS issues…..
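A check along the following lines, run before or right after a move, might catch stale name servers early. This is a minimal sketch and not something that was used on this box. The function names are made up for illustration, the sample addresses are from the documentation ranges, and probing TCP port 53 is an assumption: it is only a rough stand-in for sending a real DNS query, and a server that filters TCP would be wrongly reported as unreachable.

```python
import socket


def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def accepts_tcp_53(address, timeout=3.0):
    """Rough probe: can a TCP connection be opened to port 53?"""
    try:
        with socket.create_connection((address, 53), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_nameservers(resolv_conf_text, probe=accepts_tcp_53):
    """Return the configured name servers that the probe could not reach."""
    return [s for s in nameservers(resolv_conf_text) if not probe(s)]


# Hypothetical contents, for illustration only.
sample = "# left over from the old network\nnameserver 192.0.2.53\nnameserver 198.51.100.53\n"
print(nameservers(sample))
```

On a real box, the text would come from reading /etc/resolv.conf and passing it to unreachable_nameservers. The probe is a parameter so that the parsing can be tried out without touching the network.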
He changed the settings and I was able to log in. Great. I have my box back.
The first thing I noticed was Apache wasn’t running. Fine by me. Then I blocked all incoming http except from my home IP address. I wanted to check things out before everything else came back. Things were OK on FreshPorts, just out of date. I waited for the mail messages to start flowing in from the FreeBSD mailing lists, and they did. After an hour or so, it was caught up with the previous 24 hours of email, and the database was up to date.
At some points, the load average spiked to over 25 while the messages flowed in, procmail struggled with the messages, and I monitored things from my cold Ottawa location.
From my reconciliation, no commits have been missed. Everything is there. Phew.
It could have been done better
Anything can be done better. Including the processes described above.
I bear no ill will towards my provider. They have provided good service. They continue to do so. This incident has been the biggest glitch in all the time I’ve been with them.
I partly blame myself.
You see, I had thought about the name servers earlier in the week. But I didn’t contact them to ask about it. Then it bit me. Grrr….
By the way, I thought I had a spare PSU here. I didn’t. So I bought an extra one while at OEM Express. That’s now sitting here, ready to go. The failed PSU is waiting to be returned to the manufacturer. It had a 24 month warranty and was purchased in June 2001.
Everything is back. It’s all there.
And it’s very warming to know how many people are willing to help out.