System freezes up during reboot
I have a system that freezes during the reboot process. The computer is running FreeBSD 6.1-STABLE
and is very reliable and, well, umm, stable. This is not a random reboot problem. It is a problem that occurs
during reboot. The freeze occurs after the BIOS screen and just after the SCSI card loads its BIOS. What should
appear on the screen next is the preload information. Usually, you see an underscore at the top left of the console.
Then it drops down one line and shows the spinning character associated with a FreeBSD preload. That underscore
never appears. The console goes completely blank and stays that way.
Pressing ctl-alt-del does not reboot the system. Pressing the RESET button on the system seems to have no effect.
A power cycle does reboot the system. After every such power cycle, the BIOS has reported that I either need to
press F1 to modify the BIOS settings or press F2 to load defaults. I have been pressing F1, then F10 to save and reboot.
The box then properly reboots.
This problem is annoying because the system is frequently rebooted to use a new kernel. Having to trudge down into the
basement to power cycle, and then press F1, F10, and wait, is annoying, especially when it should not be necessary.
Testing the reboot process
This problem has been going on, more or less, since I got this box. It is an AMD 3000+ processor on an AV8 Deluxe motherboard.
It has 1GB of RAM and an 80GB SATA drive.
One Saturday, I was especially annoyed by this problem, so I rescued the system from the basement, and set it up on the dining
room table. “How long will this be here?” I was asked. “Until it’s fixed,” which, of course, could be forever. It is just as well
I fixed, or at least think I fixed, the problem yesterday. Tomorrow is a birthday BBQ, and we need the dining room table back.
With the system in the dining room, I can reboot it as I pass by. It’s a high traffic area. I have to go past the system any time I move
about the house. This makes it ideal. I can reboot the system without having to attend to it at all times. I would press ctl-alt-del
each time I went by. Sometimes it would hang. Sometimes it would reboot. There was no pattern.
For a while, I thought that the freeze only occurred after a “shutdown -r now”. I eventually proved that wrong. Then I thought it only
occurred when rebooting over ssh. Wrong. There was no pattern. At one point, I’d done over a dozen reboots without a single freeze. I thought
I had it solved. I had been reseating cards, moving them around, removing them, reseating memory, cleaning off dust, and generally getting
frustrated with the whole problem. Then it went away. OK. Great. Fixed!
Then the problem returned.
Is it ACPI?
I chatted with people on IRC about this for a while, and we thought the problem might be ACPI.
I tried some recent ACPI patches. The patches appeared to make things worse, but probably had no effect
at all. The problem continued to occur after I backed out the patches.
I tried booting with and without ACPI. I played with BIOS settings. Nothing seemed to affect the problem.
The video card!
During my testing, I had tried removing the NIC (Intel 82559 Pro/100 Ethernet) and the SCSI card (Adaptec 2944 Ultra SCSI adapter) to see
if that affected the results. It did not. I tried moving around the VGA card. No changes. The problem still occurred on a seemingly random
basis. Yesterday, the freeze occurred on about 8 consecutive reboots.
I had a stash of about 4 video cards in the basement. I retrieved them and started further testing. I discovered that the problem did
not occur when I was using an AGP card. The problem did occur at least once whenever I used either one of the two PCI video cards.
Right now, I’m using an AGP card from ATI Technologies Inc (RV100 Radeon 7000 / Radeon VE). I’ve done about 20 reboots in a row without
a single freeze. Hopefully the problem has gone away.
The PCI VGA cards I tried (both of which were used during at least one reboot freeze) were:
- S3 SonicVibes (86c617) Audio Accelerator PCI Trio 64V2 DX/GX from S3 Graphics Co., Ltd.
- MGA 1064SG Hurricane/Cyclone 64-bit graphics chip from Matrox Electronic Systems Ltd.
The culprit may not be either particular card; it may simply be the use of a PCI card rather than an AGP card.
Witness my testing
For what it’s worth, here is the history of the reboots and shutdowns:
last.txt.
Some interesting stats:
- There have been 168 shutdowns in May.
- There have been 167 reboots.
- Today (the first day after I found the AGP video solution), I have done 26 shutdowns and 27 reboots.
That is about 6 reboots per day over the past 4 weeks.
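Stats like these could be tallied from the saved output of `last`. A minimal sketch in Python, assuming each reboot and shutdown entry begins with the pseudo-user name `reboot` or `shutdown` (as BSD `last` prints them); the function names `count_events` and `per_day` and the sample lines are made up for illustration, not taken from last.txt.

```python
from collections import Counter


def count_events(lines):
    """Count 'reboot' and 'shutdown' entries in `last` output.

    Assumption: relevant lines start with the word 'reboot' or 'shutdown'.
    Ordinary login lines and the trailing 'wtmp begins ...' line are ignored.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines; count only the two pseudo-user entries.
        if fields and fields[0] in ("reboot", "shutdown"):
            counts[fields[0]] += 1
    return counts


def per_day(total, days):
    """Average number of events per day over the given period."""
    return total / days


if __name__ == "__main__":
    # Hypothetical sample lines in the general shape of `last` output.
    sample = [
        "reboot    ~                      Wed May 31 09:12",
        "shutdown  ~                      Wed May 31 09:11",
        "user      ttyp0   10.0.0.1       Wed May 31 08:00 - 09:11  (01:11)",
        "reboot    ~                      Tue May 30 22:40",
        "",
    ]
    print(count_events(sample))
    # 167 reboots over roughly 4 weeks (28 days).
    print(round(per_day(167, 28), 1))
```

Against the real file, `count_events(open("last.txt"))` would give the totals, since a file object iterates line by line. Dividing 167 reboots by 28 days gives about 5.96, which rounds to the 6 per day quoted above.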
So what caused the problem?
I always prefer to know the exact cause of a problem. With that knowledge, you can verify the fix and therefore prove that the problem
has been solved. However, in this case, I’m not 100% sure I’ve found the cause. But I do think/hope
it’s fixed.
Do you have any theories? If so, please use the comment link at the bottom right of this page. Thank you.
Maybe it’s the agp driver that is causing problems?
What happens if you build a kernel without the agp driver? It’s included in the GENERIC kernel on amd64, i386 and pc98.
My money is on it just being a motherboard BIOS issue. Try the various video options in the BIOS, if you have them; turn video caching on and off, etc. You’re in the minority of people using a PCI video card with that system board, especially with FreeBSD. So if no BIOS setting affects it, the only thing you can do is stick with the AGP card.
In my experience, hangs of non-busy, mostly working systems come from two events: halts and unreturned interrupts. Halts can usually be traced to error traps, and the BIOS setup can often be told not to halt on (minor) errors. Interrupt problems are flakier, which seems to fit your report. There are interrupt tests which pound on the interrupts to see if one will skip or fail. I used to use diag, but that was in the DOS days, and I am showing my age. I have two diagnostic cards now: one is an ISA slot card, and it did more diagnostics; the newer one has card-edge fingers on the top and bottom to handle both ISA and PCI slots. I have seen none for AGP slots. But in this reply I am only pointing out what to test. The approach I would use is to repeatedly run the looping interrupt diagnostics with different boards inserted in the PCI slots (nothing in the AGP slot). If a board causes trouble, try it in another unit and test it there if possible. This will pinpoint whether the slot, motherboard, or PCI adapter is at fault. Repairing the failed device is most likely beyond you. If you want to go deeper, you need an oscilloscope and a big time budget.
I just had a very similar issue. I did a full, clean install of 6.2-STABLE on a new Compaq Evo D310 I acquired. I configured options in /etc/rc.conf and did a reboot to check the boot sequence. When the power came back on, there was no bios screen, nothing; just a blank screen.
After searching for a while, I found this article. I pulled the NVIDIA FX5700 video card the previous owner had installed and switched to the integrated video. Then it magically booted as normal. I find it distressing that FreeBSD did not handle this more gracefully. This is my first time with FreeBSD, but I am a long-time Linux user. Luckily, I want a headless box, so the video card does not matter to me.
I’ve seen this happen before. I had it happen on 2 different boxes at 2 different times.
For my boxes, the problem was something so simple that I kicked myself: the ribbon cables were bad. For whatever reason, the box would hang on a reboot and give me an error message right after the BIOS kicked in, as if it couldn’t read the MBR, or saw a change in the cylinder sizes or in how the BIOS calculates the cylinders.
Simply replacing the ribbon cables solved the problem both times.
Hope this helps!