mknod – create the device, then mount
My primary mail server went down on 1 January. In the process of analyzing the problem, I learned about a new tool: mknod. This article documents how I used that tool, a live filesystem CD, and a floppy disk to look at the disk of the dead box.
Happy New Years!
I first noticed a problem on New Year’s Day. I couldn’t ssh into the box, nor was it accepting email. Attempts to connect were met with:
$ ssh m20
Last login: Thu Jan 1 11:38:35 2004 from betty
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
The Regents of the University of California. All rights reserved.
-bash: /etc/profile: Device not configured
Connection to m20.example.org closed.
smtp was also sick:
$ telnet m20 25
Connected to m20.example.org.
Escape character is '^]'.
220 m20.example.org ESMTP Postfix
mail from: firstname.lastname@example.org
354 End data with <CR><LF>.<CR><LF>
test msg via m20
451 Error: queue file write error
Connection closed by foreign host.
Because it was a holiday, I wasn’t able to get access to the collocation facility. It wasn’t until January 4th that I was able to get there.
Take a camera!
As I was driving to the collocation facility, I remembered my camera. I thought about turning around to collect it, but didn’t. Bad idea. I’ve lost useful information because of that decision. The console contained messages which might have been useful. Next time, I hope I remember.
What I do remember are messages saying to see tuning(7). That’s it. Nothing else. If I’d had a camera, I would have taken a picture and we’d both be able to learn something from it. What a silly mistake.
I hit enter once, and that started a stream of messages far too rapid to read. CONTROL-S didn’t halt it, nor did SCROLL-LOCK. I tried another virtual console. I got a login prompt. But as soon as I touched a key, the tty died with the following message:
/: create / symlink failed, no inodes free
This happened with each virtual console I tried.
I went back to the main console to look closely at the scrolling messages. I could read nothing. I pressed the power switch, and that stopped the messages for a short time, before they started again. I was able to read something like this:
vm_fault_pager read error pid 1 init
So… it looks like init was having problems. This was a sick system. I rebooted the box.
The first reboot
The first reboot did nothing. It could not find the disk drive. I went into the BIOS setup and found that nothing was listed for the primary drive. Auto-detection found nothing. I had no choice but to take the system home with me.
Booting at home
At home, I wanted to examine the system before booting it up, in case I lost anything by writing to the drive. I booted from a CD I had, but couldn’t mount any drives. I also had a 4.7-RELEASE from FreeBSD Mall. Disk 2 contains a live filesystem, which you can boot from and obtain a working FreeBSD system with very little effort. I booted, and tried to mount my disk.
The system showed that the disk (ad0) was found. But I could not mount it because /dev/ad0s1e did not exist, and /dev/MAKEDEV was not present on this live filesystem to create it.
I was talking out loud about this in an IRC channel, when Anton Berezin had this great idea:
mkdir -p /tmp/dev
/sbin/mknod ad0s1e c 116 0x00020000 root:operator
I tried it, but ran into a problem. This live filesystem CD did not have mknod(8).
Another great idea from Anton: no mknod, no device. copy mknod to a floppy 🙂
Remember: The 4.9-RELEASE live filesystem ISO image contains mknod. I wouldn’t have needed the floppy if I’d had that ISO just sitting around ready to go. I now have a CD ready to go.
Floppy basics
I went back to my documentation on floppies. I fetched a fresh floppy from a box and did this:
disklabel -w -r /dev/rfd0 fd1440
mount /dev/fd0 /mnt
cp /sbin/mknod /mnt
That gives me a floppy with mknod. From the live filesystem machine, I mounted the floppy and copied the file to /tmp for future use.
Trying mknod again
Then I tried the original command again:
/tmp/mknod ad0s1e c 116 0x00020000 root:operator
Now I had an error about no such group. There was no /etc/group file in this machine. Not to worry. You can use the numbers instead of the names.
/tmp/mknod ad0s1e c 116 0x00020000 0:0
This translates to root:wheel. Check /etc/group and you’ll see why.
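As a rough sketch of that lookup: on a machine that does have /etc/group, the name behind a numeric group ID can be pulled out with awk. The function name gid_to_name is my own invention for illustration, and the optional second argument (a path to a group file) is only there so it can be pointed at a copy of the file from another machine.

```sh
#!/bin/sh
# gid_to_name is a hypothetical helper, not a standard tool.
# Usage: gid_to_name GID [GROUPFILE]
# Lines in a group file look like: name:password:gid:members
gid_to_name() {
    # Print the first group name whose third field matches the given gid.
    awk -F: -v gid="$1" '$3 == gid { print $1; exit }' "${2:-/etc/group}"
}

# Example (not run here): gid_to_name 0
```

On a stock FreeBSD system, gid_to_name 0 should print wheel, which is why 0:0 means root:wheel there rather than the root:operator of the original command.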
This worked. I then mounted that new device:
mount -r /tmp/dev/ad0s1e /mnt
That was it. I had my drive mounted. I checked around and found nothing unusual. I then repeated the procedure for each partition on my drive.
/tmp/mknod ad0s1a c 116 0x00020000 0:0
/tmp/mknod ad0s1f c 116 0x00020000 0:0
/tmp/mknod ad0s1g c 116 0x00020000 0:0
A brief explanation:
- The c means a character-type device.
- 116 is the major number for this type of device, as found from
- 0x00020000 is the minor number, which is a bitmask. You can see that here:
crw-r----- 2 root operator 116, 0x00020000 Aug 15 16:44 /dev/ad0s1a
crw-r----- 2 root operator 116, 0x00020001 Aug 15 16:44 /dev/ad0s1b
crw-r----- 2 root operator 116, 0x00020002 Aug 15 16:45 /dev/ad0s1c
crw-r----- 2 root operator 116, 0x00020003 Aug 15 16:45 /dev/ad0s1d
crw-r----- 2 root operator 116, 0x00020004 Aug 15 16:45 /dev/ad0s1e
crw-r----- 2 root operator 116, 0x00020005 Aug 15 16:45 /dev/ad0s1f
crw-r----- 2 root operator 116, 0x00020006 Aug 15 16:45 /dev/ad0s1g
crw-r----- 2 root operator 116, 0x00020007 Aug 15 16:45 /dev/ad0s1h
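The pattern in that listing can be sketched in a few lines of shell. This assumes only what the listing shows: partition a of ad0s1 has minor 0x00020000, and each following letter adds one. The function name part_minor is hypothetical, and the sketch says nothing about other disks or slices. It only prints mknod commands; it does not run them.

```sh
#!/bin/sh
# Minor number of partition 'a' in ad0s1, taken from the listing above.
BASE=$((0x00020000))

# part_minor LETTER: print the minor number for partition a..h of ad0s1.
# (Hypothetical helper; it only covers what the listing shows.)
part_minor() {
    case "$1" in
        a) idx=0 ;; b) idx=1 ;; c) idx=2 ;; d) idx=3 ;;
        e) idx=4 ;; f) idx=5 ;; g) idx=6 ;; h) idx=7 ;;
        *) echo "unknown partition: $1" >&2; return 1 ;;
    esac
    printf '0x%08x\n' $((BASE + idx))
}

# Print (not run) one mknod command per partition of interest.
for p in a e f g; do
    echo "/tmp/mknod ad0s1$p c 116 $(part_minor "$p") 0:0"
done
```

For partition e this prints a command with minor 0x00020004, matching the listing.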
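For some reason I was unable to mount more than one partition at a time. I kept getting a “device busy” message. Looking at the device listing above, a likely cause is that each partition has its own minor number (0x00020004 for ad0s1e, for example), while every node I created used 0x00020000, so they would all have pointed at the same partition.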
But I was able to examine the drive and find nothing obviously wrong. I then
booted the system into single user mode by pressing the space bar during
the boot count down, and then issued
For a bit more information about single user mode, please read this
When I booted into single user mode, I had to run fsck in order to clean the file systems. They were marked as dirty because of the reboot. They would have been marked clean if I had done a proper shutdown, which was not possible.
Kids, don’t try this at home!
I don’t plan to use this every day. In fact, I hope never to have to do it again. But it is nice to know how to do it when you need to. This will help.