Jan 05 2004

mknod – create the device, then mount

My primary mail server went down on 1 January. In the process of analyzing the problem, I learned about a new tool: mknod. This article documents how I used that tool, a live filesystem CD, and a floppy disk to look at the disk of the dead box.

Happy New Year!

I first noticed a problem on New Year’s Day. I couldn’t ssh into the box, nor was it accepting email. Attempts to connect were met with:

$ ssh m20
Last login: Thu Jan 1 11:38:35 2004 from betty
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
The Regents of the University of California. All rights reserved.

-bash: /etc/profile: Device not configured
Connection to m20.example.org closed.
smtp was also sick:
$ telnet m20 25
Connected to m20.example.org.
Escape character is '^]'.
220 m20.example.org ESMTP Postfix
helo bast.example.org
250 m20.example.org
mail from: dan@example.org
250 Ok
rcpt to:eric@example.com
250 Ok
354 End data with <CR><LF>.<CR><LF>
test msg via m20
451 Error: queue file write error
221 Bye
Connection closed by foreign host.

Because it was a holiday, I wasn’t able to get access to the collocation facility. It wasn’t until January 4th that I was able to get there.

Take a camera!

As I was driving to the collocation facility, I remembered my camera. I thought about turning around to collect it, but didn’t. Bad idea. I’ve lost useful information because of that decision. The console contained messages which might have been useful. Next time, I hope I remember.

What I do remember is messages saying “see tuning(7)”. That’s it. Nothing else. If I’d had a camera, I would have taken a picture and we’d both be able to learn something from it. What a silly mistake.

I hit enter once, and that started a stream of messages far too rapid to read. CONTROL-S didn’t halt it, nor did SCROLL-LOCK. I tried another virtual console. I got a login prompt. But as soon as I touched a key, the tty died with the following message:

/: create / symlink failed, no inodes free

This happened with each virtual console I tried.

I went back to the main console to look closely at the scrolling messages. I could read nothing. I pressed the power switch, and that stopped the messages for a short time, before they started again. I was able to read something like this:

vm_fault_pager read error pid 1 init

So… it looks like init was having problems. This was a sick system. I rebooted the box.

The first reboot

The first reboot did nothing. It could not find the disk drive. I went into the BIOS setup and found that nothing was listed for the primary drive. Auto-detection found nothing. I had no choice but to take the system home with me.

Booting at home

At home, I wanted to examine the system before booting it up, in case I lost anything by writing to the drive. I booted up from a CD I had, but couldn’t mount any drives. I also had a 4.7-RELEASE from FreeBSD Mall. Disk 2 contains a live filesystem, which you can boot from to obtain a working FreeBSD system with very little effort. I booted it and tried to mount my disk. dmesg(8) showed that the disk (ad0) was found, but I could not mount it: /dev/ad0s1 existed, but /dev/ad0s1e did not. /dev/MAKEDEV was not present on this live filesystem.

I was talking out loud about this in an IRC channel, when Anton Berezin had this great idea:

mkdir -p /tmp/dev
cd /tmp/dev
/sbin/mknod ad0s1e c 116 0x00020000 root:operator

I tried it, but ran into a problem. This live filesystem CD did not have mknod(8).

Another great idea from Anton: no mknod, no device. copy mknod to a floppy 🙂

Remember: The 4.9-RELEASE live filesystem ISO image contains mknod. I wouldn’t have needed the floppy if I’d had that ISO just sitting around ready to go. I now have a CD ready to go….

Floppy basics

I went back to my documentation on floppies. I fetched a fresh floppy from a box and did this:
fdformat /dev/rfd0
disklabel -w -r /dev/rfd0 fd1440
newfs /dev/rfd0
mount /dev/fd0 /mnt
cp /sbin/mknod /mnt
umount /mnt
That gives me a floppy with mknod. From the live filesystem machine, I mounted the floppy and copied the file to /tmp for future use.
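
Floppies are not the most reliable medium, so it may be worth comparing the copy against the original before relying on it. The following is only a sketch: copy_verified is a made-up helper name, and it assumes cmp(1) is available on the machine doing the copy.

```sh
#!/bin/sh
# copy_verified SRC DST
# Copy SRC to DST (a file or a directory), then compare the two byte for byte.
# copy_verified is a hypothetical helper name, not a standard tool.
copy_verified() {
    src=$1
    dst=$2
    # If DST is a directory, the copy lands inside it under SRC's file name.
    if [ -d "$dst" ]; then
        dst=$dst/$(basename "$src")
    fi
    cp "$src" "$dst" || return 1
    # cmp -s prints nothing and exits non-zero if the files differ.
    cmp -s "$src" "$dst"
}

# Example: copy_verified /sbin/mknod /mnt && umount /mnt
```

A comparison made right after the copy may be answered from the buffer cache rather than from the floppy itself. Unmounting and remounting the floppy before comparing is a stricter check.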

Trying mknod again

Then I tried the original command again:
/tmp/mknod ad0s1e c 116 0x00020000 root:operator
Now I had an error about no such group. There was no /etc/group file on this machine. Not to worry. You can use the numbers instead of the names.
/tmp/mknod ad0s1e c 116 0x00020000 0:0
This translates to root:wheel. Check /etc/passwd and /etc/group and you’ll see why.
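
If you have a working system nearby, the numeric IDs can be looked up there before you need them. Here is a minimal sketch; numeric_id is a hypothetical helper name, and it relies only on the fact that the third colon-separated field of /etc/passwd and /etc/group holds the numeric ID.

```sh
#!/bin/sh
# numeric_id NAME FILE
# Print field 3 (the numeric uid or gid) of the entry called NAME in a
# passwd- or group-style FILE. Exits non-zero if NAME is not found.
# numeric_id is a hypothetical helper name.
numeric_id() {
    name=$1
    file=$2
    awk -F: -v n="$name" '
        $1 == n { print $3; found = 1; exit }
        END     { exit !found }
    ' "$file"
}

# Examples, to be run on a working system:
#   numeric_id root /etc/passwd
#   numeric_id operator /etc/group
```

With the uid and gid in hand, the owner argument to mknod can be written as uid:gid, which needs neither file to be present.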

This worked. I then mounted that new device:

mount -r /tmp/dev/ad0s1e /mnt
That was it. I had my drive mounted. I checked around and found nothing unusual. I then repeated the procedure for each of the other partitions on my drive.
/tmp/mknod ad0s1a c 116 0x00020000 0:0
/tmp/mknod ad0s1f c 116 0x00020000 0:0
/tmp/mknod ad0s1g c 116 0x00020000 0:0

A brief explanation:

  • The c means a character device.
  • 116 is the major number for this type of device, as found from /dev/MAKEDEV.
  • 0x00020000 is the minor number. Its low bits select the partition (a through h). You can see that here:
    crw-r----- 2 root operator 116, 0x00020000 Aug 15 16:44 /dev/ad0s1a
    crw-r----- 2 root operator 116, 0x00020001 Aug 15 16:44 /dev/ad0s1b
    crw-r----- 2 root operator 116, 0x00020002 Aug 15 16:45 /dev/ad0s1c
    crw-r----- 2 root operator 116, 0x00020003 Aug 15 16:45 /dev/ad0s1d
    crw-r----- 2 root operator 116, 0x00020004 Aug 15 16:45 /dev/ad0s1e
    crw-r----- 2 root operator 116, 0x00020005 Aug 15 16:45 /dev/ad0s1f
    crw-r----- 2 root operator 116, 0x00020006 Aug 15 16:45 /dev/ad0s1g
    crw-r----- 2 root operator 116, 0x00020007 Aug 15 16:45 /dev/ad0s1h
This information was obtained from a working system…. Hopefully you’ll have one somewhere that you can access.
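
To make the pattern in that listing concrete, here is a small sketch that builds the minor number from the unit, slice, and partition letter. Only the partition bits and the 0x0002 in the upper half are confirmed by the listing. The slice offset (sN stored as N+1) and the position of the unit bits are my assumptions about the 4.x layout, and disk_minor is a hypothetical helper name.

```sh
#!/bin/sh
# disk_minor UNIT SLICE PART
# Print the minor number for disk UNIT, slice SLICE, partition letter PART.
# Assumed layout (only the partition bits and the 0x0002 come from the listing):
#   bits 0-2  partition, a=0 through h=7
#   bits 3-7  unit number (assumption; only units 0-31 are handled here)
#   bits 16+  slice index, with sN stored as N+1 (assumption)
disk_minor() {
    unit=$1
    slice=$2
    case $3 in
        a) part=0 ;; b) part=1 ;; c) part=2 ;; d) part=3 ;;
        e) part=4 ;; f) part=5 ;; g) part=6 ;; h) part=7 ;;
        *) echo "unknown partition letter: $3" >&2; return 1 ;;
    esac
    printf '0x%08x\n' "$(( ((slice + 1) << 16) | ((unit & 31) << 3) | part ))"
}

# Example: disk_minor 0 1 e   (that is, ad0s1e)
```

For unit 0 and slice 1 this reproduces the values in the listing, for example 0x00020004 for ad0s1e. Treat anything beyond that as unverified and compare against ls -l /dev on a working system.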

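For some reason I was unable to mount more than one partition at a time. I kept getting a “device busy” message. The likely cause is visible in the listing above: every mknod command I ran used the same minor number, 0x00020000, which belongs to ad0s1a, so all of the nodes pointed at the same partition.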
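
Taking each partition’s minor number from the listing above, a per-partition version of the commands might look like the following sketch. I have not run this on the dead box. DEVDIR, MNTROOT, the DRY_RUN switch, and the function names are hypothetical, and the mount points live under /tmp on the assumption that it is the writable part of the live system.

```sh
#!/bin/sh
# Sketch: one device node per partition, each with its own minor number,
# each mounted read-only on its own mount point.
# DEVDIR, MNTROOT, MKNOD and DRY_RUN are hypothetical names for illustration.
DEVDIR=${DEVDIR:-/tmp/dev}
MNTROOT=${MNTROOT:-/tmp/mnt}
MKNOD=${MKNOD:-/tmp/mknod}

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Partition letters and minor numbers are copied from the listing.
mount_all() {
    while read -r part minor; do
        run "$MKNOD" "$DEVDIR/ad0s1$part" c 116 "$minor" 0:0 || return 1
        run mkdir -p "$MNTROOT/$part" || return 1
        run mount -r "$DEVDIR/ad0s1$part" "$MNTROOT/$part" || return 1
    done <<EOF
a 0x00020000
e 0x00020004
f 0x00020005
g 0x00020006
EOF
}

# Preview first:  DRY_RUN=1 mount_all
# Then for real:  mount_all
```

Running it with DRY_RUN=1 first shows exactly which commands would be issued, which seems prudent on a disk you are trying not to disturb.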

But I was able to examine the drive and find nothing obviously wrong. I then booted the system into single user mode by pressing the space bar during the boot countdown, and then issued boot -s. For a bit more information about single user mode, please read this FAQ.

When I booted into single user mode, I had to run fsck in order to clean the file systems. They were marked as dirty because of the reboot. They would have been marked clean if I had done a proper shutdown, which was not possible.

fsck /dev/ad0s1a
fsck /dev/ad0s1f
fsck /dev/ad0s1g
fsck /dev/ad0s1e
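
The same four commands can be written as a loop that stops at the first partition fsck complains about. This is a sketch only: check_partitions is a hypothetical helper name, and the FSCK override exists purely so the commands can be previewed.

```sh
#!/bin/sh
# FSCK can be overridden (for example FSCK=echo) to preview what would run.
FSCK=${FSCK:-fsck}

# check_partitions DISK PART...
# Run fsck on each /dev/DISKPART in the order given, stopping at the first
# non-zero exit status so the failing partition is obvious.
# check_partitions is a hypothetical helper name.
check_partitions() {
    disk=$1
    shift
    for part in "$@"; do
        "$FSCK" "/dev/${disk}${part}" || {
            echo "fsck returned an error for /dev/${disk}${part}" >&2
            return 1
        }
    done
}

# Example: check_partitions ad0s1 a f g e
```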

Kids, don’t try this at home!

I don’t plan to use this every day. In fact, I hope never to have to do it again. But it is nice to know how when you need to. I hope this helps.