mknod – create the device, then mount
My primary mail server went down on 1 January. In the process of analyzing
the problem, I learned about a new tool: mknod. This article documents how
I used that tool, a live filesystem CD, and a floppy disk to look at the
disk of the dead box.
Happy New Year!
I first noticed a problem on New Year’s Day. I couldn’t ssh into the box, nor was it accepting
email. Attempts to connect were met with:
$ ssh m20
Password:
Last login: Thu Jan 1 11:38:35 2004 from betty
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
The Regents of the University of California. All rights reserved.
-bash: /etc/profile: Device not configured
Connection to m20.example.org closed.
$
SMTP was also sick:
$ telnet m20 25
Trying 10.0.0.1...
Connected to m20.example.org.
Escape character is '^]'.
220 m20.example.org ESMTP Postfix
helo bast.example.org
250 m20.example.org
mail from: dan@example.org
250 Ok
rcpt to:eric@example.com
250 Ok
data
354 End data with <CR><LF>.<CR><LF>
test msg via m20
.
451 Error: queue file write error
quit
221 Bye
Connection closed by foreign host.
Because it was a holiday, I wasn’t able to get access to the collocation facility. It wasn’t
until January 4th that I was able to get there.
Take a camera!
As I was driving to the collocation facility, I remembered my camera. I
thought about turning around to collect it, but didn’t. Bad idea. I’ve
lost useful information because of that decision. The console contained
messages which might have been useful. Next time, I hope I remember.
What I do remember is messages saying to see tuning(7).
That’s it. Nothing else. If I’d had a camera, I would have taken a
picture and we’d both be able to learn something from it. What
a silly mistake.
I hit enter once, and that started a stream of messages far too rapid to
read. CONTROL-S didn’t halt it, nor did SCROLL-LOCK. I tried another
virtual console. I got a login prompt. But as soon as I touched a key,
the tty died with the following message:
/: create / symlink failed, no inodes free
This happened with each virtual console I tried.
I went back to the main console to look closely at the scrolling messages.
I could read nothing. I pressed the power switch, and that stopped the messages
for a short time, before they started again. I was able to read something like this:
vm_fault_pager read error pid 1 init
So… it looks like init was having problems. This was a sick system. I rebooted
the box.
The first reboot
The first reboot did nothing. It could not find the disk drive. I went into
the BIOS setup and found that nothing was listed for the primary drive.
Auto-detection found nothing. I had no choice but to take the system home
with me.
Booting at home
At home, I wanted to examine the system before booting it up in case I lost
anything by writing to the drive. I booted up from a CD I had, but couldn’t mount
any drives. I also had a 4.7-RELEASE CD set from FreeBSD Mall. Disk 2 contains a live
filesystem, which you can boot from to obtain a working FreeBSD system with
very little effort. I booted from it and tried to mount my disk.
dmesg(8) showed that the disk (ad0) was found. But I could not mount it because
/dev/ad0s1e did not exist, although /dev/ad0s1 did. /dev/MAKEDEV was not present
on this live filesystem.
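Checking which device nodes are missing can be scripted. The following is a minimal sketch, assuming a POSIX shell is available on the live system. The helper name missing_nodes is made up for this example, and the node names are simply the ones used in this article.

```sh
#!/bin/sh
# missing_nodes DIR NAME...
# Print each NAME that does not exist as a character device under DIR.
# (Hypothetical helper for illustration; not part of FreeBSD.)
missing_nodes() {
    dir=$1
    shift
    for name in "$@"; do
        # -c is true only for character special files
        [ -c "$dir/$name" ] || echo "$name"
    done
}

# Example: which of the nodes used in this article are absent from /dev?
missing_nodes /dev ad0s1 ad0s1a ad0s1e ad0s1f ad0s1g
```

Any name it prints is a node that would have to be created by hand before the partition can be mounted.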
I was talking out loud about this in an IRC channel, when Anton Berezin had this
great idea:
mkdir -p /tmp/dev
cd /tmp/dev
/sbin/mknod ad0s1e c 116 0x00020000 root:operator
I tried it, but ran into a problem: this live filesystem CD did not have mknod(8).
Another great idea from Anton: no mknod, no device. Copy mknod to a floppy 🙂
Remember: The 4.9-RELEASE live filesystem ISO image contains mknod. I wouldn’t
have needed the floppy if I’d had that ISO sitting around ready to
go. I now have a CD ready to go.
Floppy basics
I went back to my documentation on floppies. I fetched a fresh
floppy from a box and did this:
fdformat /dev/rfd0
disklabel -w -r /dev/rfd0 fd1440
newfs /dev/rfd0
mount /dev/fd0 /mnt
cp /sbin/mknod /mnt
umount /mnt
That gave me a floppy with mknod on it. On the live
filesystem machine, I mounted the floppy and copied the file to /tmp for
future use.
Trying mknod again
Then I tried the original command again:
/tmp/mknod ad0s1e c 116 0x00020000 root:operator
Now I had an error about no such group. There was no /etc/group file on this
machine. Not to worry. You can use the numbers instead of the names.
/tmp/mknod ad0s1e c 116 0x00020000 0:0
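This translates to root:wheel, because uid 0 is root in /etc/passwd and gid 0 is
wheel in /etc/group. Note that this is not the root:operator of the original
command; on a stock FreeBSD system, operator is gid 5.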
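On a working system, the numeric IDs are the third colon-separated field of /etc/passwd (uid) and /etc/group (gid). Here is a small sketch of looking them up, assuming awk is available. The helper name id_of is made up for this example.

```sh
#!/bin/sh
# id_of FILE NAME
# Print the numeric id (third colon-separated field) for NAME in a
# passwd- or group-format FILE. Hypothetical helper for illustration.
id_of() {
    awk -F: -v name="$2" '$1 == name { print $3; exit }' "$1"
}

# On a working system, build the owner:group argument for mknod:
echo "$(id_of /etc/passwd root):$(id_of /etc/group wheel)"
```

If a name is not present in the file, the helper prints nothing, so an empty half of the owner:group pair means the lookup failed.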
This worked. I then mounted that new device:
mount -r /tmp/dev/ad0s1e /mnt
That was it. I had my drive mounted. I checked around and found nothing unusual.
I then repeated the procedure for each partition on my drive.
/tmp/mknod ad0s1a c 116 0x00020000 0:0
/tmp/mknod ad0s1f c 116 0x00020000 0:0
/tmp/mknod ad0s1g c 116 0x00020000 0:0
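A brief explanation:
- The c means a character device.
- 116 is the major number for this type of device, as found in /dev/MAKEDEV.
- 0x00020000 is the minor number rather than a simple bitmask: it encodes the slice and the partition, so each partition has its own value. You can see that here: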
crw-r----- 2 root operator 116, 0x00020000 Aug 15 16:44 /dev/ad0s1a
crw-r----- 2 root operator 116, 0x00020001 Aug 15 16:44 /dev/ad0s1b
crw-r----- 2 root operator 116, 0x00020002 Aug 15 16:45 /dev/ad0s1c
crw-r----- 2 root operator 116, 0x00020003 Aug 15 16:45 /dev/ad0s1d
crw-r----- 2 root operator 116, 0x00020004 Aug 15 16:45 /dev/ad0s1e
crw-r----- 2 root operator 116, 0x00020005 Aug 15 16:45 /dev/ad0s1f
crw-r----- 2 root operator 116, 0x00020006 Aug 15 16:45 /dev/ad0s1g
crw-r----- 2 root operator 116, 0x00020007 Aug 15 16:45 /dev/ad0s1h
This information was obtained from a working system. Hopefully you’ll have one
somewhere that you can access.
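Going by the listing, the minor number for each partition of ad0s1 is 0x00020000 plus the partition's position in the alphabet (a = 0 through h = 7). The sketch below prints the matching mknod command for each partition rather than running it. It assumes that pattern holds, covers only ad0s1 with major number 116, and uses a helper name, minor_for, that is made up for this example.

```sh
#!/bin/sh
# minor_for LETTER
# Print the minor number for partition LETTER (a-h) of ad0s1, following
# the pattern in the listing: base 0x00020000 plus the partition index.
# Assumption: this only covers unit 0, slice 1, as shown in the listing.
minor_for() {
    case $1 in
        a) idx=0 ;; b) idx=1 ;; c) idx=2 ;; d) idx=3 ;;
        e) idx=4 ;; f) idx=5 ;; g) idx=6 ;; h) idx=7 ;;
        *) echo "unknown partition: $1" >&2; return 1 ;;
    esac
    printf '0x%08x\n' $((0x00020000 + idx))
}

# Print (not run) one mknod command per partition used in this article.
for p in a e f g; do
    echo "/tmp/mknod ad0s1$p c 116 $(minor_for "$p") 0:0"
done
```

Printing the commands first makes it easy to compare them against the ls output from a working system before creating any nodes.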
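I was unable to mount more than one partition at a time; I kept
getting a “device busy” message. The listing above suggests why: every node I
created used minor number 0x00020000, the value for ad0s1a, so they probably
all referred to the same partition.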
Still, I was able to examine the drive and find nothing obviously wrong. I then
booted the system into single user mode by pressing the space bar during
the boot countdown and then issuing boot -s.
For a bit more information about single user mode, please read
this FAQ.
When I booted into single user mode, I had to run fsck
in order to clean the file systems. They were marked as dirty because of
the reboot. They would have been marked clean if I had done a proper shutdown, which
was not possible.
fsck /dev/ad0s1a
fsck /dev/ad0s1f
fsck /dev/ad0s1g
fsck /dev/ad0s1e
Kids, don’t try this at home!
I don’t plan to use this every day. In fact, I hope never to have to do it again.
But it is nice to know how to do it when you need to. This will help.