file system full
This is not a message you want to see very often:
set /kernel: pid 801 (mysqld), uid 88 on /var: file system full
Hmmm, mysqld ran out of space on /var.
That’s not nice.
Over a 36 hour period, the FreshPorts webserver
ran out of space on /var four times. This effectively ground the server
to a halt. Incoming http requests could not be answered and the database queries
were failing. To say the least, I was not amused. Once is annoying, twice is
frustrating, etc.
This little article documents that incident and shows the solution I came up with.
Ouch, FreshPorts is down
On Tuesday morning, I found that my FreshPorts
webserver was down. Well, actually, the box was up and running. But the main
page on the website was blank. And a message was repeating on the console quite
frequently:
/var: file system full
I also noticed that logcheck was giving me
messages such as this:
set /kernel: pid 801 (mysqld), uid 88 on /var: file system full
set sendmail[89053]: NOQUEUE: low on space (have 89, SMTP-DAEMON needs 101 in /var/spool/mqueue)
Hmmm, /var? Why is that full?
I had a look around /var, first checking /var/log. Nothing
unusual there. Then I did this:
# cd /var
# ls -lt | head
The resulting listing had /var/tmp at the top of the list. That said
to me that /var/tmp was the last directory written to. So I inspected
that directory:
# cd /var/tmp
# ls
SQL1b07_0.ISD SQL1b0d_0.ISM SQL1b11_0.ISD SQL1b15_0.ISM
SQL1b07_0.ISM SQL1b0f_0.ISD SQL1b11_0.ISM
SQL1b0d_0.ISD SQL1b0f_0.ISM SQL1b15_0.ISD
That’s odd. Why are those files there? By this time, I was convinced it was
mysqld which was having trouble. But I didn’t know why. I knew I had
to free up space. So I stopped apache and then mysqld:
# /usr/local/sbin/apachectl stop
# mysqladmin -u root -p shutdown
Enter password:
After that, the /var/tmp files, mentioned above, disappeared. Good.
That’s that fixed. Famous last words.
I restarted mysqld and then apache:
# /usr/local/etc/rc.d/mysqld.sh
# /usr/local/sbin/apachectl start
I noted that /var utilization was still down at 15%. Good.
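Sorting by modification time worked here, but ranking by size answers the question "what is eating the space?" more directly. The following is a minimal sketch of that idea; largest_under is a hypothetical helper name, not something I ran at the time, and it ignores dot files because of the shell glob.

```shell
# List the largest entries directly under a directory, biggest first.
# Usage: largest_under DIR [COUNT]
largest_under() {
    dir=$1
    count=${2:-10}
    # du -sk reports each entry's total size in 1K blocks;
    # sort -rn puts the biggest at the top.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Something like "largest_under /var 5" would then show the five biggest directories or files under /var.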
What is /var anyway?
/var is where your log files, spool files, accounting
information, some databases (/var/db) including the FreeBSD packages database (/var/db/pkg)
are kept. It’s used for files that can grow quickly. Unlike /usr,
which can be pretty much the same from host to host, /var typically contains
information which is specific to that host and that host alone. You’ll find that apache
puts its log files there by default, as do sendmail and many other
programs.
Have a look at /var/log and you’ll also find the following files:
- messages – system messages
- maillog – mail log
As I mentioned above, your apache logs are also stored in /var/log,
but I’ve changed that default. My apache logs are stored on another volume, for
convenience.
Also note, log files can be rotated; see man newsyslog and Apache – rotating log files.
What are the implications if /var fills up? In my case, it meant the system was
unable to do anything useful. It had no space to write logs and nowhere to spool incoming
mail. Effectively, it was halted.
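Since a full /var stops the box from doing useful work, it can help to be warned before it gets there. The following is a minimal sketch and not part of my original setup: capacity_of and check_capacity are hypothetical helper names, and the default threshold of 80 is an arbitrary assumption. Both read df -k style output on standard input and look at the Capacity column for one mount point.

```shell
# Print the Capacity figure (without the % sign) for one mount point,
# reading `df -k` style output on standard input.
capacity_of() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Warn when a mount point is at or above a threshold.
# The default of 80 is an assumption, not a recommendation from the article.
check_capacity() {
    mp=$1
    limit=${2:-80}
    pct=$(capacity_of "$mp")
    if [ -n "$pct" ] && [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $mp is at ${pct}%"
    fi
}
```

Run from cron as "df -k | check_capacity /var 80", this would print a line only when /var crosses the threshold, and cron would mail that line to you.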
So what happened to cause this?
I had no idea why the space was filling up. Given that the webserver had been
running uninterrupted for several months, I suspected malice on the part of some
unsociable animal. That’s a natural reaction when the system suddenly starts
behaving abnormally. But I checked the logs and found nothing unusual which would
indicate any sort of attack. I was quite mystified as to the cause of the problem.
History never repeats
/var went to 100% about 6 hours later. I took the same
recovery steps and checked the logs. Nothing obvious. About 12 hours later, /var
filled up again. Nothing in the logs. Restarting mysqld released the space.
About 20 minutes later, I noticed /var was back up to 50%. I checked the logs to
see what was happening. Again, nothing jumped out at me. I shoved the probable
cause of this problem into the background and went on to answer email.
Eventually, the more complex database queries came to mind. I started playing
with some of the web pages while keeping an eye on /var/tmp. Eventually I
found a web page which would cause files to be created in /var/tmp. One
of those queries quickly took /var to 50%. I started to wonder what
would happen if a couple of these queries were launched concurrently or coincided with an
incoming port commit. I’ll bet that would be enough to fill up /var/tmp.
There and then I decided I needed more /var.
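Keeping an eye on /var/tmp while clicking through pages can be done with a small polling loop. This is only a sketch of the idea; sample_dir_kb is a hypothetical name, and the default of five samples two seconds apart is an assumption.

```shell
# Print the size of a directory in 1K blocks, COUNT times,
# INTERVAL seconds apart.
sample_dir_kb() {
    dir=$1
    count=${2:-5}
    interval=${3:-2}
    i=0
    while [ "$i" -lt "$count" ]; do
        du -sk "$dir" | awk '{ print $1 }'
        i=$((i + 1))
        # No need to sleep after the final sample.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

Running "sample_dir_kb /var/tmp 30 2" in one window while loading a suspect page in the browser would show whether that page makes the temp files grow.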
Getting more space
My existing /var looked like this:
Filesystem  1K-blocks  Used  Avail Capacity  Mounted on
/dev/ad0s1e     19815  2778  15452    15%    /var
That’s not a great deal of space. Only about 15MB free. So I decided to give it
some more space. Luckily, I had lots of spare disk, some of which was extremely
underutilized. The plan was to change the mount point for /var to another
drive. This must be done with care as a running system actually needs and uses /var
dynamically. You shouldn’t just umount /var and mount it somewhere else.
Here are my before and after images of /etc/fstab:
/dev/ad0s1e   /var    ufs   rw   2   2
/dev/da1s1    /usr3   ufs   rw   2   2
I changed this to:
/dev/ad0s1e   /var-old   ufs   rw   2   2
/dev/da1s1    /var       ufs   rw   2   2
This means that the new /var will be where /usr3 once was. The
original /var will be available under /var-old, just in case I need to
view it.
But I did not reboot, nor did I mount or umount anything yet.
The above could have been done in single user mode, but I decided to do it from my
nice GUI client where I could easily cut and paste.
These changes ensure that on the next reboot, the system will mount the proper volumes
in the right places.
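A typo in /etc/fstab can leave the box unbootable, so a quick sanity check after editing is cheap insurance. The sketch below is an assumption about what such a check could look like, not something I ran; duplicate_mount_points is a hypothetical name. It reads fstab-style lines on standard input and reports any mount point listed more than once.

```shell
# Print any mount point that appears more than once in fstab-style input.
# Comment lines and blank lines are skipped.
duplicate_mount_points() {
    awk '!/^[[:space:]]*#/ && NF >= 2 { print $2 }' | sort | uniq -d
}
```

Used as "duplicate_mount_points < /etc/fstab", no output means no duplicates. Note that several swap entries all use the mount point "none" and would show up here without being a problem.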
In the next step, we’ll get the old /var over to the new location. And
we’ll do that from single user mode.
Moving /var via single user mode
In single user mode, you are the only user. It’s much safer to make
critical changes in single user mode. That’s why I dropped to single user mode:
# shutdown now
to do this:
# cp -Rp /var /usr3
I did a copy because I wanted to keep the original files in the old location, just in
case. I could have done a mv instead of a cp, but I chose not
to. The above command copied everything from the existing /var to the new
location. Then I unmounted the existing mount points:
# umount /var
# umount /usr3
and mounted the new ones:
# mount /var
# mount /var-old
I verified that the mounts had succeeded by viewing the output of mount.
This showed that /dev/da1s1 was mounted as /var and that /dev/ad0s1e
was mounted as /var-old, which is exactly what I needed.
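One thing worth confirming before remounting is where the copy actually landed. When the target directory already exists, "cp -Rp /var /usr3" puts the copy at /usr3/var, not directly in /usr3. The sketch below shows one portable way to copy the contents instead; copy_contents is a hypothetical helper name, and this is an illustration, not the command I used above.

```shell
# Copy the *contents* of SRC into DST, preserving modes, owners and times.
# The trailing "/." makes cp copy what is inside SRC, not SRC itself.
copy_contents() {
    cp -Rp "$1/." "$2"
}
```

With this, "copy_contents /var /usr3" would give /usr3/log, /usr3/tmp and so on, which is the layout needed once that disk is mounted as /var.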
Back to multi-user mode
Then I left single user mode and went back to multi-user mode:
# exit
An exit (a CONTROL-D will also work) from single user mode will take you back to
multi-user mode.
Side note: the next day, I checked my uptime:
# uptime
10:32PM up 33 days, 12 mins, 2 users, load averages: 0.18, 0.20, 0.15
The moral: dropping to single user mode does not affect your uptime.
The results
It’s now been just over 24 hours since I created the new /var,
giving it 2GB of space (that’s really overkill and is far more than it will ever
need, but it was a quick and easy solution). Eventually, I’ll partition that
disk and give /var about 100MB. But that’s another article.
A side effect is that the box seems to be faster. Mind you, this may just be my perception and
there has been no actual improvement. But, theoretically, by putting /var on
its own SCSI disk, speed can be increased. One disk can be writing data to /var
while the other is reading the database files on /usr.
My theory is that I’ve always had those SQL files created in /var/tmp, but
they’ve never gotten to such a large size. Perhaps as more and more new ports have been
added to the FreshPorts database, a larger temp file
is needed for particular types of updates. And it is merely coincidence that these /var
peaks have just started reaching the capacity of the disk.
As always, should you or any of your IM force be caught or killed with more information
about the above, your comments will be appreciated.
Other ideas
Shortly after writing this article, several people wrote in regarding /var/tmp.
They all mentioned setting TMPDIR to a different location, say /usr4/tmp, and
restarting mysqld.
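I have not tried this myself, so treat the following as a sketch of the readers' suggestion and nothing more. It assumes mysqld picks up TMPDIR from the environment of whatever starts it; use_tmpdir is a hypothetical helper name, and /usr4/tmp is only the example path the readers gave.

```shell
# Create a world-writable, sticky temp directory and point TMPDIR at it.
# The directory is whatever the caller passes in.
use_tmpdir() {
    mkdir -p "$1" && chmod 1777 "$1" || return 1
    TMPDIR=$1
    export TMPDIR
}

# Intended use (not run here):
#   use_tmpdir /usr4/tmp
#   /usr/local/etc/rc.d/mysqld.sh
```

The mode 1777 matches what /tmp and /var/tmp normally use, so any user can create files there but only the owner can remove them.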
Today I ran into a "file system full" error on FreeBSD myself. I ran "df -k" and noticed that a partition was 106 percent full. I cleaned out some files and brought it down to under 100%, and things seemed to work fine again. Where was that extra 6% hanging out, though? Was data being lost, or was it being safely stored and restored from elsewhere in the system?
Part of the filesystem is reserved for root when the filesystem is full, I think it’s around 10%. I get this a lot on my / filesystem now 🙂
Have you checked the FAQ (http://www.freebsd.org/)? If you find it, please post it here.
Default newfs settings reserve 10% for root-only space to recover
in case filesystem starts filling up.
You can live safely with ~99.99% of the filesystem being used
by user data only if it is a static filesystem, such as archives without updates. Random I/O against such a filesystem will be abnormally slow.
Keep at least 20% free on heavily accessed drives.
Keeping / and/or /var starving at 100% full means losing performance.
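The reserve described above can be read straight off a df line: it is the total minus Used minus Avail. df works out Capacity against Used plus Avail, not against the total, which is why root writing into the reserve pushes the figure past 100%. The sketch below applies that arithmetic to the /var line quoted in the article; df_reserve is a hypothetical helper name.

```shell
# Work out the reserved space from one df data line.
# Fields: device, 1K-blocks, used, avail, capacity, mount point.
df_reserve() {
    awk '{
        reserved = $2 - $3 - $4    # blocks neither used nor offered to users
        printf "reserved_kb=%d reserved_pct=%.1f capacity_pct=%.1f\n",
            reserved, reserved * 100 / $2, $3 * 100 / ($3 + $4)
    }'
}

echo '/dev/ad0s1e 19815 2778 15452 15% /var' | df_reserve
# prints: reserved_kb=1585 reserved_pct=8.0 capacity_pct=15.2
```

So on that particular filesystem the reserve works out to about 8%, not 10%. The exact percentage is a per-filesystem setting, and nothing is lost when df shows more than 100%: the data is simply sitting in the reserved area.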
I’m starting to have this problem on both my / and /var filesystems. But I do have a lot of space on my /usr filesystem.
As I’m a newbie, does anybody have a fix for this, or a command to show what files are under what filesystem so I can move them to /usr with a symbolic link, or tell me what files are safe to rm from the filesystems?
thanks for the help in advance
chris
Read the article. Read it slowly. Understand what is being said. I can’t add much more than what is said there. Does your problem fit into what is demonstrated there?
Do you really need more space on /var or are you not rotating logfiles?
As for /, what is taking up the space there?
FYI
‘cp -Rp /var /usr3’ should be ‘cp -Rp /var/ /usr3’
What is your reasoning for this?