Feb 27, 2006
How I test tapes and tape drives
I’ve written a number of articles about backups and about Bacula. As with anything, you must test it to ensure it works properly. Do you test your tape drives? Do you test your tapes? Do you test the backup process by also doing a restore? In this article, I’ll show you a script that pulls statistics from my DLT drive, and a script I use for testing tapes in a tape library. Why? If you just bought a tape drive, you want to know that it works. If you just took delivery of 10 used DLT tapes, you want to know they work. You also want to know that if you’re doing a backup, you can also do a restore.
A practical example
When I first started using Bacula, I had a DDS drive with a 4-tape magazine. It worked well. I started testing Bacula to make sure it did what I expected. The first thing I did was test backups that spanned two or more tapes. I created a large file, slightly larger than would fit on one tape. Then I told Bacula to back it up. The backup worked fine. Then I restored. That’s when I found the problem. The restored file was smaller than the original file. This led us to find a bug in FreeBSD (since fixed) related to pthreads. The moral: always test.
Statistics from the tape drive
I am a fan of DLT drives. I’ve been using them for over a year. What is DLT? Ask Google about Digital Linear Tape. DLT has been around for quite some time. It is quite robust. The tape drive mechanism is what impressed me. The recording surface is touched only by the recording head. This results in very little tape wear.
When I first started to use DLT, I hooked the drive up to a SCSI chain and wanted to know how much throughput I should expect. I was asking questions on the FreeBSD SCSI mailing list when someone sent me a very interesting script. It seems there are many factors that can affect throughput, two of which are tape quality and drive quality. If there are many errors, the drive must stop and start, moving the tape back and forth until it gets the right results. This can dramatically affect throughput. What you want to hear when writing is a constant feed of tape going through the drive. You don’t want to hear stop, rewind, start, repeat. Ideally, you only hear a stop/start when you get to the end of the tape.
The script in question allows me to query the tape drive and obtain a number of interesting statistics, one of which is “corrected errors”. Here is some sample output from my production tape drive:
# ~/bin/dlt sa0
The tape is 'sa0'
READING
Corrected errors with substantial delay: 0
Corrected errors with possible delay : 0
Total errors : 73
Total errors corrected : 73
Total times correction algorithm used : 0
Total bytes processed : 1955930720
Total corrected errors / GB : 40
Total uncorrected errors : 0
Read compression ratio : 191%
On tape Mbytes read : 5
On tape kbytes read residual : 971386
WRITING
Corrected errors with substantial delay: 0
Corrected errors with possible delay : 0
Total errors : 147
Total errors corrected : 147
Total times correction algorithm used : 0
Total bytes processed : 7736171760
Total corrected errors / GB : 20
Total uncorrected errors : 0
Write compression ratio : 228%
Host requested Mbytes written : 13342
Host requested kbytes written residual : 487424
On tape Mbytes written : 5852
On tape kbytes written residual : 0
In the above you can see the tape drive is correcting 20 errors for every GB of data written to the tape. This is a relatively good value. Other drives I’ve tested were getting upwards of 4000 corrected errors. I use this script as a relative indicator of good versus poor tapes and drives. I’ve found that the same tape used on two different drives can give very different error rates.
The script
This script was written for FreeBSD and makes use of camcontrol(8). I’m also quite sure that it interrogates the DLT drive using commands specific to DLT. Therefore, I doubt this script will work on non-FreeBSD systems or with non-DLT drives. I’m quite sure a similar script could be written for other operating systems. If you know of such a script, please let me know and I’ll happily include it here. You can download the script here. Use at your own risk, may cause cancer, your mileage may vary, etc.
Testing tapes
When all I had were single tape drives, I would pop in a tape, tar up some data, and check the stats. Now that I have a tape library, I can automate the changing of tapes and test 10 tapes at a time. This is a great time saver. This script dumps a bunch of files onto each tape, extracts the stats, and logs them. Then it does the same thing to the next tape. No read verification is done. All I’m interested in is corrected errors. You can get the script here. A sample run appears further down.
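The steps just described (load a tape, write to it, pull the stats, unload, log, move on) can be outlined roughly as follows. This is not the tape-testing script itself, only a minimal Python sketch of the loop. Every name in it (run_tape_tests and the load, write_data, read_stats, unload and log callables) is hypothetical, and the stand-ins in the example merely record what they were asked to do, so it runs without a changer or drive attached.

```python
def run_tape_tests(slots, load, write_data, read_stats, unload, log):
    """Test each tape in turn: load, write, collect stats, unload, log."""
    results = {}
    for slot in slots:
        print(f"loading {slot}")
        load(slot)             # move the tape from its slot into the drive
        write_data()           # dump a bunch of files onto the tape
        stats = read_stats()   # ask the drive for its error counters
        print(f"unloading {slot}")
        unload(slot)           # put the tape back in its slot
        log(slot, stats)       # record the result for later comparison
        results[slot] = stats
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins; a real run would call the changer, tar,
    # and the drive statistics script here instead.
    events = []
    results = run_tape_tests(
        slots=[2, 3, 4, 5],
        load=lambda slot: events.append(("load", slot)),
        write_data=lambda: events.append(("write",)),
        read_stats=lambda: {"corrected_per_gb": 0, "uncorrected": 0},
        unload=lambda slot: events.append(("unload", slot)),
        log=lambda slot, stats: events.append(("log", slot)),
    )
    print(results)
```

Passing the hardware steps in as callables keeps the loop itself trivial to read and lets you dry-run it before pointing it at a real library.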
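The script logs one line per tape to /var/log/messages, giving the changer device, the tape’s barcode, the corrected errors per GB, and the uncorrected errors.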
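To make the "extracts the stats, and logs them" step more concrete, here is a minimal Python sketch that reads a report shaped like the sample drive output earlier and builds a line in the logged format. It is an illustration under assumptions, not the actual script: the function names are made up, the barcode TAPE001 is a placeholder, and the errors-per-GB formula (corrected errors per 2**30 bytes, fraction dropped) is only inferred from the sample figures.

```python
# Excerpt of the WRITING section from the sample drive output.
SAMPLE = """\
WRITING
Total errors corrected : 147
Total bytes processed : 7736171760
Total corrected errors / GB : 20
Total uncorrected errors : 0
Write compression ratio : 228%
"""


def parse_report(text):
    """Return {section: {label: int}} for 'label : value' lines."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line in ("READING", "WRITING"):
            current = sections.setdefault(line, {})
        elif current is not None and ":" in line:
            label, _, value = line.rpartition(":")
            value = value.strip()
            if value.isdigit():  # skips non-integer values such as '228%'
                current[label.strip()] = int(value)
    return sections


def errors_per_gb(corrected, bytes_processed):
    """Assumed formula: corrected errors per 2**30 bytes, fraction dropped."""
    return corrected * 2**30 // bytes_processed


def log_line(changer, barcode, writing):
    """Format one result line in the style seen in /var/log/messages."""
    return (
        f"TapeTesting: {changer} : {barcode} - corrected errors / GB : "
        f"{writing['Total corrected errors / GB']} - uncorrected errors : "
        f"{writing['Total uncorrected errors']}"
    )


if __name__ == "__main__":
    writing = parse_report(SAMPLE)["WRITING"]
    print(errors_per_gb(writing["Total errors corrected"],
                        writing["Total bytes processed"]))
    print(log_line("/dev/ch1", "TAPE001", writing))
```

With the sample figures, the assumed formula gives 20 for WRITING (147 errors over 7736171760 bytes) and 40 for READING (73 errors over 1955930720 bytes), which matches the values the drive script reported.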
By running the same tests multiple times, and perhaps on different drives, you can compare the results. For example, here are the results for one of my tapes. I have abbreviated the results so they fit better in your browser. In this case, JYN260 is the barcode on the tape.
# grep JYN260 /var/log/messages
Feb 21 10:12:16 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 69 - uncorrected errors : 0
Feb 21 15:12:55 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 58 - uncorrected errors : 0
Feb 21 18:19:11 TapeTesting: /dev/ch1 : JYN260S2 - corrected errors / GB : 23 - uncorrected errors : 0
Feb 21 21:34:45 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 36 - uncorrected errors : 0
Feb 21 23:39:33 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 32 - uncorrected errors : 0
Feb 22 10:14:17 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 23 - uncorrected errors : 0
Feb 22 12:03:11 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 26 - uncorrected errors : 0
Feb 22 13:31:39 TapeTesting: /dev/ch1 : JYN260S2 - corrected errors / GB : 12 - uncorrected errors : 0
Feb 22 16:15:47 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 26 - uncorrected errors : 0
Feb 24 11:00:03 TapeTesting: /dev/ch1 : JYN260S2 - corrected errors / GB : 12 - uncorrected errors : 0
For this test, I had the following tapes in the magazine:
# ~dan/rc-chio-changer /dev/ch1 list
2:JYN257S2
3:JYN249S2
4:001320
5:JYN265S2
Here is what the script looks like when you run it:
# ./tape-testing.sh ch1 sa1
loading 2
tar: Removing leading '/' from member names
unloading 2
loading 3
tar: Removing leading '/' from member names
unloading 3
loading 4
tar: Removing leading '/' from member names
unloading 4
loading 5
tar: Removing leading '/' from member names
unloading 5
In that command, ch1 is the changer I want to test and sa1 is the drive. In this case, the changer is the one identified by /dev/ch1.
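Comparing repeated runs by eye gets tedious once there are more than a handful. As a rough sketch (not part of either script), TapeTesting lines like the JYN260 entries above can be grouped by changer device and summarised in a few lines of Python. The regular expression assumes exactly the log format shown in this article, and the name summarise_by_changer is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the TapeTesting lines shown in this article.
LINE = re.compile(
    r"TapeTesting: (?P<changer>\S+) : (?P<barcode>\S+) - "
    r"corrected errors / GB : (?P<corrected>\d+) - "
    r"uncorrected errors : (?P<uncorrected>\d+)"
)


def summarise_by_changer(lines, barcode_prefix=""):
    """Group corrected-errors-per-GB figures by changer device."""
    rates = defaultdict(list)
    for line in lines:
        m = LINE.search(line)
        if m and m["barcode"].startswith(barcode_prefix):
            rates[m["changer"]].append(int(m["corrected"]))
    return {
        changer: {
            "runs": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for changer, values in rates.items()
    }


if __name__ == "__main__":
    # Four of the JYN260 entries, copied from the log excerpt.
    sample = [
        "Feb 21 10:12:16 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 69 - uncorrected errors : 0",
        "Feb 21 18:19:11 TapeTesting: /dev/ch1 : JYN260S2 - corrected errors / GB : 23 - uncorrected errors : 0",
        "Feb 21 23:39:33 TapeTesting: /dev/ch0 : JYN260S2 - corrected errors / GB : 32 - uncorrected errors : 0",
        "Feb 22 13:31:39 TapeTesting: /dev/ch1 : JYN260S2 - corrected errors / GB : 12 - uncorrected errors : 0",
    ]
    for changer, summary in sorted(summarise_by_changer(sample, "JYN260").items()):
        print(changer, summary)
```

Across all ten JYN260 entries, /dev/ch0 accounts for seven runs ranging from 23 to 69 corrected errors per GB, and /dev/ch1 for three runs ranging from 12 to 23. A summary like this makes that kind of gap easy to spot.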
Looking in /var/log/messages we can see these results:
Feb 26 11:07:11 TapeTesting: /dev/ch1 : JYN257S2 - corrected errors / GB : 3 - uncorrected errors : 0
Feb 26 11:29:32 TapeTesting: /dev/ch1 : JYN249S2 - corrected errors / GB : 35 - uncorrected errors : 0
Feb 26 11:51:01 TapeTesting: /dev/ch1 : 001320 - corrected errors / GB : 9 - uncorrected errors : 0
Feb 26 12:13:29 TapeTesting: /dev/ch1 : JYN265S2 - corrected errors / GB : 95 - uncorrected errors : 0
As you can see, JYN257S2 has the fewest errors. I repeated this test several times for each tape, trying to write a large amount of data to each tape, so I could get a good impression of quality. I actually didn’t find very many tapes that had over 50 corrected errors per GB. I also found that the error rate would decrease the more times I wrote to the tape.
Testing drives
If you use the same tapes on different drives, you can then get an idea of the relative quality of each drive. For example, I could get corrected error rates of 6 or 8 / GB on one drive, but the same tapes on another drive give 600+ corrected errors per GB. That is clearly an issue with the drive, not the tape.
Things that can affect throughput
This isn’t strictly related to tape testing, but it is based upon some recent observations. The following is just a set of items which I think will affect throughput (i.e. KB/s) when backing up data.