ZFS: do not give it all your HDD
I’m about to rebuild my ZFS array
(which I documented in my other diary). The array has been running
for a while, but I recently learned some new facts about ZFS which spurred me on to rebuilding my array
with future-proofing in mind.
This is my plan
for tonight. As I type this, Jerry is
over, doing the heavy lifting for me while I nurse a broken left elbow. The two new HDD have
been installed and the system has been powered back up.
Tonight we will do the following:
- identify the newly installed HDD
- put a file system on those HDD
- copy the existing ZFS array over to that new FS
- destroy the existing ZFS array
- partition each individual drive using gpart
- add the drives back into the array
- copy the data back
- partition the two new HDD and put them into the new array
This article originally covered all of the above steps. That soon led to a multi-day 3000 line document.
I thought it best to break it up into a few smaller articles. To this end, this article will cover only
the part about partitioning HDD so as to avoid future problems at replacement time.
Don’t use all your HDD
In this section, let’s assume you are building a new ZFS array. I will talk about how I like to
partition my HDD with a little buffer zone, and why.
Let’s assume ada0 and ada6 are the drives you want to use. This is the list
of ada devices from dmesg:
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C)
As you can see, each of the 2TB devices (ada0 through ada6) contains 3907029168 sectors of 512 bytes each, for a total
of 1863 GB on a 2TB HDD. However, not all 2TB HDD have exactly the same capacity. Even the same model of HDD can
vary. If you are using ZFS, you should be aware of the following from man zpool:
zpool replace [-f] pool old_device [new_device]

    Replaces old_device with new_device. This is equivalent to attaching
    new_device, waiting for it to resilver, and then detaching old_device.

    The size of new_device must be greater than or equal to the minimum
    size of all the devices in a mirror or raidz configuration.
Thus, if your replacement HDD is just 1 sector smaller than the original, you cannot use it.
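To see where the 1863 GB figure comes from, here is a minimal sketch of the arithmetic in sh. The function names are my own for illustration, and it assumes a shell with 64-bit arithmetic (the byte count does not fit in 32 bits).

```sh
# Convert a sector count to bytes.
# $1 = number of sectors, $2 = bytes per sector (defaults to 512)
sectors_to_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# Convert bytes to whole GB, counting 1 GB as 1024^3 bytes.
# Integer division, so the result is rounded down.
bytes_to_gb() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

bytes=$(sectors_to_bytes 3907029168)
echo "bytes: ${bytes}"
echo "GB:    $(bytes_to_gb "${bytes}")"
```

The drive holds just over 2 x 10^12 bytes, which is the "2TB" on the box, but only 1863 GB when counted in units of 1024^3 bytes.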
But there is a cunning plan. Partition the HDD and give only the partition to ZFS. Now, this isn’t useful
to you in hindsight if your array is broken now. This strategy is only useful when setting up a new array.
The idea is to use slightly less than your entire HDD. Thus, if a replacement HDD happens to be smaller,
you’re covered.
Using gpart
There are other approaches to this, but I’m using gpart.
# gpart create -s GPT ad1
gpart: provider 'ad1': Invalid argument
Oh. Yes, wrong name. Let’s try this:
# gpart create -s GPT ada0
#
Now let’s see what we have:
# gpart show ada0
=>        34  3907029101  ada0  GPT  (1.8T)
          34  3907029101        - free -  (1.8T)

#
From the above, we can see one block of free space, 3907029101 sectors long, starting
at sector 34.
Each sector is 512 bytes, as can be seen on the "sector size" line here:
# camcontrol identify ada0
pass0: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 2.x
device model          Hitachi HDS722020ALA330
firmware revision     JKAOA28A
serial number         JK1131YAHLJWLV
WWN                   5000cca221d68596
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         268435455 sectors
LBA48 supported       3907029168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             7200

Feature                      Support  Enable    Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  yes      no      254/0xFE        128/0x80
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              no       no      0/0x0
unload                         no       no
free-fall                      no       no
data set management (TRIM)     no
I plan to leave 200MB free at the end of each HDD. Thus, the gpart command to
add a new partition is:
gpart add -b 2048 -s 3906824301 -t freebsd-zfs -l disk00 ada0
Please note that the above math is incorrect, but only slightly. It leaves some 99MB free, which is
completely acceptable for this effort. The correct math is:
gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk00 ada0
where:
- -b 2048 – starts the partition 2048 sectors in from the start of the disk (leaving 1MB free)
- the start is also on a 4KB boundary, which will give better performance on some HDD
- -s 3906824301 was meant to leave 200MB free at the end of the HDD (as noted above, it actually leaves about 99MB; -s 3906617453 leaves the intended 200MB)
- -l disk00 creates a label which you can use when adding this device to the pool
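The -s value can be worked out from the gpart show output rather than by hand. The following is a rough sketch, not the method used for the numbers above: the function name and its argument order are my own, and it counts 1MB as 1048576 bytes.

```sh
# Number of sectors to pass to "gpart add -s".
# $1 = first free sector, from "gpart show"
# $2 = size of the free area in sectors, from "gpart show"
# $3 = partition start sector (the -b value)
# $4 = space to leave unused at the end, in MB
# $5 = sector size in bytes (defaults to 512)
partition_sectors() {
    reserve=$(( $4 * 1048576 / ${5:-512} ))   # reserve converted to sectors
    free_end=$(( $1 + $2 ))                   # first sector past the free area
    echo $(( free_end - $3 - reserve ))
}

size=$(partition_sectors 34 3907029101 2048 200)
# Only print the command; review it before running anything.
echo "gpart add -b 2048 -s ${size} -t freebsd-zfs -l disk00 ada0"
```

For this drive the sketch gives 3906617487, which is 34 sectors (17KB) more than the 3906617453 used above. The difference comes from measuring against the end of the free area instead of its length; either value leaves about 200MB unused.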
Creating the pool
Let’s assume we did the above with 5 HDD. The command to create the new pool is:
# zpool create -f storage raidz2 gpt/disk00 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04
# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        storage         ONLINE       0     0     0
          raidz2        ONLINE       0     0     0
            gpt/disk00  ONLINE       0     0     0
            gpt/disk01  ONLINE       0     0     0
            gpt/disk02  ONLINE       0     0     0
            gpt/disk03  ONLINE       0     0     0
            gpt/disk04  ONLINE       0     0     0

errors: No known data errors
There, done.
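The partitioning that precedes pool creation has to be repeated once per disk, with a different label each time. As a sketch only, a small sh function can print the commands for review. The device names ada0 through ada4 are an assumption for illustration, and nothing here touches a disk.

```sh
# Print (not run) the gpart commands for a set of disks.
# $1 = partition size in sectors, remaining arguments = device names.
# Labels are generated as disk00, disk01, ... in argument order.
print_partition_commands() {
    size=$1
    shift
    n=0
    for dev in "$@"; do
        label=$(printf 'disk%02d' "${n}")
        echo "gpart create -s GPT ${dev}"
        echo "gpart add -b 2048 -s ${size} -t freebsd-zfs -l ${label} ${dev}"
        n=$(( n + 1 ))
    done
}

# Hypothetical device list; substitute the drives you actually intend to use.
print_partition_commands 3906617453 ada0 ada1 ada2 ada3 ada4
```

Printing instead of executing gives you a chance to check every device name before anything destructive happens.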
When it comes time to replace one of the above devices, let’s say gpt/disk02, you do this:
# zpool offline storage gpt/disk02
Then you remove that HDD from the system, and insert the new HDD. You partition the new HDD just like you did above,
reusing the same start, size, and label (disk02), and you instantly have a new partition exactly the same size as all the others.
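Before partitioning a replacement, it may be worth checking that the partition will actually fit on it. Here is a minimal sketch, assuming the backup GPT takes the last 33 sectors of the disk, which is what the gpart show output above implies for this drive. The function name is hypothetical.

```sh
# Succeeds (exit status 0) if the partition fits on the disk.
# $1 = total sectors on the replacement disk (e.g. from dmesg)
# $2 = partition start sector (the -b value)
# $3 = partition size in sectors (the -s value)
# Assumption: 33 sectors at the end are reserved for the backup GPT.
partition_fits() {
    [ $(( $2 + $3 + 33 )) -le "$1" ]
}

if partition_fits 3907029168 2048 3906617453; then
    echo "partition fits on the replacement disk"
else
    echo "replacement disk is too small"
fi
```

With the 200MB reserve, a replacement can be up to roughly 400000 sectors smaller than the original and still pass this check.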
Now add that disk back in:
# zpool replace storage gpt/disk02
Done. Let the array resilver, and you’re good to go. Hopefully, this approach will save us both
from headaches in the future.
Great article
but i think there is a minor typo.
gpart add -b 2048 -s 3906824301 -t freebsd-zfs -l disk00 ad0
Should these commands read gpart add -b 2048 -s 3906824301 -t freebsd-zfs -l disk00 ada0
change ad0 to ada0
Another question.
Why not label the disk with glabel.
And best wishes with the arm…..
regards,
Johan
Post Edited (02-08-10 11:16)
Johanh wrote:
> Great article
> but i think there is a minor typo.
> gpart add -b 2048 -s 3906824301 -t freebsd-zfs -l disk00 ad0
>
> Should these commands read gpart add -b 2048 -s 3906824301 -t
> freebsd-zfs -l disk00 ada0
>
> change ad0 to ada0
ad0 changed to ada0
> Another question.
> Why not label the disk with glabel.
I seem to recall that coming up… I think gpart overwrites what glabel does. A test would confirm, but I have no spare HDD. I guess I could use a file…
> And best wishes with the arm…..
Thanks. It hurts all the time.
—
The Man Behind The Curtain
Reasons not to use glabel:
<http://docs.freebsd.org/cgi/mid.cgi?db=irt&id=AANLkTili3-Bk_ZZgZSqgy_mFKDhHDk_ZFqjga7AjpuPY@mail.gmail.com>
<http://docs.freebsd.org/cgi/getmsg.cgi?fetch=503216+0+archive/2010/freebsd-stable/20100725.freebsd-stable>
—
The Man Behind The Curtain
[quote]
gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk00 ada0
where:
* -b 2048 – starts the partition 2048 sectors in from the start of the disk (leaving 1KB free)
[/quote]
Should be:
* -b 2048 – starts the partition 2048 sectors in from the start of the disk (leaving [b]1MB[/b] free)
Fixed. Thank you.
—
The Man Behind The Curtain
Put an 8-16GB swap partition at the end of each drive. Now you have a *usable* buffer space to guard against drive size changes.