ZFS: Resizing your zpool

I’m about to rebuild my ZFS array
(which I documented in my other diary). The array has been running
for a while, but I recently learned some new facts about ZFS that spurred me on to rebuild it
with future-proofing in mind.

This is my plan
for tonight. As I type this, Jerry is over, doing the heavy lifting for me;
I am nursing a broken left elbow. The two new HDD have
been installed and the system has been powered back up.

Tonight we will do the following:

  • identify the newly installed HDD
  • put a file system on those HDD
  • copy the existing ZFS array over to that new FS (call this temp)
  • destroy the existing ZFS array
  • partition each individual drive using gpart
  • add the drives back into the array
  • copy the data back
  • partition the two new HDD and add them to the new array

My approach works because the existing data can fit on the two new HDD.

I have already covered how I’m going to use gpart to partition and label my HDD.
See ZFS: don’t give it all your HDD for details on that.

Identifying the new HDD

Jerry and I just inserted the two new hard drives, put the system back together, and powered it up.
This is the full dmesg output after installing the new HDD:

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #0: Fri Mar  5 00:46:11 EST 2010
    dan@kraken.example.org:/usr/obj/usr/src/sys/KRAKEN amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Phenom(tm) II X4 945 Processor (3010.17-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f42  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4113461248 (3922 MB)
ACPI APIC Table: <111909 APIC1708>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ACPI Warning: Optional field Pm2ControlBlock has zero address or length:        0       0/1 (20100121/tbfadt-655)
ioapic0 <Version 2.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <111909 RSDT1708> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of fee00000, 1000 (3) failed
acpi0: reservation of ffb80000, 80000 (3) failed
acpi0: reservation of fec10000, 20 (3) failed
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, dfe00000 (3) failed
ACPI HPET table warning: Sequence is non-zero (2)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0
pci8: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0xec00-0xec1f mem 0xfbfe0000-0xfbffffff,0xfbf00000-0xfbf7ffff,0xfbfdc000-0xfbfdffff irq 18 at device 0.0 on pci8
em0: Using MSIX interrupts
em0: [ITHREAD]
em0: [ITHREAD]
em0: [ITHREAD]
em0: Ethernet address: 00:1b:21:51:ab:2d
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 5.0 on pci0
pci6: <ACPI PCI bus> on pcib2
pcib3: <PCI-PCI bridge> irq 17 at device 0.0 on pci6
pci7: <PCI bus> on pcib3
siis0: <SiI3124 SATA controller> port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef0000-0xfbef7fff irq 17 at device 4.0 on pci7
siis0: [ITHREAD]
siisch0: <SIIS channel> at channel 0 on siis0
siisch0: [ITHREAD]
siisch1: <SIIS channel> at channel 1 on siis0
siisch1: [ITHREAD]
siisch2: <SIIS channel> at channel 2 on siis0
siisch2: [ITHREAD]
siisch3: <SIIS channel> at channel 3 on siis0
siisch3: [ITHREAD]
pcib4: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci5: <ACPI PCI bus> on pcib4
re0: <RealTek 8168/8168B/8168C/8168CP/8168D/8168DP/8111B/8111C/8111CP/8111DP PCIe Gigabit Ethernet> port 0xc800-0xc8ff mem 0xfbdff000-0xfbdfffff irq 18 at device 0.0 on pci5
re0: Using 1 MSI messages
re0: Chip rev. 0x38000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
rgephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
re0: Ethernet address: e0:cb:4e:42:f0:ff
re0: [FILTER]
pcib5: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib5
fwohci0: <1394 Open Host Controller Interface> port 0xb800-0xb8ff mem 0xfbcff800-0xfbcfffff irq 19 at device 0.0 on pci4
fwohci0: [ITHREAD]
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:1e:8c:00:00:c4:3c:f9
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
dcons_crom0: <dcons configuration ROM> on firewire0
dcons_crom0: bus_addr 0x1574000
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:1e:8c:c4:3c:f9
fwe0: Ethernet address: 02:1e:8c:c4:3c:f9
fwip0: <IP over FireWire> on firewire0
fwip0: Firewire address: 00:1e:8c:00:00:c4:3c:f9 @ 0xfffe00000000, S400, maxrec 2048
fwohci0: Initiate bus reset
fwohci0: fwohci_intr_core: BUS reset
fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode
pcib6: <ACPI PCI-PCI bridge> irq 19 at device 11.0 on pci0
pci2: <ACPI PCI bus> on pcib6
pcib7: <PCI-PCI bridge> irq 19 at device 0.0 on pci2
pci3: <PCI bus> on pcib7
siis1: <SiI3124 SATA controller> port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf0000-0xfbbf7fff irq 19 at device 4.0 on pci3
siis1: [ITHREAD]
siisch4: <SIIS channel> at channel 0 on siis1
siisch4: [ITHREAD]
siisch5: <SIIS channel> at channel 1 on siis1
siisch5: [ITHREAD]
siisch6: <SIIS channel> at channel 2 on siis1
siisch6: [ITHREAD]
siisch7: <SIIS channel> at channel 3 on siis1
siisch7: [ITHREAD]
ahci0: <ATI IXP700 AHCI SATA controller> port 0x8000-0x8007,0x7000-0x7003,0x6000-0x6007,0x5000-0x5003,0x4000-0x400f mem 0xfb3fe400-0xfb3fe7ff irq 22 at device 17.0 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich3: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xfb3f6000-0xfb3f6fff irq 16 at device 18.0 on pci0
ohci0: [ITHREAD]
usbus0: <OHCI (generic) USB controller> on ohci0
ohci1: <OHCI (generic) USB controller> mem 0xfb3f7000-0xfb3f7fff irq 16 at device 18.1 on pci0
ohci1: [ITHREAD]
usbus1: <OHCI (generic) USB controller> on ohci1
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfb3fe800-0xfb3fe8ff irq 17 at device 18.2 on pci0
ehci0: [ITHREAD]
ehci0: AMD SB600/700 quirk applied
usbus2: EHCI version 1.0
usbus2: <EHCI (generic) USB 2.0 controller> on ehci0
ohci2: <OHCI (generic) USB controller> mem 0xfb3fc000-0xfb3fcfff irq 18 at device 19.0 on pci0
ohci2: [ITHREAD]
usbus3: <OHCI (generic) USB controller> on ohci2
ohci3: <OHCI (generic) USB controller> mem 0xfb3fd000-0xfb3fdfff irq 18 at device 19.1 on pci0
ohci3: [ITHREAD]
usbus4: <OHCI (generic) USB controller> on ohci3
ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfb3fec00-0xfb3fecff irq 19 at device 19.2 on pci0
ehci1: [ITHREAD]
ehci1: AMD SB600/700 quirk applied
usbus5: EHCI version 1.0
usbus5: <EHCI (generic) USB 2.0 controller> on ehci1
pci0: <serial bus, SMBus> at device 20.0 (no driver attached)
atapci0: <ATI IXP700/800 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
pci0: <multimedia, HDA> at device 20.2 (no driver attached)
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib8: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci1: <ACPI PCI bus> on pcib8
vgapci0: <VGA-compatible display> mem 0xfb400000-0xfb7fffff,0xfbad0000-0xfbadffff irq 20 at device 5.0 on pci1
ahc0: <Adaptec 2944 Ultra SCSI adapter> port 0x9800-0x98ff mem 0xfbaff000-0xfbafffff irq 21 at device 6.0 on pci1
ahc0: [ITHREAD]
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
ohci4: <OHCI (generic) USB controller> mem 0xfb3ff000-0xfb3fffff irq 18 at device 20.5 on pci0
ohci4: [ITHREAD]
usbus6: <OHCI (generic) USB controller> on ohci4
acpi_button0: <Power Button> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc97ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
Timecounters tick every 1.000 msec
firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0)  (me) 
firewire0: bus manager 0 
(noperiph:siisch4:0:-1:-1): rescan already queued
(noperiph:siisch5:0:-1:-1): rescan already queued
(noperiph:siisch6:0:-1:-1): rescan already queued
(noperiph:siisch7:0:-1:-1): rescan already queued
(noperiph:siisch0:0:-1:-1): rescan already queued
(noperiph:siisch2:0:-1:-1): rescan already queued
(noperiph:siisch3:0:-1:-1): rescan already queued
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 480Mbps High Speed USB v2.0
usbus3: 12Mbps Full Speed USB v1.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 480Mbps High Speed USB v2.0
usbus6: 12Mbps Full Speed USB v1.0
ugen0.1: <ATI> at usbus0
uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <ATI> at usbus1
uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <ATI> at usbus2
uhub2: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
ugen3.1: <ATI> at usbus3
uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3
ugen4.1: <ATI> at usbus4
uhub4: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <ATI> at usbus5
uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
ugen6.1: <ATI> at usbus6
uhub6: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
uhub6: 2 ports with 2 removable, self powered
uhub0: 3 ports with 3 removable, self powered
uhub1: 3 ports with 3 removable, self powered
uhub3: 3 ports with 3 removable, self powered
uhub4: 3 ports with 3 removable, self powered
uhub2: 6 ports with 6 removable, self powered
uhub5: 6 ports with 6 removable, self powered
(probe0:ahc0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 
(probe0:ahc0:0:0:0): CAM status: SCSI Status Error
(probe0:ahc0:0:0:0): SCSI status: Check Condition
(probe0:ahc0:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(probe5:ahc0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 
(probe5:ahc0:0:5:0): CAM status: SCSI Status Error
(probe5:ahc0:0:5:0): SCSI status: Check Condition
(probe5:ahc0:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
ada0 at siisch0 bus 0 scbus0 target 0 lun 0
ada0: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1 at siisch2 bus 0 scbus2 target 0 lun 0
ada1: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at siisch3 bus 0 scbus3 target 0 lun 0
ada2: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3 at siisch4 bus 0 scbus4 target 0 lun 0
ada3: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x devicech0 at ahc0 bus 0 scbus12 target 0 lun 0
ch0: <DEC TL800    (C) DEC 0326> Removable Changer SCSI-2 device 
ch0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
ch0: 10 slots, 1 drive, 1 picker, 0 portals

ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4 at siisch5 bus 0 scbus5 target 0 lun 0
ada4: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5 at siisch6 bus 0 scbus6 target 0 lun 0
ada5: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6 at siisch7 bus 0 scbus7 target 0 lun 0
ada6: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7 at ahcich0 bus 0 scbus8 target 0 lun 0
ada7: <ST380815AS 4.AAB> ATA-7 SATA 2.x device
ada7: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada8 at ahcich2 bus 0 scbus10 target 0 lun 0
ada8: <WDC WD1600AAJS-75M0A0 02.03E02> ATA-8 SATA 2.x device
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C)
sa0 at ahc0 bus 0 scbus12 target 5 lun 0
sa0: <DEC TZ89     (C) DEC 1837> Removable Sequential Access SCSI-2 device 
sa0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SMP: AP CPU #2 Launched!
cd0 at ahcich1 bus 0 scbus9 target 0 lun 0SMP: AP CPU #1 Launched!
cd0: 
<TSSTcorp CDDVDW SH-S223C SB01> Removable CD-ROM SCSI-0 device 
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)SMP: AP CPU #3 Launched!
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed

GEOM_MIRROR: Device mirror/gm0 launched (2/2).
GEOM: mirror/gm0s1: geometry does not match label (16h,63s != 255h,63s).
Trying to mount root from ufs:/dev/mirror/gm0s1a
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 3
ZFS storage pool version 14

That’s a good start, but it doesn’t tell me which drives are new.
Let’s see which drives are already in use; the zpool status output below
lists the HDD used in the existing array:

  pool: storage
 state: ONLINE
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0
	    ada4    ONLINE       0     0     0
	    ada5    ONLINE       0     0     0

errors: No known data errors

I happen to know my OS runs on a gmirror. This is my gmirror array (the gmirror status output), which I boot from:

      Name    Status  Components
mirror/gm0  COMPLETE  ada7
                      ada8

That leaves me with ada0 and ada6 as the new drives. Grepping the dmesg output above
for ada devices gives this list:
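
Something like this pulls them out of dmesg (any similar pattern will do):

$ dmesg | grep -E '^ada[0-9]+: [0-9]+MB'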

ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C)

This machine has two SATA controllers for the array. We added one new HDD to each controller;
one now has four drives, the other has three.

Copying data to the new HDD

I am omitting the steps to partition, label, newfs, and mount the two new drives;
they are not relevant to this tutorial, which is mostly about ZFS.
I now have the following:

# mount
/dev/mirror/gm0s1a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/gm0s1e on /tmp (ufs, local, soft-updates)
/dev/mirror/gm0s1f on /usr (ufs, local, soft-updates)
/dev/mirror/gm0s1d on /var (ufs, local, soft-updates)
storage on /storage (zfs, local)
/dev/ada0s1d on /new0 (ufs, local, soft-updates)
/dev/ada6s1d on /new6 (ufs, local, soft-updates)
#

Which translates to:

$ df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    494M    416M    54%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G     70K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.5G     49G     8%    /usr
/dev/mirror/gm0s1d    3.9G    152M    3.4G     4%    /var
storage               7.1T    3.1T    4.0T    43%    /storage
/dev/ada0s1d          1.8T    4.0K    1.6T     0%    /new0
/dev/ada6s1d          1.8T    4.0K    1.6T     0%    /new6

Testing the existing ZFS array

For future comparison, here is a simple test on the existing ZFS array:

# dd if=/dev/random of=/storage/dan/NewDriveTesting/file1 bs=1m count=20480
20480+0 records in
20480+0 records out
21474836480 bytes transferred in 333.867807 secs (64321375 bytes/sec)

Not very astounding, but there’s a reason: this test is CPU bound, because reading
from /dev/random is slow. Copying that file around would be a better representation
of the array’s throughput. Compare the above with the same test using /dev/zero:

# dd if=/dev/zero of=/storage/dan/NewDriveTesting/file-zero bs=1m count=20480
20480+0 records in
20480+0 records out
21474836480 bytes transferred in 124.919368 secs (171909583 bytes/sec)
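
As for copying the file around, that kind of test would look something like this (file-copy is just a placeholder name; I did not record such a run):

dd if=/storage/dan/NewDriveTesting/file-zero of=/storage/dan/NewDriveTesting/file-copy bs=1m  # file-copy: placeholder name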

Copying data off the array

I’ve divided up my data into two parts, one for each of the two HDD.
This 897G copy just started:

# cd /storage/bacula/volumes
# cp -rp FileAuto-0* bast catalog dbclone kraken laptop-freebsd laptop-vista latens \
            nyi polo supernews /new0/bacula/volumes/

And this 1.8T:

# cd /storage/bacula/volumes/ngaio
# cp -rp FileAuto-0{1..7}* /new6/bacula/volumes/ngaio/

At present, the zpool iostat is running like this:

$ zpool iostat 30
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     3.92T  5.15T    514      0  63.8M      0
storage     3.92T  5.15T    493      3  61.2M  6.23K
storage     3.92T  5.15T    505      0  62.7M      0
storage     3.92T  5.15T    499      3  62.0M  6.30K
storage     3.92T  5.15T    514      0  63.8M      0
storage     3.92T  5.15T    601      3  74.7M  6.13K
storage     3.92T  5.15T    604      0  74.9M      0
storage     3.92T  5.15T    754      3  93.7M  6.17K
storage     3.92T  5.15T    713      0  88.5M      0
storage     3.92T  5.15T    645      4  80.1M  7.48K
storage     3.92T  5.15T    725      0  90.1M      0
storage     3.92T  5.15T    717      3  89.0M  6.73K

I may be waiting a while… 🙂 12 hours by my reckoning (roughly 2.7TB to copy at the ~65MB/s shown above).

So far, as of 9:39 PM EST:

$ df -h /new0 /new6
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/ada0s1d          1.8T    178G    1.4T    11%    /new0
/dev/ada6s1d          1.8T    131G    1.5T     8%    /new6

11:49 PM – gone to bed. Expect no updates until after 10 AM EST.

9:00 AM

It appears my calculations were incorrect. I have run out of space:

# cp -rp FileAuto-0{1..7}* /new6/bacula/volumes/ngaio/
cp: FileAuto-01*: No such file or directory

/new6: write failed, filesystem is full
cp: /new6/bacula/volumes/ngaio/FileAuto-0291: No space left on device

/new6: write failed, filesystem is full
cp: /new6/bacula/volumes/ngaio/FileAuto-0290: No space left on device
cp: /new6/bacula/volumes/ngaio/FileAuto-0289: No space left on device
cp: /new6/bacula/volumes/ngaio/FileAuto-0288: No space left on device
cp: /new6/bacula/volumes/ngaio/FileAuto-0280: No space left on device
cp: /new6/bacula/volumes/ngaio/FileAuto-0276: No space left on device
cp: /new6/bacula/volumes/ngaio/FileAuto-0799: No space left on device

#

10:43 AM

Now we’re copying again… this time, using rsync FTW.
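
I didn’t record the exact invocation, but resuming an interrupted copy with rsync has this general shape (paths as above):

# rsync -av /storage/bacula/volumes/ngaio/ /new0/bacula/volumes/ngaio/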

11:11 AM

The final copy:

# df -h /new?
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/ada0s1d          1.8T    1.0T    640G    61%    /new0
/dev/ada6s1d          1.8T    1.8T   -143G   109%    /new6

# cd /storage/bacula/volumes/ngaio
# cp -rp FileAuto-08* FileAuto-09* /new0/bacula/volumes/ngaio

12:47 PM

The copy has finished. Now I wish to verify. rsync will help.
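
One way to verify is a checksummed dry run from each copy back against the source; anything missing or different gets listed. This is the general idea, not necessarily the exact commands I used:

# rsync -avnc /new0/bacula/volumes/ /storage/bacula/volumes/
# rsync -avnc /new6/bacula/volumes/ /storage/bacula/volumes/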

1:57 PM

I have a complete list of files on both old and new filesystems. They look great.

Destroying the old pool

With the backup complete and confirmed, I’m ready to take the backup drives offline,
just to avoid mistakes, using the umount command.
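
Which amounts to no more than this, using the mount points from earlier:

# umount /new0
# umount /new6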

Time to destroy the existing pool. This destroys the data it holds.

$ zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0
	    ada4    ONLINE       0     0     0
	    ada5    ONLINE       0     0     0

errors: No known data errors

$ sudo zpool destroy -f storage
$ zpool status
no pools available
$

There. Gone. Time to start rebuilding the array. I went through the partitioning
process described in ZFS: don’t give it all your HDD.

Creating a 7HDD zpool with only 5HDD (fails)

WARNING: This attempt failed. Read the next section for the successful approach.

I want to create a new zpool that contains all 7 HDD. The problem is, two of those HDD now hold
my data. This is my approach to solving that:

  1. create two sparse files
  2. create a new zpool with the 5HDD and the two sparse files
  3. remove the two sparse files from the array
  4. copy data from my 2 HDD to the array
  5. add my two HDD into the array filling the two empty slots

First, I destroy my existing pool. WARNING: This destroys all data in the pool.

zpool destroy storage

Now I create two sparse files, roughly the same size as the HDD partitions. Actually, slightly smaller.

$ dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0  oseek=1862g
0+0 records in
0+0 records out
0 bytes transferred in 0.000010 secs (0 bytes/sec)

$ dd if=/dev/zero of=/tmp/sparsefile2.img bs=1 count=0  oseek=1862g
0+0 records in
0+0 records out
0 bytes transferred in 0.000011 secs (0 bytes/sec)

$ ls -l /tmp/sparsefile2.img /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  1999307276288 Jul 25 12:52 /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  1999307276288 Jul 25 12:52 /tmp/sparsefile2.img


$ ls -ls /tmp/sparsefile2.img /tmp/sparsefile1.img
64 -rw-r--r--  1 dan  wheel  1999307276288 Jul 25 12:52 /tmp/sparsefile1.img
64 -rw-r--r--  1 dan  wheel  1999307276288 Jul 25 12:52 /tmp/sparsefile2.img

Although these sparse files look
to be 1862GB in size, they only take up 64 blocks, as shown in the last ls output.
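
As an aside, truncate(1) creates the same kind of sparse file in one step; I used dd above, but this would also have worked:

# truncate -s 1862g /tmp/sparsefile1.img
# truncate -s 1862g /tmp/sparsefile2.img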

This command creates a new pool that includes the above two files. It is the same
zpool create command I have used before, just with two more vdevs on the end:

# zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 \
       /tmp/sparsefile1.img /tmp/sparsefile2.img
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: raidz contains both files and devices

Oh damn. Yes. Umm, let’s try that -f option.

# zpool create -f storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 \
       /tmp/sparsefile1.img /tmp/sparsefile2.img

# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   ONLINE       0     0     0
          raidz2                  ONLINE       0     0     0
            gpt/disk01            ONLINE       0     0     0
            gpt/disk02            ONLINE       0     0     0
            gpt/disk03            ONLINE       0     0     0
            gpt/disk04            ONLINE       0     0     0
            gpt/disk05            ONLINE       0     0     0
            /tmp/sparsefile1.img  ONLINE       0     0     0
            /tmp/sparsefile2.img  ONLINE       0     0     0

errors: No known data errors

Now let’s try to take the two sparse files out of the array.

# zpool detach storage /tmp/sparsefile2.img
cannot detach /tmp/sparsefile2.img: only applicable to mirror and replacing vdevs

# zpool remove storage /tmp/sparsefile2.img
cannot remove /tmp/sparsefile2.img: only inactive hot spares or cache devices can be removed

OK, neither of those worked. Let’s try this:

# zpool offline storage /tmp/sparsefile2.img

Oh oh. The system went away. Panic, I bet… yes, a system panic. After the reboot, I went
ahead and deleted the two sparse files out from underneath ZFS. That is not the
right thing to do. Now I see:

# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   DEGRADED     0     0     0
          raidz2                  DEGRADED     0     0     0
            gpt/disk01            ONLINE       0     0     0
            gpt/disk02            ONLINE       0     0     0
            gpt/disk03            ONLINE       0     0     0
            gpt/disk04            ONLINE       0     0     0
            gpt/disk05            ONLINE       0     0     0
            /tmp/sparsefile1.img  UNAVAIL      0     0     0  cannot open
            /tmp/sparsefile2.img  UNAVAIL      0     0     0  cannot open

errors: No known data errors

So… let’s try a scrub… no, that dies too. So does ‘zpool destroy storage’.

From email:

rm /boot/zfs/zpool.cache
then wipe the drives/partitions by writing 16KB at the beginning and end

# dd if=/dev/zero of=/dev/ada1 bs=512 count=32
32+0 records in
32+0 records out
16384 bytes transferred in 0.008233 secs (1990023 bytes/sec)

# dd if=/dev/zero of=/dev/ada1 bs=512 count=32 oseek=3907029073
32+0 records in
32+0 records out
16384 bytes transferred in 0.008202 secs (1997601 bytes/sec)

Repeat for ada2..5
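
A small /bin/sh loop does the repeating (same dd values as above; the drives are identical):

# for d in ada2 ada3 ada4 ada5; do
>   dd if=/dev/zero of=/dev/$d bs=512 count=32
>   dd if=/dev/zero of=/dev/$d bs=512 count=32 oseek=3907029073
> done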

But that was on the raw device. I should have done it on the partition:

# dd if=/dev/zero of=/dev/gpt/disk01 bs=512 count=32
32+0 records in
32+0 records out
16384 bytes transferred in 0.008974 secs (1825752 bytes/sec)

# dd if=/dev/zero of=/dev/gpt/disk01 bs=512 count=32 oseek=3906824269
32+0 records in
32+0 records out
16384 bytes transferred in 0.008934 secs (1833889 bytes/sec)

Where did I get these values? From ‘gpart show’:

=>        34  3907029101  ada1  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  3906824301     1  freebsd-zfs  (1.8T)
  3906825325      203810        - free -  (100M)

Am I doing the right math?

3906824301 (the partition size in sectors, from above) - 32 blocks = 3906824269 (the oseek value used above)

Now to try memory disks…

# mdconfig -a -t malloc -s 1862g -u 0
# mdconfig -a -t malloc -s 1862g -u 1

# zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 /dev/md0 /dev/md1
invalid vdev specification
use '-f' to override the following errors:
raidz contains devices of different sizes

# zpool create -f storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 /dev/md0 /dev/md1
# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        storage         ONLINE       0     0     0
          raidz2        ONLINE       0     0     0
            gpt/disk01  ONLINE       0     0     0
            gpt/disk02  ONLINE       0     0     0
            gpt/disk03  ONLINE       0     0     0
            gpt/disk04  ONLINE       0     0     0
            gpt/disk05  ONLINE       0     0     0
            md0         ONLINE       0     0     0
            md1         ONLINE       0     0     0

errors: No known data errors
# zpool offline storage md0
# zpool offline storage md1
cannot offline md1: no valid replicas

Creating a 7HDD zpool with only 5HDD (succeeds)

I may have a cunning plan, suggested to me by Pawel Tyll:

Given that I have:

  • 5 empty 2 TB HDD
  • 2 full 2 TB HDD
  • FreeBSD 8.1-STABLE

The plan:

  1. For each of my 5 HDD, create 2x1TB partitions (labeled live and backup respectively)
  2. Create a 7-device raidz2 zpool using one partition from each HDD and two /dev/md devices; call this live.
  3. Create a 5-device raidz1 zpool using 1TB partitions from each HDD; call this backup
  4. Copy the data from the two HDD into the zpool backup.
  5. scrub the backup pool to ensure it’s OK
  6. create a 2TB partition on each of the 2 HDD which are not in any pool
  7. replace the two md units in the live pool with the 2 HDD
  8. try a zfs send | zfs receive from the backup pool to the live pool
  9. scrub the live pool
  10. destroy the backup pool
  11. for i = 1..5
      offline drive i
      partition into 1x2TB drive.
      put back into live pool using replace
    end for
    

Hmm, that’s pretty straightforward and very cunning.

In the partitioning below, you can see that the first partition of each HDD is used for the backup pool,
and the second partition is used for the live pool. That is not ideal. For this situation,
it would be better to put the live data on the first partition and the backup data on the
second: when we go to drop the backup partition, we could then simply grow
the live partition and retain our data. This is untested. 🙂

So this is the partitioning I used:

# gpart add -b 1024 -s 1953412151 -t freebsd-zfs -l disk05-backup ada5
ada5p1 added

# gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk05-live ada5
ada5p2 added

# gpart show ada5
=>        34  3907029101  ada5  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

# gpart add -b 1024       -s 1953412151 -t freebsd-zfs -l disk04-backup ada4
ada4p1 added
# gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk04-live   ada4
ada4p2 added

# gpart add -b 1024       -s 1953412151 -t freebsd-zfs -l disk03-backup ada3
ada3p1 added
# gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk03-live   ada3
ada3p2 added

# gpart add -b 1024       -s 1953412151 -t freebsd-zfs -l disk02-backup ada2
ada2p1 added
# gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk02-live   ada2
ada2p2 added

# gpart add -b 1024       -s 1953412151 -t freebsd-zfs -l disk01-backup ada1
ada1p1 added
# gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk01-live   ada1

Now let’s look at the partitions on those drives:

# gpart show ada1 ada2 ada3 ada4 ada5
=>        34  3907029101  ada1  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada2  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada3  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada4  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada5  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

#

Now we configure two memory disks:

# mdconfig -a -t malloc -s 931g -u 0
# mdconfig -a -t malloc -s 931g -u 1

Now we create the 7-vdev raidz2, using the live partitions from the 5 HDD and the two memory disks:

# zpool create -f storage raidz2 gpt/disk01-live gpt/disk02-live gpt/disk03-live gpt/disk04-live gpt/disk05-live /dev/md0 /dev/md1
# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            md0              ONLINE       0     0     0
            md1              ONLINE       0     0     0

errors: No known data errors
#

Next, we create the other pool, which will be raidz1 on 5 vdevs:

# zpool create -f MyBackup raidz1 gpt/disk01-backup gpt/disk02-backup gpt/disk03-backup gpt/disk04-backup gpt/disk05-backup
# zpool status
  pool: MyBackup
 state: ONLINE
 scrub: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        MyBackup               ONLINE       0     0     0
          raidz1               ONLINE       0     0     0
            gpt/disk01-backup  ONLINE       0     0     0
            gpt/disk02-backup  ONLINE       0     0     0
            gpt/disk03-backup  ONLINE       0     0     0
            gpt/disk04-backup  ONLINE       0     0     0
            gpt/disk05-backup  ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            md0              ONLINE       0     0     0
            md1              ONLINE       0     0     0

errors: No known data errors

Copying data to the backup pool
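
The copies themselves are plain cp runs from inside the backup pool’s mountpoint, one per source drive, each in its own terminal. This is a reconstruction from the ps output further down, so treat it as a sketch rather than an exact transcript:

# cd /MyBackup
# cp -rp /new0/bacula /new0/pgsql .
# cp -rp /new6/bacula .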

The following iostat output shows the files copying from the two temp HDD to the backup pool.

# zpool iostat MyBackup 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyBackup    5.62G  4.53T      0    327     78  29.7M
MyBackup    5.67G  4.53T      0    480      0  40.7M
MyBackup    5.72G  4.53T      0    528      0  48.7M
MyBackup    5.77G  4.53T      0    383      0  31.6M
MyBackup    5.82G  4.53T      0    482      0  42.7M
MyBackup    5.87G  4.53T      0    509      0  47.2M
MyBackup    5.92G  4.53T      0    451      0  36.9M
MyBackup    5.96G  4.53T      0    424      0  34.6M
MyBackup    5.99G  4.53T      0    460      0  40.5M
MyBackup    6.04G  4.53T      0    457      0  40.4M
MyBackup    6.09G  4.53T      0    457      0  40.4M
MyBackup    6.14G  4.53T      0    457      0  40.5M
MyBackup    6.19G  4.53T      0    456      0  40.3M
MyBackup    6.23G  4.53T      0    456      0  40.3M
MyBackup    6.28G  4.53T      0    388      0  39.8M
MyBackup    6.33G  4.53T      0    488      0  41.1M
MyBackup    6.38G  4.53T      0    554      0  40.8M
MyBackup    6.43G  4.52T      0    457      0  40.4M
MyBackup    6.48G  4.52T      0    459      0  40.5M
MyBackup    6.53G  4.52T      0    475      0  40.5M
MyBackup    6.58G  4.52T      0    477      0  40.5M
MyBackup    6.65G  4.52T      0    464      0  41.2M
MyBackup    6.70G  4.52T      0    586      0  55.5M
MyBackup    6.75G  4.52T      0    528      0  41.7M
MyBackup    6.79G  4.52T      0    467      0  38.5M
MyBackup    6.84G  4.52T      0    447      0  38.5M
MyBackup    6.89G  4.52T      0    445      0  38.4M
MyBackup    6.96G  4.52T      0    513      0  39.1M
MyBackup    6.98G  4.52T      0    445      0  38.4M
MyBackup    7.03G  4.52T      0    445      0  38.5M
MyBackup    7.10G  4.52T      0    449      0  38.3M
MyBackup    7.15G  4.52T      0    513      0  46.1M
MyBackup    7.19G  4.52T      0    651      0  50.2M
MyBackup    7.24G  4.52T      0    445      0  38.5M
MyBackup    7.31G  4.52T      0    515      0  46.5M
MyBackup    7.36G  4.52T      0    598      0  49.7M
MyBackup    7.43G  4.52T      0    445      0  38.5M
MyBackup    7.47G  4.52T      0    687      0  57.7M
MyBackup    7.54G  4.52T      0    447      0  38.5M
MyBackup    7.59G  4.52T      0    544      0  50.0M
MyBackup    7.64G  4.52T      0    571      0  46.2M
MyBackup    7.68G  4.52T      0    451      0  38.5M
MyBackup    7.75G  4.52T      0    449      0  38.4M
MyBackup    7.80G  4.52T      0    676      0  57.7M
MyBackup    7.85G  4.52T      0    557      0  43.9M
MyBackup    7.92G  4.52T      0    580      0  53.2M
MyBackup    7.97G  4.52T      0    426      0  36.2M
MyBackup    8.02G  4.52T      0    398      0  27.0M
MyBackup    8.06G  4.52T      0    451      0  38.7M
MyBackup    8.11G  4.52T      0    480      0  41.7M
MyBackup    8.16G  4.52T      0    447      0  35.7M
MyBackup    8.21G  4.52T      0    484      0  42.7M
MyBackup    8.25G  4.52T      0    554      0  49.3M
MyBackup    8.32G  4.52T      0    454      0  38.5M
MyBackup    8.34G  4.52T      0    503      0  37.3M
MyBackup    8.38G  4.52T      0    437      0  36.9M
MyBackup    8.45G  4.52T      0    438      0  37.1M
MyBackup    8.50G  4.52T      0    691      0  55.5M
MyBackup    8.54G  4.52T      0    439      0  36.9M
MyBackup    8.61G  4.52T      0    437      0  36.9M
MyBackup    8.65G  4.52T      0    580      0  53.4M
MyBackup    8.72G  4.52T      0    563      0  39.8M
MyBackup    8.75G  4.52T      0    455      0  37.0M
MyBackup    8.79G  4.52T      0    438      0  37.1M
MyBackup    8.84G  4.52T      0    374      0  26.7M
MyBackup    8.88G  4.52T      0    579      0  47.6M
MyBackup    8.93G  4.52T      0    442      0  37.2M
MyBackup    8.95G  4.52T      0    197      0  18.3M
MyBackup    8.97G  4.52T      0     14      0  19.0K
MyBackup    8.97G  4.52T      0    104      0  5.33M
MyBackup    8.98G  4.52T      0    129      0  11.9M
MyBackup    8.98G  4.52T      0     36      0  47.0K
MyBackup    9.00G  4.52T      0    321      0  22.4M
MyBackup    9.04G  4.52T      0    413      0  35.4M
MyBackup    9.08G  4.52T      0    428      0  35.5M
MyBackup    9.13G  4.52T      0    445      0  35.5M
MyBackup    9.17G  4.52T      0    497      0  35.8M
MyBackup    9.21G  4.52T      0    425      0  35.5M
MyBackup    9.28G  4.52T      0    442      0  36.9M
MyBackup    9.32G  4.52T      0    621      0  51.7M
MyBackup    9.36G  4.52T      0    425      0  35.4M
MyBackup    9.43G  4.52T      0    519      0  46.4M
MyBackup    9.47G  4.52T      0    564      0  42.2M
MyBackup    9.54G  4.52T      0    644      0  53.5M
MyBackup    9.60G  4.52T      0    529      0  47.6M
MyBackup    9.66G  4.52T      0    562      0  43.5M
MyBackup    9.73G  4.52T      0    616      0  50.8M
MyBackup    9.78G  4.52T      0    711      0  59.9M
MyBackup    9.85G  4.52T      0    692      0  59.9M
MyBackup    9.92G  4.52T      0    477      0  39.7M
MyBackup    10.0G  4.52T      0    688      0  59.7M
MyBackup    10.0G  4.52T      0    691      0  59.5M
MyBackup    10.1G  4.52T      0    520      0  39.9M
MyBackup    10.2G  4.52T      0    690      0  59.5M
MyBackup    10.2G  4.52T      0    457      0  39.4M
^C

10:39 PM

And the copy continues:

 $ zpool iostat MyBackup 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyBackup    2.68T  1.85T      0    456  1.19K  25.4M
MyBackup    2.68T  1.85T      0    487      0  27.8M
MyBackup    2.68T  1.85T      0    488      0  27.9M
MyBackup    2.68T  1.85T      0    323      0  18.0M
MyBackup    2.68T  1.85T      1    545  2.50K  28.4M
MyBackup    2.68T  1.85T      1    489  3.00K  27.7M
MyBackup    2.68T  1.85T      0    324      0  18.4M
MyBackup    2.68T  1.85T      1    486  3.00K  27.7M
MyBackup    2.68T  1.85T      0    331      0  18.9M

4:48 AM

Only 500GB to go. At 30MB/s, that should take about 5 hours…

$ df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    496K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    155M    3.4G     4%    /var
/dev/ada0s1d          1.8T    1.4T    264G    84%    /new0
/dev/ada6s1d          1.8T    1.7T   -125G   108%    /new6
storage               4.4T     39K    4.4T     0%    /storage
MyBackup              3.6T    2.6T    940G    74%    /MyBackup

A couple of processes have been busy:

$ ps auwx | grep cp
root    12709  0.0  0.0  5824  2048   3  D+   Sun09PM  31:32.24 cp -rp /new6/bacula .
root    12692  0.0  0.1  5824  2120   1  D+   Sun09PM  31:43.20 cp -rp /new0/bacula /new0/pgsql .

The iostat of the two source HDD:

$ iostat ada0 ada6 2
       tty            ada0             ada6             cpu
 tin  tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0     5 127.52  95 11.88  127.53  94 11.76   0  0  3  1 96
   0    92 127.44 100 12.51  127.39  92 11.51   0  0  4  0 96
   0    31 127.52 116 14.50  128.00  76  9.50   0  0  3  0 96
   0    31 127.42  96 12.00  127.39  92 11.50   0  0  3  0 97
   0    31 128.00 104 13.00  127.39  92 11.50   0  0  4  1 95
   0    31 127.34  84 10.50  128.00 108 13.49   0  0  2  1 97
   0    31 127.60 140 17.50  127.39  92 11.50   0  0  3  1 95
   0    31 127.44 100 12.50  127.39  92 11.50   0  0  4  1 96
   0    31 127.39  92 11.50  127.46 104 13.00   0  0  4  0 96
   0    31 127.54 120 15.00  127.50 112 14.00   0  0  5  0 95
   0    31 128.00  96 11.99  128.00  96 11.99   0  0  2  1 97
   0    31 127.47 105 13.13  127.47 105 13.13   0  0  3  0 97
   0    31 127.59 135 16.87  127.53 119 14.87   0  0  4  1 95
   0    31 127.44 100 12.50  127.42  96 12.00   0  0  3  0 96
   0    31 127.44 100 12.50  127.42  96 12.00   0  0  3  1 95
   0    31 127.44 100 12.50  128.00  96 12.00   0  0  3  1 97
   0    32 128.00 120 14.99  127.50 112 14.00   1  0  6  1 92

5:48 AM

One of the cp processes has finished:

$ ps auwx | grep cp
root    12709  0.9  0.0  5824  2048   3  D+   Sun09PM  32:13.30 cp -rp /new6/bacula .

7:12 AM

Oh oh, we’re down to just one TB free on the array!

MyBackup    3.52T  1.01T      0    490      0  23.9M
MyBackup    3.53T  1.01T      0    509      0  23.7M
MyBackup    3.53T  1.01T      0    501     51  24.6M
MyBackup    3.53T  1.00T      0    490      0  23.6M
MyBackup    3.53T  1.00T      0    508      0  24.4M
MyBackup    3.53T  1.00T      0    475      0  22.5M
MyBackup    3.53T  1.00T      0    503      0  24.5M
MyBackup    3.53T  1.00T      0    520      0  25.6M
MyBackup    3.53T  1.00T      0    489      0  24.0M
MyBackup    3.53T  1.00T      0    475    153  22.3M
MyBackup    3.53T  1.00T      0    518      0  24.8M
MyBackup    3.53T  1.00T      0    513      0  25.9M
MyBackup    3.53T  1024G      0    498    127  24.3M
MyBackup    3.53T  1023G      0    524      0  25.3M

According to recent calculations, there is about 300GB (or about 3.4 hours) left to copy.

First, the source data on the two temp HDD:

$ du -ch /new6/bacula/volumes/ngaio /new0/bacula/volumes/ngaio/
1.7T    /new6/bacula/volumes/ngaio
479G    /new0/bacula/volumes/ngaio/
2.2T    total

Compared to what we have in ZFS:

$ du -ch /MyBackup/bacula/volumes/ngaio/
1.9T    /MyBackup/bacula/volumes/ngaio/
1.9T    total

There are 456 files on the two source drives:

$ ls /new6/bacula/volumes/ngaio /new0/bacula/volumes/ngaio/ | wc -l
     456

And, so far, 401 files in the backup pool:

$ ls /MyBackup/bacula/volumes/ngaio/ | wc -l
      401

NOTE: this is not the total file count: it applies just to the directory now being copied.

Given that we are missing 55 files of roughly 5GB each, that gives about 275GB, which is
close to the 300GB estimated above.

Scrub the backup data

12:02 PM

The backup has finished. Time for a good scrubbing then a copy to the live pool!

MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0
MyBackup    3.88T   663G      0      0      0      0

The above is an idle pool. Below is the start of the scrub.
Don’t worry: the time estimate starts high and quickly drops.

# zpool scrub MyBackup

# zpool status MyBackup
  pool: MyBackup
 state: ONLINE
 scrub: scrub in progress for 0h0m, 0.02% done, 28h26m to go
config:

        NAME                   STATE     READ WRITE CKSUM
        MyBackup               ONLINE       0     0     0
          raidz1               ONLINE       0     0     0
            gpt/disk01-backup  ONLINE       0     0     0
            gpt/disk02-backup  ONLINE       0     0     0
            gpt/disk03-backup  ONLINE       0     0     0
            gpt/disk04-backup  ONLINE       0     0     0
            gpt/disk05-backup  ONLINE       0     0     0

errors: No known data errors

By 12:47 PM, the status was:

scrub: scrub in progress for 0h44m, 14.89% done, 4h11m to go

5:26 PM

scrub: scrub completed after 4h48m with 0 errors on Tue Jul 27 16:52:04 2010

Next steps: snapshot, put those two HDD into the live pool, followed by zfs send | zfs receive

Screwing up the memory disks….

6:52 PM

One of the great features of ZFS is its send/receive capability: you can send a ZFS snapshot
from one filesystem to another. In this section, I mess up the live pool, but then fix it.
Then I partition those two spare HDD and add them to the live pool.

This is how I create the snapshot for the pool:

# zfs snapshot MyBackup@2010.07.27

Note that MyBackup is the name of the pool. To see the list of snapshots:

# zfs list -t snapshot
NAME                  USED  AVAIL  REFER  MOUNTPOINT
MyBackup@2010.07.27      0      -  3.10T  -

Now what does the live pool look like?

# zpool status
  pool: MyBackup
 state: ONLINE
 scrub: scrub completed after 4h48m with 0 errors on Tue Jul 27 16:52:04 2010
config:

        NAME                   STATE     READ WRITE CKSUM
        MyBackup               ONLINE       0     0     0
          raidz1               ONLINE       0     0     0
            gpt/disk01-backup  ONLINE       0     0     0
            gpt/disk02-backup  ONLINE       0     0     0
            gpt/disk03-backup  ONLINE       0     0     0
            gpt/disk04-backup  ONLINE       0     0     0
            gpt/disk05-backup  ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            md0              ONLINE       0     0     0
            md1              ONLINE       0     0     0

errors: No known data errors
#

Hmm, it looks fine. That’s just because ZFS doesn’t yet know those md devices are gone.
It still looked fine even after I added a file.

I made an error; I should not have rm’d those devices. To recover, this is what I did:

  1. shutdown -p now
  2. boot into single user mode
  3. run the same mdconfig statements I ran previously (repeated below)
  4. exit
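
Those mdconfig statements, repeated here for reference (the same as in the pool creation step above):

# mdconfig -a -t malloc -s 931g -u 0
# mdconfig -a -t malloc -s 931g -u 1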

After running a scrub, the status is:

# zpool status storage
  pool: storage
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: scrub completed after 0h0m with 0 errors on Tue Jul 27 19:02:04 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            md0              UNAVAIL      0   113     0  corrupted data
            md1              UNAVAIL      0   117     0  corrupted data

errors: No known data errors
#

Adding in the two new HDD

Let’s partition those two spare HDD:

# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk06-live ada0
ada0p1 added
# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk07-live ada6
ada6p1 added

# gpart show ada0 ada6
=>        34  3907029101  ada0  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048  3906617453     1  freebsd-zfs  (1.8T)
  3906619501      409634        - free -  (200M)

=>        34  3907029101  ada6  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048  3906617453     1  freebsd-zfs  (1.8T)
  3906619501      409634        - free -  (200M)

These commands replace the memory disks with the new HDD:

# zpool replace storage md0 gpt/disk06-live
# zpool replace storage md1 gpt/disk07-live
# zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Tue Jul 27 19:29:08 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0  14K resilvered
            gpt/disk02-live  ONLINE       0     0     0  15.5K resilvered
            gpt/disk03-live  ONLINE       0     0     0  17.5K resilvered
            gpt/disk04-live  ONLINE       0     0     0  19K resilvered
            gpt/disk05-live  ONLINE       0     0     0  18K resilvered
            gpt/disk06-live  ONLINE       0     0     0  16.5K resilvered
            gpt/disk07-live  ONLINE       0     0     0  24.5K resilvered

errors: No known data errors
#

Now we scrub the pool, just to be sure.

# zpool scrub storage
# zpool status storage
  pool: storage
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Tue Jul 27 19:29:51 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  ONLINE       0     0     0
            gpt/disk07-live  ONLINE       0     0     0

errors: No known data errors
#

Copying data from one ZFS array to another

The following will take a long time. I recommend doing it on the
console or in a screen session.
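
For the screen route, something like this does it (the session name is just an example); detach with Ctrl-a d and reattach later with screen -r zfs-send:

# screen -S zfs-send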

# zfs send MyBackup@2010.07.27 | zfs receive storage
cannot receive new filesystem stream: destination 'storage' exists
must specify -F to overwrite it
warning: cannot send 'MyBackup@2010.07.27': Broken pipe

# zfs send MyBackup@2010.07.27 | zfs receive storage/Retored

The write has started:

# zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyBackup    3.88T   663G     18      0  2.18M  1.03K
storage     6.24G  6.31T      0     18    125  2.12M
----------  -----  -----  -----  -----  -----  -----
MyBackup    3.88T   663G    847      0   105M      0
storage     7.64G  6.31T      0    841    716   102M
----------  -----  -----  -----  -----  -----  -----
MyBackup    3.88T   663G    782      0  97.1M      0
storage     9.04G  6.30T      0    844    665   102M
----------  -----  -----  -----  -----  -----  -----

The above was done at 7:37 PM. The pool is filling at about 8GB/minute while the data streams in at 90-100MB/s; the difference is the raidz2 parity overhead:

# zpool iostat storage 60
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage      158G  6.16T      0    292    666  35.3M
storage      166G  6.15T      1    798  1.53K  96.0M
storage      175G  6.14T      1    832  3.20K   100M
storage      183G  6.13T      0    828  2.53K  99.8M
storage      190G  6.13T      0    737  1.35K  88.9M
storage      198G  6.12T      0    759  1.30K  91.6M
storage      205G  6.11T      0    710    537  86.0M
storage      213G  6.10T      0    770    810  93.1M
storage      220G  6.10T      0    770  1.99K  92.9M
storage      228G  6.09T      0    774    853  93.5M
storage      236G  6.08T      0    774    989  93.5M
storage      243G  6.07T      0    773  1.02K  93.5M

5:44 AM

The copy has finished. A scrub has started.

The scrub finished. No data loss. I’ve copied over all the data from the
backup pool to the new live pool. Now I want to start removing the old
pools.

Removing the old data

Don’t do any of this. I should have done it much later.

# zfs destroy storage/Retored
cannot destroy 'storage/Retored': filesystem has children
use '-r' to destroy the following datasets:
storage/Retored@2010.07.27
storage/Retored@2010.07.28

OK, I don’t need those:

# zfs destroy storage/Retored@2010.07.28
cannot destroy 'storage/Retored@2010.07.28': snapshot has dependent clones
use '-R' to destroy the following datasets:
storage/bacula

OH wait! I need to keep storage/bacula! That’s my live data. With ZFS,
it seems a clone is forever linked to the snapshot from which it was taken.
I’d like to unlink it. I’m guessing the only way is cp. Or better yet:
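
zfs promote, perhaps. If I understand it correctly, promoting the clone reverses the clone/origin relationship, so storage/bacula would no longer depend on that snapshot and storage/Retored could then be destroyed. Untried here, but it would look like this:

# zfs promote storage/bacula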

I think I’ll worry about this after I get rid of the backup pool.

# zfs umount MyBackup
# zpool destroy MyBackup
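
Coming back to the clone problem above: ZFS does have a way to break that link without
copying the data. zfs promote swaps the parent/child relationship, so the origin snapshots
move under the clone and the old file system can then be destroyed. A sketch of what that
would look like here (not something I actually ran):

# zfs promote storage/bacula
# zfs destroy -r storage/Retored

After the promote, the snapshot storage/bacula was cloned from belongs to storage/bacula
itself, and storage/Retored can be removed without touching the live data.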

Repartitioning the HDD

You will recall we put two partitions on those first 5 HDD but only one partition on the two new HDD.
Now it’s time to repartition those 5x2TB HDD, which currently have two 931GB partitions each,
into one large partition. Look at ada1..ada5 below, and note
that ada0 and ada6 are already a single 1.8TB partition.

# gpart show
=>        34  3907029101  ada1  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada2  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada3  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada4  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>        34  3907029101  ada5  GPT  (1.8T)
          34         990        - free -  (495K)
        1024  1953412151     1  freebsd-zfs  (931G)
  1953413175  1953412151     2  freebsd-zfs  (931G)
  3906825326      203809        - free -  (100M)

=>       63  156301362  mirror/gm0  MBR  (75G)
         63  156301425           1  freebsd  [active]  (75G)

=>        0  156301425  mirror/gm0s1  BSD  (75G)
          0    2097152             1  freebsd-ufs  (1.0G)
    2097152   12582912             2  freebsd-swap  (6.0G)
   14680064    8388608             4  freebsd-ufs  (4.0G)
   23068672    8388608             5  freebsd-ufs  (4.0G)
   31457280  124844145             6  freebsd-ufs  (60G)

=>        34  3907029101  ada0  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048  3906617453     1  freebsd-zfs  (1.8T)
  3906619501      409634        - free -  (200M)

=>        34  3907029101  ada6  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048  3906617453     1  freebsd-zfs  (1.8T)
  3906619501      409634        - free -  (200M)

Let us offline ada1.

# zpool status
  pool: storage
 state: ONLINE
 scrub: scrub completed after 5h12m with 0 errors on Thu Jul 29 07:51:45 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  ONLINE       0     0     0
            gpt/disk07-live  ONLINE       0     0     0

errors: No known data errors

We’ll start with gpt/disk01-live:

# zpool offline storage gpt/disk01-live
# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 5h12m with 0 errors on Thu Jul 29 07:51:45 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              DEGRADED     0     0     0
          raidz2             DEGRADED     0     0     0
            gpt/disk01-live  OFFLINE      0    89     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  ONLINE       0     0     0
            gpt/disk07-live  ONLINE       0     0     0

errors: No known data errors

Destroy the two partitions. All data will be lost:

# gpart delete -i 1 ada1
ada1p1 deleted
# gpart delete -i 2 ada1
ada1p2 deleted
# gpart show ada1
=>        34  3907029101  ada1  GPT  (1.8T)
          34  3907029101        - free -  (1.8T)

Clear out the first and last 16KB on the HDD. This may not be useful since
we have already destroyed the partitions…

# dd if=/dev/zero of=/dev/ada1 bs=512 count=32
32+0 records in
32+0 records out


# dd if=/dev/zero of=/dev/ada1 bs=512 count=32 oseek=3907029073
32+0 records in
32+0 records out
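
If you are wondering how to compute an oseek value for the end of a whole disk without doing
the arithmetic by hand: diskinfo(8) reports the size in sectors, and the last-16KB offset is
that number minus 32. A rough sketch, assuming the default diskinfo output where the fourth
field is the size in sectors:

# dd if=/dev/zero of=/dev/ada1 bs=512 count=32 oseek=$(( $(diskinfo ada1 | awk '{print $4}') - 32 ))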

Get the disk ready to be a GEOM device:

# gpart create -s GPT ada1
ada1 created

Create the new partition:

# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk01-live ada1
ada1p1 added

# gpart show ada1
=>        34  3907029101  ada1  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048  3906617453     1  freebsd-zfs  (1.8T)
  3906619501      409634        - free -  (200M)

Hmm, just to be safe, let’s clear the first and last 16KB of that partition (NOTE: my
first attempt failed; see below for details).

NOTE that I use ada1p1, not ada1 as above.

# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=32
32+0 records in
32+0 records out
16384 bytes transferred in 0.008515 secs (1924161 bytes/sec)

And for the last 16KB:

# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=32 oseek=3906617421
32+0 records in
32+0 records out

The number 3906617421 comes from the partition size (3906617453) found in the
gpart show output, minus 32 sectors of 512 bytes each, for a total of 16KB.
A rough test to make sure you did the math right: add one to the oseek parameter
and run it again. If you see the ‘end of device’ error below with the larger value,
but not with the original oseek, you specified the right value:

# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=32 oseek=3906617422
dd: /dev/ada1p1: end of device
32+0 records in
31+0 records out
15872 bytes transferred in 0.007949 secs (1996701 bytes/sec)

Now let’s see:

# gpart status
         Name  Status  Components
       ada2p1     N/A  ada2
       ada3p1     N/A  ada3
       ada4p1     N/A  ada4
       ada5p1     N/A  ada5
 mirror/gm0s1     N/A  mirror/gm0
mirror/gm0s1a     N/A  mirror/gm0s1
       ada0p1     N/A  ada0
       ada6p1     N/A  ada6
       ada1p1     N/A  ada1

Now it’s time to replace the old 931GB vdev with the new 1.8TB partition, reusing the same label.

# zpool replace storage gpt/disk01-live gpt/disk01-live
invalid vdev specification
use '-f' to override the following errors:
/dev/gpt/disk01-live is part of potentially active pool 'storage'

It turns out my math was wrong: the ZFS meta-data extends well past the 16KB I cleared at each end.

Completely removing ZFS meta-data

In the previous section, the ‘invalid vdev specification’ message was trying to
tell me that the new vdev was marked as being used in an existing pool. I could
have used the -f option to force the replace. Instead, I want to completely
remove the meta-data.
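
(Aside: newer ZFS versions ship a dedicated command for exactly this, zpool labelclear. It
was not available on this system, which is why I resort to dd below. On a newer system it
would simply be:)

# zpool labelclear -f /dev/gpt/disk01-live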

It turns out that ZFS stores two labels
at the beginning of the vdev, and two labels at the end. Look at this:

# zdb -l /dev/gpt/disk01-live
--------------------------------------------
LABEL 0
--------------------------------------------
    version=14
    name='storage'
    state=0
    txg=4
    pool_guid=2339808256841075165
    hostid=3600270990
    hostname='kraken.example.org'
    top_guid=18159926173103963460
    guid=16482477649650197853
    vdev_tree
        type='raidz'
        id=0
        guid=18159926173103963460
        nparity=2
        metaslab_array=23
        metaslab_shift=37
        ashift=9
        asize=13995117903872
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16482477649650197853
                path='/dev/gpt/disk01'
                whole_disk=0
        children[1]
                type='disk'
                id=1
                guid=8540639469082160959
                path='/dev/gpt/disk02'
                whole_disk=0
        children[2]
                type='disk'
                id=2
                guid=6533883554281261104
                path='/dev/gpt/disk03'
                whole_disk=0
        children[3]
                type='disk'
                id=3
                guid=1801494265368466138
                path='/dev/gpt/disk04'
                whole_disk=0
        children[4]
                type='disk'
                id=4
                guid=7430995867171691858
                path='/dev/gpt/disk05'
                whole_disk=0
        children[5]
                type='file'
                id=5
                guid=11845728232134214029
                path='/tmp/sparsefile1.img'
        children[6]
                type='file'
                id=6
                guid=353355856440925066
                path='/tmp/sparsefile2.img'
--------------------------------------------
LABEL 1
--------------------------------------------
    version=14
    name='storage'
    state=0
    txg=4
    pool_guid=2339808256841075165
    hostid=3600270990
    hostname='kraken.example.org'
    top_guid=18159926173103963460
    guid=16482477649650197853
    vdev_tree
        type='raidz'
        id=0
        guid=18159926173103963460
        nparity=2
        metaslab_array=23
        metaslab_shift=37
        ashift=9
        asize=13995117903872
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16482477649650197853
                path='/dev/gpt/disk01'
                whole_disk=0
        children[1]
                type='disk'
                id=1
                guid=8540639469082160959
                path='/dev/gpt/disk02'
                whole_disk=0
        children[2]
                type='disk'
                id=2
                guid=6533883554281261104
                path='/dev/gpt/disk03'
                whole_disk=0
        children[3]
                type='disk'
                id=3
                guid=1801494265368466138
                path='/dev/gpt/disk04'
                whole_disk=0
        children[4]
                type='disk'
                id=4
                guid=7430995867171691858
                path='/dev/gpt/disk05'
                whole_disk=0
        children[5]
                type='file'
                id=5
                guid=11845728232134214029
                path='/tmp/sparsefile1.img'
        children[6]
                type='file'
                id=6
                guid=353355856440925066
                path='/tmp/sparsefile2.img'
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

Each ZFS label is 256KB, so the four labels only occupy about 512KB at each end of the vdev,
but it was suggested I overwrite a few MB at each end to be safe. Let us do the math:

5MB of 512 byte sectors = 10240 sectors

Let us do the dd again:

# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=10240
10240+0 records in
10240+0 records out
5242880 bytes transferred in 2.494232 secs (2102002 bytes/sec)
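
For symmetry, the last few MB of the partition could be cleared the same way, using the
oseek arithmetic from earlier (not shown in my transcript; labels 2 and 3 at the end were
already unreadable anyway):

# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=10240 oseek=3906607213

where 3906607213 = 3906617453 - 10240.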


# zdb -l /dev/gpt/disk01-live
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

There, that killed it.

Replacing the vdev

Now we can try this replacement again:

# zpool replace storage gpt/disk01-live gpt/disk01-live

There, replaced. Now look at the status:

# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.01% done, 100h19m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        storage                    DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            replacing              DEGRADED     0     0     0
              gpt/disk01-live/old  OFFLINE      0   638     0
              gpt/disk01-live      ONLINE       0     0     0  86.6M resilvered
            gpt/disk02-live        ONLINE       0     0     0  57.8M resilvered
            gpt/disk03-live        ONLINE       0     0     0  57.8M resilvered
            gpt/disk04-live        ONLINE       0     0     0  57.8M resilvered
            gpt/disk05-live        ONLINE       0     0     0  57.8M resilvered
            gpt/disk06-live        ONLINE       0     0     0  57.8M resilvered
            gpt/disk07-live        ONLINE       0     0     0  57.8M resilvered

errors: No known data errors
#

Notice how gpt/disk01-live/old is OFFLINE and gpt/disk01-live is ONLINE
and all vdevs are being resilvered.

I will now wait for the resilvering to complete before proceeding with the
other HDD. The resilver started at 10:58 AM.
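
One easy way to keep an eye on it from another terminal is to poll zpool status, e.g. (in a
Bourne-style shell):

# while true; do zpool status storage | grep scrub; sleep 600; done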

A partition trick

While I was investigating the above by talking with Pawel Jakub Dawidek (pjd)
on IRC, he asked: is the data you want to keep on the first partition?
If so, you could remove the second partition and grow the first.
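
For anyone whose keeper data really is on the first partition, the trick would look roughly
like this on a release where gpart supports the resize verb (followed by an export/import,
or online -e on newer ZFS, so the pool notices the bigger partition):

# gpart delete -i 2 ada5
# gpart resize -i 1 ada5        (with no -s, the partition grows to the maximum available)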

Clever. But if you look at what I did above, I labelled the first partition on each drive
as backup and the second as live, so the data I want to keep is on the second partition.
This is demonstrated here:

# gpart list ada5
Geom name: ada5
fwheads: 16
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada5p1
   Mediasize: 1000147021312 (931G)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk05-backup
   length: 1000147021312
   offset: 524288
   type: freebsd-zfs
   index: 1
   end: 1953413174
   start: 1024
2. Name: ada5p2
   Mediasize: 1000147021312 (931G)
   Sectorsize: 512
   Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk05-live
   length: 1000147021312
   offset: 1000147545600
   type: freebsd-zfs
   index: 2
   end: 3906825325
   start: 1953413175
Consumers:
1. Name: ada5
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e3

More drive repartitioning

The resilver has finished:

# zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 5h34m with 0 errors on Sat Jul 31 16:30:49 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0  790G resilvered
            gpt/disk02-live  ONLINE       0     0     0  402M resilvered
            gpt/disk03-live  ONLINE       0     0     0  402M resilvered
            gpt/disk04-live  ONLINE       0     0     0  402M resilvered
            gpt/disk05-live  ONLINE       0     0     0  402M resilvered
            gpt/disk06-live  ONLINE       0     0     0  402M resilvered
            gpt/disk07-live  ONLINE       0     0     0  402M resilvered

errors: No known data errors
#

On to ada2! Do be careful with these commands. They could destroy an
existing HDD that you need. 🙂

# zpool offline storage gpt/disk02-live
# gpart delete -i 1 ada2
# gpart delete -i 2 ada2
# dd if=/dev/zero of=/dev/ada2p1 bs=512 count=10240
# dd if=/dev/zero of=/dev/ada2p1 bs=512 count=10240 oseek=3907018928
# gpart create -s GPT ada2
# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk02-live ada2
# zpool replace storage gpt/disk02-live
# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 0.37% done, 8h36m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        storage                    DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            gpt/disk01-live        ONLINE       0     0     0  79.9M resilvered
            replacing              DEGRADED     0     0     0
              gpt/disk02-live/old  OFFLINE      0   958     0
              gpt/disk02-live      ONLINE       0     0     0  2.95G resilvered
            gpt/disk03-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk04-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk05-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk06-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk07-live        ONLINE       0     0     0  79.9M resilvered

errors: No known data errors

That should be done by morning. Then onto ada3, ada4, and ada5.
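
Since the exact same steps repeat for ada3, ada4, and ada5, here is a sketch of the per-drive
procedure rolled into a shell function. This is hypothetical (it is not what I ran) and it
assumes every drive has the same geometry; each drive must also wait for the previous
resilver to finish before you touch it:

# usage: redo_disk ada3 disk03-live
redo_disk() {
    disk=$1; label=$2
    zpool offline storage gpt/${label}
    gpart delete -i 1 ${disk}
    gpart delete -i 2 ${disk}
    gpart destroy ${disk}                 # drop the old, now-empty partition table
    gpart create -s GPT ${disk}
    gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l ${label} ${disk}
    # clear the first few MB of the new partition so stale ZFS labels don't survive
    dd if=/dev/zero of=/dev/${disk}p1 bs=512 count=10240
    zpool replace storage gpt/${label}
}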

4PM Sunday, August 1, 2010: ada3 started.

# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
storage          3.91T   524G  1.75G  /storage
storage/Retored  3.11T   524G  38.4K  /storage/Retored
storage/bacula    815G   524G  3.21T  /storage/bacula
storage/pgsql    1.75G   524G  1.75G  /storage/pgsql

You should notice that I have only 524G available… That should
jump significantly once I’m done with this process.

As of 7:15 Monday Morning, ada4 has been repartitioned and resilvered, and
ada5 is undergoing resilvering now. ETA: 1:15 PM.

Where’s the space?

It’s now 6:37 PM and the resilvering is complete. All vdevs are the same size.
However, I have far less space than expected:

$ zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
storage          3.92T   510G  1.75G  /storage
storage/Retored  3.11T   510G  38.4K  /storage/Retored
storage/bacula    829G   510G  3.23T  /storage/bacula
storage/pgsql    1.75G   510G  1.75G  /storage/pgsql

I expected to have 8TB or so (with seven 1.8TB vdevs in raidz2, roughly five drives’ worth, or about 9TB, should be usable)…

After talking to my friends on IRC, I found myself composing
an email.
Before I got any replies, Wyze mentioned import/export. So, I tried it:

# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    500K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    156M    3.4G     4%    /var
storage               512G    1.7G    510G     0%    /storage
storage/pgsql         512G    1.7G    510G     0%    /storage/pgsql
storage/bacula        3.7T    3.2T    510G    87%    /storage/bacula
storage/Retored       510G     39K    510G     0%    /storage/Retored


# zpool export storage
# zpool import storage


# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    500K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    156M    3.4G     4%    /var
storage               5.0T    1.7G    5.0T     0%    /storage
storage/Retored       5.0T     39K    5.0T     0%    /storage/Retored
storage/bacula        8.2T    3.2T    5.0T    39%    /storage/bacula
storage/pgsql         5.0T    1.7G    5.0T     0%    /storage/pgsql

THERE! And:

$ zfs get used,available storage
NAME     PROPERTY   VALUE    SOURCE
storage  used       3.92T    -
storage  available  4.96T    -

For a total of 8.88TB. That’s good enough!

I’m told
that more recent versions of ZFS (>= 16) include a new autoexpand property. If this
property is set on a given pool, the extra space is
made available automatically as soon as the last disk in a vdev has been
replaced.
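
On a pool and ZFS version that support it, that would look something like this (my pool is
version 14, so this was not an option here):

# zpool set autoexpand=on storage
# zpool online -e storage gpt/disk01-live     (expand a single, already-replaced vdev)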

Any questions? Suggestions?

2 thoughts on “ZFS: Resizing your zpool”

  1. This right here is why you are my hero. I’ve been wanting ZFS since it came out but was both waiting for the code to mature and for prices to come down enough to make my raid5 array long obsolete. Who would have thunk it, me actually wanting/preferring software raid but the features are too hard to resist. 1.8 * 7 = Just over 12 TB, enough to make a grown geek cry.

    I have to admit I only just skimmed over the whole thing the first time but I’m going to have to go back and reread it a few times. Still not sure why you created the sparse files/md drives. I thought ZFS had a simple add-to-pool command.

    Anyways, thank you for sharing your Diary all these years.

    1. You can add HDD to a pool, but you cannot go from raidz1 to raidz2 for example.

      From http://en.wikipedia.org/wiki/ZFS#Limitations:

      It is not possible to add a disk as a column to a RAID-Z, RAID-Z2, or RAID-Z3 vdev. This feature depends on the block pointer rewrite functionality due to be added soon. You can however create a new RAID-Z vdev and add it to the zpool.

      I should explain this in more detail. In short, I had to create a 7 vdev array with only 5 physical HDD.


      The Man Behind The Curtain
