ZFS: Resizing your zpool
I’m about to rebuild my ZFS array
(which I documented in my other diary). The array has been running
for a while, but I recently learned some new facts about ZFS which spurred me on to rebuild it
with future-proofing in mind.
This is my plan
for tonight. As I type this, Jerry is
over, doing the heavy lifting for me, since I am nursing a broken left elbow. The two new HDD have
been installed and the system has been powered back up.
Tonight we will do the following:
- identify the newly installed HDD
- put a file system on those HDD
- copy the existing ZFS array over to that new FS (call this temp)
- destroy the existing ZFS array
- partition each individual drive using gpart
- add the drives back into the array
- copy the data back
- partition the two new HDD (currently holding the temporary copy) and put them into the new array
My approach works because the existing data can fit on the two new HDD.
I have already covered how I’m going to use gpart to partition and label my HDD.
See ZFS: don’t give it all your HDD for details on that.
Identifying the new HDD
Jerry and I just inserted the two new hard drives, put the system back together, and powered it up.
This is the full dmesg output after installing the new HDD:
Copyright (c) 1992-2010 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 8.0-STABLE #0: Fri Mar 5 00:46:11 EST 2010 dan@kraken.example.org:/usr/obj/usr/src/sys/KRAKEN amd64 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Phenom(tm) II X4 945 Processor (3010.17-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x100f42 Stepping = 2 Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> Features2=0x802009<SSE3,MON,CX16,POPCNT> AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!> AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT> TSC: P-state invariant real memory = 4294967296 (4096 MB) avail memory = 4113461248 (3922 MB) ACPI APIC Table: <111909 APIC1708> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0 0/1 (20100121/tbfadt-655) ioapic0 <Version 2.1> irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: <111909 RSDT1708> on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of fee00000, 1000 (3) failed acpi0: reservation of ffb80000, 80000 (3) failed acpi0: reservation of fec10000, 20 (3) failed acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, dfe00000 (3) failed ACPI HPET table warning: Sequence is non-zero (2) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 pcib1: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0 pci8: <ACPI PCI bus> on pcib1 em0: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0xec00-0xec1f mem 0xfbfe0000-0xfbffffff,0xfbf00000-0xfbf7ffff,0xfbfdc000-0xfbfdffff irq 18 at device 0.0 on pci8 em0: Using MSIX interrupts em0: [ITHREAD] em0: [ITHREAD] em0: [ITHREAD] em0: Ethernet address: 00:1b:21:51:ab:2d pcib2: <ACPI PCI-PCI bridge> irq 17 at device 5.0 on pci0 pci6: <ACPI PCI bus> on pcib2 pcib3: <PCI-PCI bridge> irq 17 at device 0.0 on pci6 pci7: <PCI bus> on pcib3 siis0: <SiI3124 SATA controller> port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef0000-0xfbef7fff irq 17 at device 4.0 on pci7 siis0: [ITHREAD] siisch0: <SIIS channel> at channel 0 on siis0 siisch0: [ITHREAD] siisch1: <SIIS channel> at channel 1 on siis0 siisch1: [ITHREAD] siisch2: <SIIS channel> at channel 2 on siis0 siisch2: [ITHREAD] siisch3: <SIIS channel> at channel 3 on siis0 siisch3: [ITHREAD] pcib4: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0 pci5: <ACPI PCI bus> on pcib4 re0: <RealTek 8168/8168B/8168C/8168CP/8168D/8168DP/8111B/8111C/8111CP/8111DP PCIe Gigabit Ethernet> port 0xc800-0xc8ff mem 0xfbdff000-0xfbdfffff irq 18 at device 0.0 on pci5 re0: Using 1 MSI messages re0: Chip rev. 0x38000000 re0: MAC rev. 
0x00000000 miibus0: <MII bus> on re0 rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0 rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto re0: Ethernet address: e0:cb:4e:42:f0:ff re0: [FILTER] pcib5: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0 pci4: <ACPI PCI bus> on pcib5 fwohci0: <1394 Open Host Controller Interface> port 0xb800-0xb8ff mem 0xfbcff800-0xfbcfffff irq 19 at device 0.0 on pci4 fwohci0: [ITHREAD] fwohci0: OHCI version 1.10 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 00:1e:8c:00:00:c4:3c:f9 fwohci0: Phy 1394a available S400, 2 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: <IEEE1394(FireWire) bus> on fwohci0 dcons_crom0: <dcons configuration ROM> on firewire0 dcons_crom0: bus_addr 0x1574000 fwe0: <Ethernet over FireWire> on firewire0 if_fwe0: Fake Ethernet address: 02:1e:8c:c4:3c:f9 fwe0: Ethernet address: 02:1e:8c:c4:3c:f9 fwip0: <IP over FireWire> on firewire0 fwip0: Firewire address: 00:1e:8c:00:00:c4:3c:f9 @ 0xfffe00000000, S400, maxrec 2048 fwohci0: Initiate bus reset fwohci0: fwohci_intr_core: BUS reset fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode pcib6: <ACPI PCI-PCI bridge> irq 19 at device 11.0 on pci0 pci2: <ACPI PCI bus> on pcib6 pcib7: <PCI-PCI bridge> irq 19 at device 0.0 on pci2 pci3: <PCI bus> on pcib7 siis1: <SiI3124 SATA controller> port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf0000-0xfbbf7fff irq 19 at device 4.0 on pci3 siis1: [ITHREAD] siisch4: <SIIS channel> at channel 0 on siis1 siisch4: [ITHREAD] siisch5: <SIIS channel> at channel 1 on siis1 siisch5: [ITHREAD] siisch6: <SIIS channel> at channel 2 on siis1 siisch6: [ITHREAD] siisch7: <SIIS channel> at channel 3 on siis1 siisch7: [ITHREAD] ahci0: <ATI IXP700 AHCI SATA controller> port 0x8000-0x8007,0x7000-0x7003,0x6000-0x6007,0x5000-0x5003,0x4000-0x400f mem 0xfb3fe400-0xfb3fe7ff irq 22 at device 17.0 on pci0 ahci0: [ITHREAD] ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported ahcich0: <AHCI channel> at channel 0 on ahci0 ahcich0: [ITHREAD] ahcich1: <AHCI channel> at channel 1 on ahci0 ahcich1: [ITHREAD] ahcich2: <AHCI channel> at channel 2 on ahci0 ahcich2: [ITHREAD] ahcich3: <AHCI channel> at channel 3 on ahci0 ahcich3: [ITHREAD] ohci0: <OHCI (generic) USB controller> mem 0xfb3f6000-0xfb3f6fff irq 16 at device 18.0 on pci0 ohci0: [ITHREAD] usbus0: <OHCI (generic) USB controller> on ohci0 ohci1: <OHCI (generic) USB controller> mem 0xfb3f7000-0xfb3f7fff irq 16 at device 18.1 on pci0 ohci1: [ITHREAD] usbus1: <OHCI (generic) USB controller> on ohci1 ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfb3fe800-0xfb3fe8ff irq 17 at device 18.2 on pci0 ehci0: [ITHREAD] ehci0: AMD SB600/700 quirk applied usbus2: EHCI version 1.0 usbus2: <EHCI (generic) USB 2.0 controller> on ehci0 ohci2: <OHCI (generic) USB controller> mem 0xfb3fc000-0xfb3fcfff irq 18 at device 19.0 on pci0 ohci2: [ITHREAD] usbus3: <OHCI (generic) USB controller> on ohci2 ohci3: <OHCI (generic) USB controller> mem 0xfb3fd000-0xfb3fdfff irq 18 at device 19.1 on pci0 ohci3: [ITHREAD] usbus4: <OHCI (generic) USB controller> on ohci3 ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfb3fec00-0xfb3fecff irq 19 at device 19.2 on pci0 ehci1: [ITHREAD] ehci1: AMD SB600/700 quirk applied usbus5: EHCI version 1.0 usbus5: <EHCI (generic) USB 2.0 controller> on ehci1 pci0: <serial bus, SMBus> at device 20.0 (no driver attached) atapci0: <ATI IXP700/800 UDMA133 controller> port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 ata0: <ATA channel 0> on atapci0 ata0: [ITHREAD] ata1: <ATA channel 1> on atapci0 ata1: [ITHREAD] pci0: <multimedia, HDA> at device 20.2 (no driver attached) isab0: <PCI-ISA bridge> at device 20.3 on pci0 isa0: <ISA bus> on isab0 pcib8: <ACPI PCI-PCI bridge> at device 20.4 on pci0 pci1: <ACPI PCI bus> on pcib8 vgapci0: <VGA-compatible display> mem 0xfb400000-0xfb7fffff,0xfbad0000-0xfbadffff irq 20 at device 5.0 on pci1 ahc0: <Adaptec 2944 Ultra SCSI adapter> port 0x9800-0x98ff mem 0xfbaff000-0xfbafffff irq 21 at device 6.0 on pci1 ahc0: [ITHREAD] aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs ohci4: <OHCI (generic) USB controller> mem 0xfb3ff000-0xfb3fffff irq 18 at device 20.5 on pci0 ohci4: [ITHREAD] usbus6: <OHCI (generic) USB controller> on ohci4 acpi_button0: <Power Button> on acpi0 atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart0: [FILTER] fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FILTER] atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] cpu0: <ACPI CPU> on acpi0 acpi_throttle0: <ACPI CPU Throttling> on cpu0 hwpstate0: <Cool`n'Quiet 2.0> on cpu0 cpu1: <ACPI CPU> on acpi0 cpu2: <ACPI CPU> on acpi0 cpu3: <ACPI CPU> on acpi0 orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc97ff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 ppc0: cannot reserve I/O port range Timecounters tick every 1.000 msec firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me) firewire0: bus manager 0 (noperiph:siisch4:0:-1:-1): rescan already queued (noperiph:siisch5:0:-1:-1): rescan already queued (noperiph:siisch6:0:-1:-1): rescan already queued (noperiph:siisch7:0:-1:-1): rescan already queued (noperiph:siisch0:0:-1:-1): rescan already queued (noperiph:siisch2:0:-1:-1): rescan already queued (noperiph:siisch3:0:-1:-1): rescan already queued usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 480Mbps High Speed USB v2.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 480Mbps High Speed USB v2.0 usbus6: 12Mbps Full Speed USB v1.0 ugen0.1: <ATI> at usbus0 uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 ugen1.1: <ATI> at usbus1 uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1 ugen2.1: <ATI> at usbus2 uhub2: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2 ugen3.1: <ATI> at usbus3 uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3 ugen4.1: <ATI> at usbus4 uhub4: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4 ugen5.1: <ATI> at usbus5 uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5 ugen6.1: <ATI> at usbus6 uhub6: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6 uhub6: 2 ports with 2 removable, self powered uhub0: 3 ports with 3 removable, self powered uhub1: 3 ports with 3 removable, self powered uhub3: 3 ports with 3 removable, self powered uhub4: 3 ports with 3 removable, self powered uhub2: 6 ports with 6 removable, self powered uhub5: 6 ports with 6 removable, self powered (probe0:ahc0:0:0:0): TEST UNIT READY. 
CDB: 0 0 0 0 0 0 (probe0:ahc0:0:0:0): CAM status: SCSI Status Error (probe0:ahc0:0:0:0): SCSI status: Check Condition (probe0:ahc0:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) (probe5:ahc0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe5:ahc0:0:5:0): CAM status: SCSI Status Error (probe5:ahc0:0:5:0): SCSI status: Check Condition (probe5:ahc0:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) ada0 at siisch0 bus 0 scbus0 target 0 lun 0 ada0: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada1 at siisch2 bus 0 scbus2 target 0 lun 0 ada1: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada2 at siisch3 bus 0 scbus3 target 0 lun 0 ada2: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada2: Command Queueing enabled ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada3 at siisch4 bus 0 scbus4 target 0 lun 0 ada3: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x devicech0 at ahc0 bus 0 scbus12 target 0 lun 0 ch0: <DEC TL800 (C) DEC 0326> Removable Changer SCSI-2 device ch0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit) ch0: 10 slots, 1 drive, 1 picker, 0 portals ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada3: Command Queueing enabled ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada4 at siisch5 bus 0 scbus5 target 0 lun 0 ada4: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada4: Command Queueing enabled ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada5 at siisch6 bus 0 scbus6 target 0 lun 0 ada5: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada5: Command Queueing enabled ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada6 at siisch7 bus 0 scbus7 target 0 lun 0 ada6: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada6: Command Queueing enabled ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada7 at ahcich0 bus 0 scbus8 target 0 lun 0 ada7: <ST380815AS 4.AAB> ATA-7 SATA 2.x device ada7: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada7: Command Queueing enabled ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C) ada8 at ahcich2 bus 0 scbus10 target 0 lun 0 ada8: <WDC WD1600AAJS-75M0A0 02.03E02> ATA-8 SATA 2.x device ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada8: Command Queueing enabled ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C) sa0 at ahc0 bus 0 scbus12 target 5 lun 0 sa0: <DEC TZ89 (C) DEC 1837> Removable Sequential Access SCSI-2 device sa0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit) SMP: AP CPU #2 Launched! cd0 at ahcich1 bus 0 scbus9 target 0 lun 0SMP: AP CPU #1 Launched! cd0: <TSSTcorp CDDVDW SH-S223C SB01> Removable CD-ROM SCSI-0 device cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)SMP: AP CPU #3 Launched! 
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed GEOM_MIRROR: Device mirror/gm0 launched (2/2). GEOM: mirror/gm0s1: geometry does not match label (16h,63s != 255h,63s). Trying to mount root from ufs:/dev/mirror/gm0s1a ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf. ZFS filesystem version 3 ZFS storage pool version 14
That’s a good start, but it doesn’t tell me which drives are the new ones.
However, let’s see which drives are already in use. From the output of zpool status,
I can get the list of HDD used in the existing array:
pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz1 ONLINE 0 0 0 ada1 ONLINE 0 0 0 ada2 ONLINE 0 0 0 ada3 ONLINE 0 0 0 ada4 ONLINE 0 0 0 ada5 ONLINE 0 0 0 errors: No known data errors
I happen to know my OS runs on a gmirror. This is my gmirror array (as shown by gmirror status), which I boot from:
Name Status Components mirror/gm0 COMPLETE ada7 ada8
That leaves ada0 and ada6 as the new drives. If you grep for the ada devices in the dmesg output above,
you’ll find this list:
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C) ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C)
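For the record, something like this pulls those capacity lines out of dmesg (the exact grep pattern is mine, for illustration only):
$ dmesg | grep -E '^ada[0-9]+: [0-9]+MB'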
This machine has two SATA controllers. We added one new HDD to each controller. One now has four drives,
the other has three.
Copying data to the new HDD
I am omitting the steps to partition, label, newfs, and mount the two new drives.
They’re not relevant to this tutorial, which is mostly about ZFS.
I now have the following.
# mount /dev/mirror/gm0s1a on / (ufs, local, soft-updates) devfs on /dev (devfs, local, multilabel) /dev/mirror/gm0s1e on /tmp (ufs, local, soft-updates) /dev/mirror/gm0s1f on /usr (ufs, local, soft-updates) /dev/mirror/gm0s1d on /var (ufs, local, soft-updates) storage on /storage (zfs, local) /dev/ada0s1d on /new0 (ufs, local, soft-updates) /dev/ada6s1d on /new6 (ufs, local, soft-updates) #
Which translates to:
$ df -h Filesystem Size Used Avail Capacity Mounted on /dev/mirror/gm0s1a 989M 494M 416M 54% / devfs 1.0K 1.0K 0B 100% /dev /dev/mirror/gm0s1e 3.9G 70K 3.6G 0% /tmp /dev/mirror/gm0s1f 58G 4.5G 49G 8% /usr /dev/mirror/gm0s1d 3.9G 152M 3.4G 4% /var storage 7.1T 3.1T 4.0T 43% /storage /dev/ada0s1d 1.8T 4.0K 1.6T 0% /new0 /dev/ada6s1d 1.8T 4.0K 1.6T 0% /new6
Testing the existing ZFS array
For future comparison, here is a simple test on the existing ZFS array:
# dd if=/dev/random of=/storage/dan/NewDriveTesting/file1 bs=1m count=20480 20480+0 records in 20480+0 records out 21474836480 bytes transferred in 333.867807 secs (64321375 bytes/sec)
Not very astounding, but there’s a reason: this test is CPU bound, since /dev/random has to
generate all that data. If I then try copying that file around, it’s a better representation
of the array’s power. Compare that to the time for /dev/zero:
# dd if=/dev/zero of=/storage/dan/NewDriveTesting/file-zero bs=1m count=20480 20480+0 records in 20480+0 records out 21474836480 bytes transferred in 124.919368 secs (171909583 bytes/sec)
Copying data off the array
I’ve divided up my data into two parts, one for each of the two HDD.
This 897G copy just started:
# cd /storage/bacula/volumes # cp -rp FileAuto-0* bast catalog dbclone kraken laptop-freebsd laptop-vista latens \ nyi polo supernews /new0/bacula/volumes/
And this 1.8T:
# cd /storage/bacula/volumes/ngaio # cp -rp FileAuto-0{1..7}* /new6/bacula/volumes/ngaio/
At present, the zpool iostat is running like this:
$ zpool iostat 30 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- storage 3.92T 5.15T 514 0 63.8M 0 storage 3.92T 5.15T 493 3 61.2M 6.23K storage 3.92T 5.15T 505 0 62.7M 0 storage 3.92T 5.15T 499 3 62.0M 6.30K storage 3.92T 5.15T 514 0 63.8M 0 storage 3.92T 5.15T 601 3 74.7M 6.13K storage 3.92T 5.15T 604 0 74.9M 0 storage 3.92T 5.15T 754 3 93.7M 6.17K storage 3.92T 5.15T 713 0 88.5M 0 storage 3.92T 5.15T 645 4 80.1M 7.48K storage 3.92T 5.15T 725 0 90.1M 0 storage 3.92T 5.15T 717 3 89.0M 6.73K
I may be waiting a while… 🙂 12 hours by my reckoning.
So far, as of 9:39 PM EST:
$ df -h /new0 /new6 Filesystem Size Used Avail Capacity Mounted on /dev/ada0s1d 1.8T 178G 1.4T 11% /new0 /dev/ada6s1d 1.8T 131G 1.5T 8% /new6
11:49 PM – gone to bed. Expect no updates until after 10 AM EST.
9:00 AM
It appears my calculations were incorrect. I have run out of space:
# cp -rp FileAuto-0{1..7}* /new6/bacula/volumes/ngaio/ cp: FileAuto-01*: No such file or directory /new6: write failed, filesystem is full cp: /new6/bacula/volumes/ngaio/FileAuto-0291: No space left on device /new6: write failed, filesystem is full cp: /new6/bacula/volumes/ngaio/FileAuto-0290: No space left on device cp: /new6/bacula/volumes/ngaio/FileAuto-0289: No space left on device cp: /new6/bacula/volumes/ngaio/FileAuto-0288: No space left on device cp: /new6/bacula/volumes/ngaio/FileAuto-0280: No space left on device cp: /new6/bacula/volumes/ngaio/FileAuto-0276: No space left on device cp: /new6/bacula/volumes/ngaio/FileAuto-0799: No space left on device #
10:43 AM
Now we’re copying again… this time, using rsync FTW.
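The general shape of the rsync invocation is something like this (the paths here are illustrative, reconstructed from the cp commands above, not the exact ones I used):
# rsync -a --progress /storage/bacula/volumes/ngaio/ /new0/bacula/volumes/ngaio/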
11:11 AM
The final copy:
# df -h /new? Filesystem Size Used Avail Capacity Mounted on /dev/ada0s1d 1.8T 1.0T 640G 61% /new0 /dev/ada6s1d 1.8T 1.8T -143G 109% /new6 # cd /storage/bacula/volumes/ngaio # cp -rp FileAuto-08* FileAuto-09* /new0/bacula/volumes/ngaio
12:47 PM
The copy has finished. Now I wish to verify. rsync will help.
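A checksum-based dry run is one way to do that; roughly like this, with placeholder paths (rsync -n is dry run, -c compares checksums):
# rsync -a -n -c -v /path/to/original/ /path/to/copy/
Any file listed in the output is one that differs between the two copies.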
1:57 PM
I have a complete list of files on both old and new filesystems. They look great.
Destroying the old pool
With the backup complete and confirmed, I’m ready to take the backup drives offline,
just to avoid any mistakes, using the umount command.
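That part is just this (assuming the mount points shown earlier):
# umount /new0
# umount /new6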
Time to destroy the existing pool. This removes the data.
$ zpool status pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz1 ONLINE 0 0 0 ada1 ONLINE 0 0 0 ada2 ONLINE 0 0 0 ada3 ONLINE 0 0 0 ada4 ONLINE 0 0 0 ada5 ONLINE 0 0 0 errors: No known data errors $ sudo zpool destroy -f storage $ zpool status no pools available $
There. Gone. Time to start rebuilding the array. I went through the partitioning
process described in ZFS: don’t give it all your HDD.
Creating a 7HDD zpool with only 5HDD (fails)
WARNING: This attempt failed. Read the next section for the successful approach.
I want to create a new zpool that contains all 7 HDD. The problem is that two of those HDD now contain
my data. This is the approach I will take to solve that:
- create two sparse files
- create a new zpool with the 5HDD and the two sparse files
- remove the two sparse files from the array
- copy data from my 2 HDD to the array
- add my two HDD into the array filling the two empty slots
First, I destroy my existing pool. WARNING: This destroys all data in the pool.
zpool destroy storage
Now I create two sparse files, roughly the same size as the HDD partitions (actually, slightly smaller).
$ dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0 oseek=1862g 0+0 records in 0+0 records out 0 bytes transferred in 0.000010 secs (0 bytes/sec) $ dd if=/dev/zero of=/tmp/sparsefile2.img bs=1 count=0 oseek=1862g 0+0 records in 0+0 records out 0 bytes transferred in 0.000011 secs (0 bytes/sec) $ ls -l /tmp/sparsefile2.img /tmp/sparsefile1.img -rw-r--r-- 1 dan wheel 1999307276288 Jul 25 12:52 /tmp/sparsefile1.img -rw-r--r-- 1 dan wheel 1999307276288 Jul 25 12:52 /tmp/sparsefile2.img $ ls -ls /tmp/sparsefile2.img /tmp/sparsefile1.img 64 -rw-r--r-- 1 dan wheel 1999307276288 Jul 25 12:52 /tmp/sparsefile1.img 64 -rw-r--r-- 1 dan wheel 1999307276288 Jul 25 12:52 /tmp/sparsefile2.img
Although these sparse files look
to be 1862GB in size, they only take up 64 blocks, as shown in the last ls output.
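As an aside, truncate(1) gets you the same sparse file in one step each (an alternative to the dd incantation above; same size and file names):
$ truncate -s 1862g /tmp/sparsefile1.img
$ truncate -s 1862g /tmp/sparsefile2.img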
This command creates a new pool that includes the above two files. It is the same
zpool create command I have used in earlier tests, but with two more vdevs on the end:
# zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 \ /tmp/sparsefile1.img /tmp/sparsefile2.img invalid vdev specification use '-f' to override the following errors: mismatched replication level: raidz contains both files and devices
Oh damn. Yes. Umm, let’s try that -f option.
# zpool create -f storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 \ /tmp/sparsefile1.img /tmp/sparsefile2.img # zpool status pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01 ONLINE 0 0 0 gpt/disk02 ONLINE 0 0 0 gpt/disk03 ONLINE 0 0 0 gpt/disk04 ONLINE 0 0 0 gpt/disk05 ONLINE 0 0 0 /tmp/sparsefile1.img ONLINE 0 0 0 /tmp/sparsefile2.img ONLINE 0 0 0 errors: No known data errors
Let’s try to remove the two sparse files from the array.
# zpool detach storage /tmp/sparsefile2.img cannot detach /tmp/sparsefile2.img: only applicable to mirror and replacing vdevs # zpool remove storage /tmp/sparsefile2.img cannot remove /tmp/sparsefile2.img: only inactive hot spares or cache devices can be removed
OK, neither of those works. Let’s try this:
# zpool offline storage /tmp/sparsefile2.img
Uh oh. The system went away. A panic, I bet… Yes, a system panic. After the reboot, I went
ahead and deleted the two sparse files out from underneath ZFS. That is not the
right thing to do. Now I see:
# zpool status pool: storage state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-2Q scrub: none requested config: NAME STATE READ WRITE CKSUM storage DEGRADED 0 0 0 raidz2 DEGRADED 0 0 0 gpt/disk01 ONLINE 0 0 0 gpt/disk02 ONLINE 0 0 0 gpt/disk03 ONLINE 0 0 0 gpt/disk04 ONLINE 0 0 0 gpt/disk05 ONLINE 0 0 0 /tmp/sparsefile1.img UNAVAIL 0 0 0 cannot open /tmp/sparsefile2.img UNAVAIL 0 0 0 cannot open errors: No known data errors
So… let’s try a scrub… no, that dies too. So does ‘zpool destroy storage’.
The advice I received by email: first, remove the stale cache file:
rm /boot/zfs/zpool.cache
Then wipe the drives/partitions by writing 16KB of zeros at the beginning and end of each:
# dd if=/dev/zero of=/dev/ada1 bs=512 count=32 32+0 records in 32+0 records out 16384 bytes transferred in 0.008233 secs (1990023 bytes/sec) # dd if=/dev/zero of=/dev/ada1 bs=512 count=32 oseek=3907029073 32+0 records in 32+0 records out 16384 bytes transferred in 0.008202 secs (1997601 bytes/sec)
Repeat for ada2..5
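Since ada2 through ada5 are identical drives, that repeat can be scripted; roughly (same counts and oseek as above):
# for d in ada2 ada3 ada4 ada5; do dd if=/dev/zero of=/dev/$d bs=512 count=32; dd if=/dev/zero of=/dev/$d bs=512 count=32 oseek=3907029073; done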
But that was on the raw device. I should have done it on the partition:
# dd if=/dev/zero of=/dev/gpt/disk01 bs=512 count=32 32+0 records in 32+0 records out 16384 bytes transferred in 0.008974 secs (1825752 bytes/sec) # dd if=/dev/zero of=/dev/gpt/disk01 bs=512 count=32 oseek=3906824269 32+0 records in 32+0 records out 16384 bytes transferred in 0.008934 secs (1833889 bytes/sec)
Where did I get these values? From ‘gpart show’:
=< 34 3907029101 ada1 GPT (1.8T) 34 990 - free - (495K) 1024 3906824301 1 freebsd-zfs (1.8T) 3906825325 203810 - free - (100M)
Am I doing the right math?
3906824301 (the partition size from the gpart output above) – 32 blocks = 3906824269 (the oseek value used in the dd above)
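A quick way to sanity-check that arithmetic at the prompt:
$ echo $((3906824301 - 32))
3906824269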
Now to try memory disks…
# mdconfig -a -t malloc -s 1862g -u 0 # mdconfig -a -t malloc -s 1862g -u 1 # zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 /dev/md0 /dev/md1 invalid vdev specification use '-f' to override the following errors: raidz contains devices of different sizes # zpool create -f storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 /dev/md0 /dev/md1 # zpool status pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01 ONLINE 0 0 0 gpt/disk02 ONLINE 0 0 0 gpt/disk03 ONLINE 0 0 0 gpt/disk04 ONLINE 0 0 0 gpt/disk05 ONLINE 0 0 0 md0 ONLINE 0 0 0 md1 ONLINE 0 0 0 errors: No known data errors # zpool offline storage md0 # zpool offline storage md1 cannot offline md1: no valid replicas
Creating a 7HDD zpool with only 5HDD (succeeds)
I may have a cunning plan, suggested to me by Pawel Tyll:
Given that I have:
- 5 empty 2 TB HDD
- 2 full 2 TB HDD
- FreeBSD 8.1-STABLE
The plan is:
- For each of my 5 HDD, create 2x1TB partitions (labeled live and backup respectively)
- Create a 7-device raidz2 zpool using one partition from each HDD and two /dev/md devices; call this live.
- Create a 5-device raidz1 zpool using 1TB partitions from each HDD; call this backup
- Copy the data from the two HDD into the zpool backup.
- scrub the backup pool to ensure it’s OK
- create a 2TB partition on each of the 2 HDD which are not in any pool
- replace the two md units in the live pool with the 2 HDD
- try a zfs send | zfs receive from the backup pool to the live pool
- scrub the live pool
- destroy the backup pool
- for each of the 5 original HDD: offline the drive, repartition it into a single 2TB partition, and put it back into the live pool using zpool replace
Hmm, that’s pretty straightforward and very cunning.
Below, you can see that the first partition of each HDD is used for the backup pool,
and the second partition for the live pool. That is not ideal. For this situation,
it would be better to put the backup data on the second partition, and the live data on the
first partition. Why? When we later go to drop the backup partition, we could simply grow
the live partition and retain our data. This is untested. 🙂
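For what it’s worth, on a system where gpart can resize partitions and the pool supports expansion (I have not verified either on this 8.x / ZFS v14 install), the grow step for one drive would look roughly like this, assuming the live data sits on partition 1 and the backup on partition 2 (the opposite of what I actually did):
# zpool destroy MyBackup
# gpart delete -i 2 ada5
# gpart resize -i 1 ada5
# zpool online -e storage gpt/disk05-live
Repeated for each drive, that would grow the live pool in place instead of the full repartition-and-resilver cycle I go through below. Again: a sketch only, untested here.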
So this is the partitioning I used:
# gpart add -b 1024 -s 1953412151 -t freebsd-zfs -l disk05-backup ada5 ada5p1 added # gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk05-live ada5 ada5p2 added # gpart show ada5 => 34 3907029101 ada5 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) # gpart add -b 1024 -s 1953412151 -t freebsd-zfs -l disk04-backup ada4 ada4p1 added # gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk04-live ada4 ada4p2 added # gpart add -b 1024 -s 1953412151 -t freebsd-zfs -l disk03-backup ada3 ada3p1 added # gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk03-live ada3 ada3p2 added # gpart add -b 1024 -s 1953412151 -t freebsd-zfs -l disk02-backup ada2 ada2p1 added # gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk02-live ada2 ada2p2 added # gpart add -b 1024 -s 1953412151 -t freebsd-zfs -l disk01-backup ada1 ada1p1 added # gpart add -b 1953413175 -s 1953412151 -t freebsd-zfs -l disk01-live ada1
Now let’s look at the partitions on those drives:
# gpart show ada1 ada2 ada3 ada4 ada5 => 34 3907029101 ada1 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada2 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada3 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada4 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada5 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) #
Now we configure two memory disks:
# mdconfig -a -t malloc -s 931g -u 0 # mdconfig -a -t malloc -s 931g -u 1
Now we create the 7-vdev raidz2, with the live partitions from the 5 HDD and the two memory disks:
# zpool create -f storage raidz2 gpt/disk01-live gpt/disk02-live gpt/disk03-live gpt/disk04-live gpt/disk05-live /dev/md0 /dev/md1 # zpool status pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 md0 ONLINE 0 0 0 md1 ONLINE 0 0 0 errors: No known data errors #
Next, we create the other pool, which will be raidz1 on 5 vdevs:
# zpool create -f MyBackup raidz1 gpt/disk01-backup gpt/disk02-backup gpt/disk03-backup gpt/disk04-backup gpt/disk05-backup # zpool status pool: MyBackup state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM MyBackup ONLINE 0 0 0 raidz1 ONLINE 0 0 0 gpt/disk01-backup ONLINE 0 0 0 gpt/disk02-backup ONLINE 0 0 0 gpt/disk03-backup ONLINE 0 0 0 gpt/disk04-backup ONLINE 0 0 0 gpt/disk05-backup ONLINE 0 0 0 errors: No known data errors pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 md0 ONLINE 0 0 0 md1 ONLINE 0 0 0 errors: No known data errors
Copying data to the backup pool
The following iostat output shows the files copying from the two temp HDD to the backup pool.
# zpool iostat MyBackup 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- MyBackup 5.62G 4.53T 0 327 78 29.7M MyBackup 5.67G 4.53T 0 480 0 40.7M MyBackup 5.72G 4.53T 0 528 0 48.7M MyBackup 5.77G 4.53T 0 383 0 31.6M MyBackup 5.82G 4.53T 0 482 0 42.7M MyBackup 5.87G 4.53T 0 509 0 47.2M MyBackup 5.92G 4.53T 0 451 0 36.9M MyBackup 5.96G 4.53T 0 424 0 34.6M MyBackup 5.99G 4.53T 0 460 0 40.5M MyBackup 6.04G 4.53T 0 457 0 40.4M MyBackup 6.09G 4.53T 0 457 0 40.4M MyBackup 6.14G 4.53T 0 457 0 40.5M MyBackup 6.19G 4.53T 0 456 0 40.3M MyBackup 6.23G 4.53T 0 456 0 40.3M MyBackup 6.28G 4.53T 0 388 0 39.8M MyBackup 6.33G 4.53T 0 488 0 41.1M MyBackup 6.38G 4.53T 0 554 0 40.8M MyBackup 6.43G 4.52T 0 457 0 40.4M MyBackup 6.48G 4.52T 0 459 0 40.5M MyBackup 6.53G 4.52T 0 475 0 40.5M MyBackup 6.58G 4.52T 0 477 0 40.5M MyBackup 6.65G 4.52T 0 464 0 41.2M MyBackup 6.70G 4.52T 0 586 0 55.5M MyBackup 6.75G 4.52T 0 528 0 41.7M MyBackup 6.79G 4.52T 0 467 0 38.5M MyBackup 6.84G 4.52T 0 447 0 38.5M MyBackup 6.89G 4.52T 0 445 0 38.4M MyBackup 6.96G 4.52T 0 513 0 39.1M MyBackup 6.98G 4.52T 0 445 0 38.4M MyBackup 7.03G 4.52T 0 445 0 38.5M MyBackup 7.10G 4.52T 0 449 0 38.3M MyBackup 7.15G 4.52T 0 513 0 46.1M MyBackup 7.19G 4.52T 0 651 0 50.2M MyBackup 7.24G 4.52T 0 445 0 38.5M MyBackup 7.31G 4.52T 0 515 0 46.5M MyBackup 7.36G 4.52T 0 598 0 49.7M MyBackup 7.43G 4.52T 0 445 0 38.5M MyBackup 7.47G 4.52T 0 687 0 57.7M MyBackup 7.54G 4.52T 0 447 0 38.5M MyBackup 7.59G 4.52T 0 544 0 50.0M MyBackup 7.64G 4.52T 0 571 0 46.2M MyBackup 7.68G 4.52T 0 451 0 38.5M MyBackup 7.75G 4.52T 0 449 0 38.4M MyBackup 7.80G 4.52T 0 676 0 57.7M MyBackup 7.85G 4.52T 0 557 0 43.9M MyBackup 7.92G 4.52T 0 580 0 53.2M MyBackup 7.97G 4.52T 0 426 0 36.2M MyBackup 8.02G 4.52T 0 398 0 27.0M MyBackup 8.06G 4.52T 0 451 0 38.7M MyBackup 8.11G 4.52T 0 480 0 41.7M MyBackup 8.16G 4.52T 0 447 0 35.7M MyBackup 8.21G 4.52T 0 484 0 42.7M MyBackup 8.25G 4.52T 0 554 0 49.3M MyBackup 8.32G 4.52T 0 454 0 38.5M MyBackup 8.34G 4.52T 0 503 0 37.3M MyBackup 8.38G 4.52T 0 437 0 36.9M MyBackup 8.45G 4.52T 0 438 0 37.1M MyBackup 8.50G 4.52T 0 691 0 55.5M MyBackup 8.54G 4.52T 0 439 0 36.9M MyBackup 8.61G 4.52T 0 437 0 36.9M MyBackup 8.65G 4.52T 0 580 0 53.4M MyBackup 8.72G 4.52T 0 563 0 39.8M MyBackup 8.75G 4.52T 0 455 0 37.0M MyBackup 8.79G 4.52T 0 438 0 37.1M MyBackup 8.84G 4.52T 0 374 0 26.7M MyBackup 8.88G 4.52T 0 579 0 47.6M MyBackup 8.93G 4.52T 0 442 0 37.2M MyBackup 8.95G 4.52T 0 197 0 18.3M MyBackup 8.97G 4.52T 0 14 0 19.0K MyBackup 8.97G 4.52T 0 104 0 5.33M MyBackup 8.98G 4.52T 0 129 0 11.9M MyBackup 8.98G 4.52T 0 36 0 47.0K MyBackup 9.00G 4.52T 0 321 0 22.4M MyBackup 9.04G 4.52T 0 413 0 35.4M MyBackup 9.08G 4.52T 0 428 0 35.5M MyBackup 9.13G 4.52T 0 445 0 35.5M MyBackup 9.17G 4.52T 0 497 0 35.8M MyBackup 9.21G 4.52T 0 425 0 35.5M MyBackup 9.28G 4.52T 0 442 0 36.9M MyBackup 9.32G 4.52T 0 621 0 51.7M MyBackup 9.36G 4.52T 0 425 0 35.4M MyBackup 9.43G 4.52T 0 519 0 46.4M MyBackup 9.47G 4.52T 0 564 0 42.2M MyBackup 9.54G 4.52T 0 644 0 53.5M MyBackup 9.60G 4.52T 0 529 0 47.6M MyBackup 9.66G 4.52T 0 562 0 43.5M MyBackup 9.73G 4.52T 0 616 0 50.8M MyBackup 9.78G 4.52T 0 711 0 59.9M MyBackup 9.85G 4.52T 0 692 0 59.9M MyBackup 9.92G 4.52T 0 477 0 39.7M MyBackup 10.0G 4.52T 0 688 0 59.7M MyBackup 10.0G 4.52T 0 691 0 59.5M MyBackup 10.1G 4.52T 0 520 0 39.9M MyBackup 10.2G 4.52T 0 690 0 59.5M MyBackup 10.2G 4.52T 0 457 0 39.4M ^C
10:39 PM
And the copy continues:
$ zpool iostat MyBackup 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- MyBackup 2.68T 1.85T 0 456 1.19K 25.4M MyBackup 2.68T 1.85T 0 487 0 27.8M MyBackup 2.68T 1.85T 0 488 0 27.9M MyBackup 2.68T 1.85T 0 323 0 18.0M MyBackup 2.68T 1.85T 1 545 2.50K 28.4M MyBackup 2.68T 1.85T 1 489 3.00K 27.7M MyBackup 2.68T 1.85T 0 324 0 18.4M MyBackup 2.68T 1.85T 1 486 3.00K 27.7M MyBackup 2.68T 1.85T 0 331 0 18.9M
4:48 AM
Only 500GB to go. At 30MB/s, that should take about 5 hours…
$ df -h Filesystem Size Used Avail Capacity Mounted on /dev/mirror/gm0s1a 989M 508M 402M 56% / devfs 1.0K 1.0K 0B 100% /dev /dev/mirror/gm0s1e 3.9G 496K 3.6G 0% /tmp /dev/mirror/gm0s1f 58G 4.6G 48G 9% /usr /dev/mirror/gm0s1d 3.9G 155M 3.4G 4% /var /dev/ada0s1d 1.8T 1.4T 264G 84% /new0 /dev/ada6s1d 1.8T 1.7T -125G 108% /new6 storage 4.4T 39K 4.4T 0% /storage MyBackup 3.6T 2.6T 940G 74% /MyBackup
A couple of processes have been busy:
$ ps auwx | grep cp root 12709 0.0 0.0 5824 2048 3 D+ Sun09PM 31:32.24 cp -rp /new6/bacula . root 12692 0.0 0.1 5824 2120 1 D+ Sun09PM 31:43.20 cp -rp /new0/bacula /new0/pgsql .
The iostat of the two source HDD:
$ iostat ada0 ada6 2 tty ada0 ada6 cpu tin tout KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 5 127.52 95 11.88 127.53 94 11.76 0 0 3 1 96 0 92 127.44 100 12.51 127.39 92 11.51 0 0 4 0 96 0 31 127.52 116 14.50 128.00 76 9.50 0 0 3 0 96 0 31 127.42 96 12.00 127.39 92 11.50 0 0 3 0 97 0 31 128.00 104 13.00 127.39 92 11.50 0 0 4 1 95 0 31 127.34 84 10.50 128.00 108 13.49 0 0 2 1 97 0 31 127.60 140 17.50 127.39 92 11.50 0 0 3 1 95 0 31 127.44 100 12.50 127.39 92 11.50 0 0 4 1 96 0 31 127.39 92 11.50 127.46 104 13.00 0 0 4 0 96 0 31 127.54 120 15.00 127.50 112 14.00 0 0 5 0 95 0 31 128.00 96 11.99 128.00 96 11.99 0 0 2 1 97 0 31 127.47 105 13.13 127.47 105 13.13 0 0 3 0 97 0 31 127.59 135 16.87 127.53 119 14.87 0 0 4 1 95 0 31 127.44 100 12.50 127.42 96 12.00 0 0 3 0 96 0 31 127.44 100 12.50 127.42 96 12.00 0 0 3 1 95 0 31 127.44 100 12.50 128.00 96 12.00 0 0 3 1 97 0 32 128.00 120 14.99 127.50 112 14.00 1 0 6 1 92
5:48 AM
One of the cp processes has finished:
$ ps auwx | grep cp root 12709 0.9 0.0 5824 2048 3 D+ Sun09PM 32:13.30 cp -rp /new6/bacula .
7:12 AM
Oh oh, we’re down to just one TB free on the array!
MyBackup 3.52T 1.01T 0 490 0 23.9M MyBackup 3.53T 1.01T 0 509 0 23.7M MyBackup 3.53T 1.01T 0 501 51 24.6M MyBackup 3.53T 1.00T 0 490 0 23.6M MyBackup 3.53T 1.00T 0 508 0 24.4M MyBackup 3.53T 1.00T 0 475 0 22.5M MyBackup 3.53T 1.00T 0 503 0 24.5M MyBackup 3.53T 1.00T 0 520 0 25.6M MyBackup 3.53T 1.00T 0 489 0 24.0M MyBackup 3.53T 1.00T 0 475 153 22.3M MyBackup 3.53T 1.00T 0 518 0 24.8M MyBackup 3.53T 1.00T 0 513 0 25.9M MyBackup 3.53T 1024G 0 498 127 24.3M MyBackup 3.53T 1023G 0 524 0 25.3M
According to my latest calculations, there is about 300GB (or about 3.4 hours) left to copy:
First, the backups:
$ du -ch /new6/bacula/volumes/ngaio /new0/bacula/volumes/ngaio/ 1.7T /new6/bacula/volumes/ngaio 479G /new0/bacula/volumes/ngaio/ 2.2T total
Compared to what we have in ZFS:
$ du -ch /MyBackup/bacula/volumes/ngaio/ 1.9T /MyBackup/bacula/volumes/ngaio/ 1.9T total
There are 456 files in the backup:
$ ls /new6/bacula/volumes/ngaio /new0/bacula/volumes/ngaio/ | wc -l 456
And, so far, 401 files in the ZFS array:
$ ls /MyBackup/bacula/volumes/ngaio/ | wc -l 401
NOTE: this is not the total file count; it applies just to the directory now being copied.
Given that we are missing 55 files of roughly 5GB each, that gives about 275GB, which is
close to the 300GB estimated above.
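The arithmetic, for the record:
$ echo $(( (456 - 401) * 5 ))
275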
Scrub the backup data
12:02 PM
The backup has finished. Time for a good scrubbing then a copy to the live pool!
MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0 MyBackup 3.88T 663G 0 0 0 0
The above is an idle pool. Below is the start of the scrub.
Don’t worry; the time estimate is overestimated at first and quickly drops.
# zpool scrub MyBackup # zpool status MyBackup pool: MyBackup state: ONLINE scrub: scrub in progress for 0h0m, 0.02% done, 28h26m to go config: NAME STATE READ WRITE CKSUM MyBackup ONLINE 0 0 0 raidz1 ONLINE 0 0 0 gpt/disk01-backup ONLINE 0 0 0 gpt/disk02-backup ONLINE 0 0 0 gpt/disk03-backup ONLINE 0 0 0 gpt/disk04-backup ONLINE 0 0 0 gpt/disk05-backup ONLINE 0 0 0 errors: No known data errors
By 12:47 PM, the status was:
scrub: scrub in progress for 0h44m, 14.89% done, 4h11m to go
5:26 PM
scrub: scrub completed after 4h48m with 0 errors on Tue Jul 27 16:52:04 2010
Next steps: snapshot, put those two HDD into the live pool, followed by zfs send | zfs receive
Screwing up the memory disks…
6:52 PM
One of the great features of ZFS is its send/receive function: you can send a ZFS snapshot
from one filesystem to another. In this section, I mess up the live pool, but then fix it.
Then I partition those two spare HDD and add them to the live pool.
This is how I create the snapshot for the pool:
# zfs snapshot MyBackup@2010.07.27
Note that MyBackup is the name of the pool. To see the list of snapshots:
# zfs list -t snapshot NAME USED AVAIL REFER MOUNTPOINT MyBackup@2010.07.27 0 - 3.10T -
Now what does the live pool look like?
# zpool status pool: MyBackup state: ONLINE scrub: scrub completed after 4h48m with 0 errors on Tue Jul 27 16:52:04 2010 config: NAME STATE READ WRITE CKSUM MyBackup ONLINE 0 0 0 raidz1 ONLINE 0 0 0 gpt/disk01-backup ONLINE 0 0 0 gpt/disk02-backup ONLINE 0 0 0 gpt/disk03-backup ONLINE 0 0 0 gpt/disk04-backup ONLINE 0 0 0 gpt/disk05-backup ONLINE 0 0 0 errors: No known data errors pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 md0 ONLINE 0 0 0 md1 ONLINE 0 0 0 errors: No known data errors #
Hmm, it’s fine. That’s just because ZFS doesn’t yet know those devices are gone,
even after adding a file.
I made an error; I should not have rm’d those devices. What I did to recover:
- shutdown -p now
- boot into single user mode
- run the same mdconfig statements I did previously
- exit
After running a scrub, the status is:
# zpool status storage pool: storage state: ONLINE status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Replace the device using 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-4J scrub: scrub completed after 0h0m with 0 errors on Tue Jul 27 19:02:04 2010 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 md0 UNAVAIL 0 113 0 corrupted data md1 UNAVAIL 0 117 0 corrupted data errors: No known data errors #
Adding in the two new HDD
Let’s partition those two spare HDD:
# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk06-live ada0 ada0p1 added # gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk07-live ada6 ada6p1 added # gpart show ada0 ada6 => 34 3907029101 ada0 GPT (1.8T) 34 2014 - free - (1.0M) 2048 3906617453 1 freebsd-zfs (1.8T) 3906619501 409634 - free - (200M) => 34 3907029101 ada6 GPT (1.8T) 34 2014 - free - (1.0M) 2048 3906617453 1 freebsd-zfs (1.8T) 3906619501 409634 - free - (200M)
These commands replace the memory drives with the new HDD:
# zpool replace storage md0 gpt/disk06-live # zpool replace storage md1 gpt/disk07-live # zpool status storage pool: storage state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Jul 27 19:29:08 2010 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 14K resilvered gpt/disk02-live ONLINE 0 0 0 15.5K resilvered gpt/disk03-live ONLINE 0 0 0 17.5K resilvered gpt/disk04-live ONLINE 0 0 0 19K resilvered gpt/disk05-live ONLINE 0 0 0 18K resilvered gpt/disk06-live ONLINE 0 0 0 16.5K resilvered gpt/disk07-live ONLINE 0 0 0 24.5K resilvered errors: No known data errors #
Now we scrub the pool, just to be sure.
# zpool scrub storage # zpool status storage pool: storage state: ONLINE scrub: scrub completed after 0h0m with 0 errors on Tue Jul 27 19:29:51 2010 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 gpt/disk06-live ONLINE 0 0 0 gpt/disk07-live ONLINE 0 0 0 errors: No known data errors #
Copying data from one ZFS array to another
The following will take a long time. I recommend doing it on the
console or in a screen session.
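For the screen route, the incantation is something like this (the session name is my own): start the session, run the zfs send | zfs receive shown below inside it, then detach with ctrl-a d:
# screen -S zfs-copy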
# zfs send MyBackup@2010.07.27 | zfs receive storage cannot receive new filesystem stream: destination 'storage' exists must specify -F to overwrite it warning: cannot send 'MyBackup@2010.07.27': Broken pipe # zfs send MyBackup@2010.07.27 | zfs receive storage/Retored
The write has started:
# zpool iostat 10 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- MyBackup 3.88T 663G 18 0 2.18M 1.03K storage 6.24G 6.31T 0 18 125 2.12M ---------- ----- ----- ----- ----- ----- ----- MyBackup 3.88T 663G 847 0 105M 0 storage 7.64G 6.31T 0 841 716 102M ---------- ----- ----- ----- ----- ----- ----- MyBackup 3.88T 663G 782 0 97.1M 0 storage 9.04G 6.30T 0 844 665 102M ---------- ----- ----- ----- ----- ----- -----
The above was done at 7:37 PM. The pool is filling at about 8GB/minute of raw space (raidz2 parity included), which corresponds to the roughly 90MB/s of data being written:
# zpool iostat storage 60 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- storage 158G 6.16T 0 292 666 35.3M storage 166G 6.15T 1 798 1.53K 96.0M storage 175G 6.14T 1 832 3.20K 100M storage 183G 6.13T 0 828 2.53K 99.8M storage 190G 6.13T 0 737 1.35K 88.9M storage 198G 6.12T 0 759 1.30K 91.6M storage 205G 6.11T 0 710 537 86.0M storage 213G 6.10T 0 770 810 93.1M storage 220G 6.10T 0 770 1.99K 92.9M storage 228G 6.09T 0 774 853 93.5M storage 236G 6.08T 0 774 989 93.5M storage 243G 6.07T 0 773 1.02K 93.5M
5:44 AM
Copy has finished. Scrub started.
The scrub finished. No data loss. I’ve copied over all the data from the
backup pool to the new live pool. Now I want to start removing the old
pools.
Removing the old data
Don’t do any of this. I should have done it much later.
# zfs destroy storage/Retored cannot destroy 'storage/Retored': filesystem has children use '-r' to destroy the following datasets: storage/Retored@2010.07.27 storage/Retored@2010.07.28
OK, I don’t need those:
# zfs destroy storage/Retored@2010.07.28 cannot destroy 'storage/Retored@2010.07.28': snapshot has dependent clones use '-R' to destroy the following datasets: storage/bacula
Oh wait! I need to keep storage/bacula! That’s my live data. With ZFS,
it seems a clone is forever linked to the snapshot from which it was taken.
I’d like to unlink it; I’m guessing the only way is a cp. Or better yet,
I think I’ll worry about this after I get rid of the backup pool.
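As an aside, zfs promote is the usual way to break that dependency: it makes the clone the owner of the snapshot, after which the filesystem it was cloned from can be destroyed. Untested here, but it would look like:
# zfs promote storage/bacula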
# zfs umount MyBackup # zpool destroy MyBackup
Repartitioning the HDD
You will recall we put two partitions on those first 5 HDD, but only one partition on the two new HDD.
Now it’s time to repartition those 5x2TB HDD, which have two 931GB partitions each,
into one large partition. Look at ada1..ada5 below, and note
that ada0 and ada6 already have a single 1.8TB partition.
# gpart show => 34 3907029101 ada1 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada2 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada3 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada4 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 34 3907029101 ada5 GPT (1.8T) 34 990 - free - (495K) 1024 1953412151 1 freebsd-zfs (931G) 1953413175 1953412151 2 freebsd-zfs (931G) 3906825326 203809 - free - (100M) => 63 156301362 mirror/gm0 MBR (75G) 63 156301425 1 freebsd [active] (75G) => 0 156301425 mirror/gm0s1 BSD (75G) 0 2097152 1 freebsd-ufs (1.0G) 2097152 12582912 2 freebsd-swap (6.0G) 14680064 8388608 4 freebsd-ufs (4.0G) 23068672 8388608 5 freebsd-ufs (4.0G) 31457280 124844145 6 freebsd-ufs (60G) => 34 3907029101 ada0 GPT (1.8T) 34 2014 - free - (1.0M) 2048 3906617453 1 freebsd-zfs (1.8T) 3906619501 409634 - free - (200M) => 34 3907029101 ada6 GPT (1.8T) 34 2014 - free - (1.0M) 2048 3906617453 1 freebsd-zfs (1.8T) 3906619501 409634 - free - (200M)
Let us take ada1 offline. First, the current status:
# zpool status pool: storage state: ONLINE scrub: scrub completed after 5h12m with 0 errors on Thu Jul 29 07:51:45 2010 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2 ONLINE 0 0 0 gpt/disk01-live ONLINE 0 0 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 gpt/disk06-live ONLINE 0 0 0 gpt/disk07-live ONLINE 0 0 0 errors: No known data errors
We’ll start with gpt/disk01-live:
# zpool offline storage gpt/disk01-live # zpool status pool: storage state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed after 5h12m with 0 errors on Thu Jul 29 07:51:45 2010 config: NAME STATE READ WRITE CKSUM storage DEGRADED 0 0 0 raidz2 DEGRADED 0 0 0 gpt/disk01-live OFFLINE 0 89 0 gpt/disk02-live ONLINE 0 0 0 gpt/disk03-live ONLINE 0 0 0 gpt/disk04-live ONLINE 0 0 0 gpt/disk05-live ONLINE 0 0 0 gpt/disk06-live ONLINE 0 0 0 gpt/disk07-live ONLINE 0 0 0 errors: No known data errors
Destroy the two partitions. All data will be lost:
# gpart delete -i 1 ada1 ada1p1 deleted # gpart delete -i 2 ada1 ada1p2 deleted # gpart show ada1 => 34 3907029101 ada1 GPT (1.8T) 34 3907029101 - free - (1.8T)
Clear out the first and last 16KB on the HDD. This may not be useful, as
we have already destroyed the partitions…
# dd if=/dev/zero of=/dev/ada1 bs=512 count=32 32+0 records in 32+0 records out # dd if=/dev/zero of=/dev/ada1 bs=512 count=32 oseek=3907029073 32+0 records in 32+0 records out
Create a new GPT scheme on the disk, getting it ready to be a GEOM partitioned device:
# gpart create -s GPT ada1 ada1 created
Create the new partition:
# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk01-live ada1 ada1p1 added # gpart show ada1 => 34 3907029101 ada1 GPT (1.8T) 34 2014 - free - (1.0M) 2048 3906617453 1 freebsd-zfs (1.8T) 3906619501 409634 - free - (200M)
Hmm, just to be safe, let’s clear the first and last 16KB of that partition (NOTE: my
first attempts failed; see below for details).
Note that I use ada1p1 here, not ada1 as above.
# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=32 32+0 records in 32+0 records out 16384 bytes transferred in 0.008515 secs (1924161 bytes/sec)
And for the last 16KB:
# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=32 oseek=3906617421 32+0 records in 32+0 records out
The number 3906617421 comes from the partition size (3906617453) found in the
gpart show output, minus 32 sectors of 512 bytes each, for a total of 16KB.
A rough test to make sure you did the math right: add one to the oseek parameter.
If you see the following error with the larger value, but not with the original oseek,
you specified the right value:
# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=32 oseek=3906617422 dd: /dev/ada1p1: end of device 32+0 records in 31+0 records out 15872 bytes transferred in 0.007949 secs (1996701 bytes/sec)
Now lets see:
# gpart status Name Status Components ada2p1 N/A ada2 ada3p1 N/A ada3 ada4p1 N/A ada4 ada5p1 N/A ada5 mirror/gm0s1 N/A mirror/gm0 mirror/gm0s1a N/A mirror/gm0s1 ada0p1 N/A ada0 ada6p1 N/A ada6 ada1p1 N/A ada1
Now it’s time to replace the old 1TB device with the new 2TB device.
# zpool replace storage gpt/disk01-live gpt/disk01-live invalid vdev specification use '-f' to override the following errors: /dev/gpt/disk01-live is part of potentially active pool 'storage'
It turns out my math was wrong. The ZFS meta-data is much larger.
Completely removing ZFS meta-data
In the previous section, the ‘invalid vdev specification’ message was trying to
tell me that the new vdev was marked as being used in an existing pool. I could
have used the -f option to force the replace. Instead, I want to completely
remove the meta-data.
It turns out that ZFS stores two labels
at the beginning of the vdev, and two labels at the end. Look at this:
# zdb -l /dev/gpt/disk01-live -------------------------------------------- LABEL 0 -------------------------------------------- version=14 name='storage' state=0 txg=4 pool_guid=2339808256841075165 hostid=3600270990 hostname='kraken.example.org' top_guid=18159926173103963460 guid=16482477649650197853 vdev_tree type='raidz' id=0 guid=18159926173103963460 nparity=2 metaslab_array=23 metaslab_shift=37 ashift=9 asize=13995117903872 is_log=0 children[0] type='disk' id=0 guid=16482477649650197853 path='/dev/gpt/disk01' whole_disk=0 children[1] type='disk' id=1 guid=8540639469082160959 path='/dev/gpt/disk02' whole_disk=0 children[2] type='disk' id=2 guid=6533883554281261104 path='/dev/gpt/disk03' whole_disk=0 children[3] type='disk' id=3 guid=1801494265368466138 path='/dev/gpt/disk04' whole_disk=0 children[4] type='disk' id=4 guid=7430995867171691858 path='/dev/gpt/disk05' whole_disk=0 children[5] type='file' id=5 guid=11845728232134214029 path='/tmp/sparsefile1.img' children[6] type='file' id=6 guid=353355856440925066 path='/tmp/sparsefile2.img' -------------------------------------------- LABEL 1 -------------------------------------------- version=14 name='storage' state=0 txg=4 pool_guid=2339808256841075165 hostid=3600270990 hostname='kraken.example.org' top_guid=18159926173103963460 guid=16482477649650197853 vdev_tree type='raidz' id=0 guid=18159926173103963460 nparity=2 metaslab_array=23 metaslab_shift=37 ashift=9 asize=13995117903872 is_log=0 children[0] type='disk' id=0 guid=16482477649650197853 path='/dev/gpt/disk01' whole_disk=0 children[1] type='disk' id=1 guid=8540639469082160959 path='/dev/gpt/disk02' whole_disk=0 children[2] type='disk' id=2 guid=6533883554281261104 path='/dev/gpt/disk03' whole_disk=0 children[3] type='disk' id=3 guid=1801494265368466138 path='/dev/gpt/disk04' whole_disk=0 children[4] type='disk' id=4 guid=7430995867171691858 path='/dev/gpt/disk05' whole_disk=0 children[5] type='file' id=5 guid=11845728232134214029 path='/tmp/sparsefile1.img' children[6] type='file' id=6 guid=353355856440925066 path='/tmp/sparsefile2.img' -------------------------------------------- LABEL 2 -------------------------------------------- failed to unpack label 2 -------------------------------------------- LABEL 3 -------------------------------------------- failed to unpack label 3
Each label is 256KB. It was suggested I overwrite a few MB at each end.
Let us do the math:
5MB of 512-byte sectors = 10240 sectors
Let us do the dd again:
# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=10240 10240+0 records in 10240+0 records out 5242880 bytes transferred in 2.494232 secs (2102002 bytes/sec) # zdb -l /dev/gpt/disk01-live -------------------------------------------- LABEL 0 -------------------------------------------- failed to unpack label 0 -------------------------------------------- LABEL 1 -------------------------------------------- failed to unpack label 1 -------------------------------------------- LABEL 2 -------------------------------------------- failed to unpack label 2 -------------------------------------------- LABEL 3 -------------------------------------------- failed to unpack label 3
There, that killed it.
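For completeness, the matching wipe at the tail of the partition would be the following (oseek is the partition size, 3906617453, minus the same 10240 sectors); I didn’t need it here, since labels 2 and 3 were already unreadable:
# dd if=/dev/zero of=/dev/ada1p1 bs=512 count=10240 oseek=3906607213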
Replacing the vdev
Now we can try this replacement again:
# zpool replace storage gpt/disk01-live gpt/disk01-live
There, replaced. Now look at the status:
# zpool status pool: storage state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress for 0h0m, 0.01% done, 100h19m to go config: NAME STATE READ WRITE CKSUM storage DEGRADED 0 0 0 raidz2 DEGRADED 0 0 0 replacing DEGRADED 0 0 0 gpt/disk01-live/old OFFLINE 0 638 0 gpt/disk01-live ONLINE 0 0 0 86.6M resilvered gpt/disk02-live ONLINE 0 0 0 57.8M resilvered gpt/disk03-live ONLINE 0 0 0 57.8M resilvered gpt/disk04-live ONLINE 0 0 0 57.8M resilvered gpt/disk05-live ONLINE 0 0 0 57.8M resilvered gpt/disk06-live ONLINE 0 0 0 57.8M resilvered gpt/disk07-live ONLINE 0 0 0 57.8M resilvered errors: No known data errors #
Notice how gpt/disk01-live/old is OFFLINE and gpt/disk01-live is ONLINE
and all vdevs are being resilvered.
I will now wait for the resilvering to complete before proceeding with the
other HDD. (Started at 10:58 AM.)
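Rather than re-running zpool status by hand, a trivial loop can keep an eye on the progress. A minimal sketch:
#!/bin/sh
# report resilver progress every five minutes until it is no longer in progress
while zpool status storage | grep -q 'resilver in progress'; do
    zpool status storage | grep 'resilver in progress'
    sleep 300
done
echo "resilver finished"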
A partition trick
While I was investigating the above by talking with Pawel Jakub Dawidek (pjd)
on IRC, he asked: is the data you want to keep on the first partition?
If so, you could remove the second partition and grow the first into the freed space.
Clever. But if you look at what I did above, I labelled the first partition on each drive
as backup and the second partition as live. This is demonstrated here:
# gpart list ada5
Geom name: ada5
fwheads: 16
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada5p1
   Mediasize: 1000147021312 (931G)
   Sectorsize: 512
   Mode: r0w0e0
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk05-backup
   length: 1000147021312
   offset: 524288
   type: freebsd-zfs
   index: 1
   end: 1953413174
   start: 1024
2. Name: ada5p2
   Mediasize: 1000147021312 (931G)
   Sectorsize: 512
   Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk05-live
   length: 1000147021312
   offset: 1000147545600
   type: freebsd-zfs
   index: 2
   end: 3906825325
   start: 1953413175
Consumers:
1. Name: ada5
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e3
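Had my layout been the other way around, with the data I want to keep on the first partition, pjd's trick would look roughly like this. The device and label names here are hypothetical, and gpart resize may not exist on the FreeBSD release I am running, so treat it purely as an illustration:
# zpool offline storage gpt/disk09-live
# gpart delete -i 2 ada9
# gpart resize -i 1 ada9
# zpool online storage gpt/disk09-live
Even then, the pool would not see the extra space until the last member had grown and the pool had been exported and imported (or autoexpand set on newer ZFS versions), as I discover below.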
More drive repartitioning
The resilver has finished:
# zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 5h34m with 0 errors on Sat Jul 31 16:30:49 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0  790G resilvered
            gpt/disk02-live  ONLINE       0     0     0  402M resilvered
            gpt/disk03-live  ONLINE       0     0     0  402M resilvered
            gpt/disk04-live  ONLINE       0     0     0  402M resilvered
            gpt/disk05-live  ONLINE       0     0     0  402M resilvered
            gpt/disk06-live  ONLINE       0     0     0  402M resilvered
            gpt/disk07-live  ONLINE       0     0     0  402M resilvered

errors: No known data errors
#
On to ada2! Do be careful with these commands; they can destroy the contents of
an HDD you still need. 🙂
# zpool offline storage gpt/disk02-live
# gpart delete -i 1 ada2
# gpart delete -i 2 ada2
# dd if=/dev/zero of=/dev/ada2p1 bs=512 count=10240
# dd if=/dev/zero of=/dev/ada2p1 bs=512 count=10240 oseek=3907018928
# gpart create -s GPT ada2
# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk02-live ada2
# zpool replace storage gpt/disk02-live
# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 0.37% done, 8h36m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        storage                    DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            gpt/disk01-live        ONLINE       0     0     0  79.9M resilvered
            replacing              DEGRADED     0     0     0
              gpt/disk02-live/old  OFFLINE      0   958     0
              gpt/disk02-live      ONLINE       0     0     0  2.95G resilvered
            gpt/disk03-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk04-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk05-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk06-live        ONLINE       0     0     0  79.9M resilvered
            gpt/disk07-live        ONLINE       0     0     0  79.9M resilvered

errors: No known data errors
That should be done by morning. Then on to ada3, ada4, and ada5.
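Since the same sequence gets repeated for each remaining drive, here it is as a rough sh sketch. It assumes every drive is the same 2TB model with identical geometry (hence the hard-coded sizes and offsets), wipes the old ZFS labels before deleting the partitions, and should be double-checked against your own device and label names before anything like it is run:
#!/bin/sh
# usage: redo_drive ada3 disk03-live
redo_drive() {
        dev=$1
        label=$2
        zpool offline storage gpt/${label}
        # zero the old ZFS labels at the front of both old partitions
        dd if=/dev/zero of=/dev/${dev}p1 bs=512 count=10240
        dd if=/dev/zero of=/dev/${dev}p2 bs=512 count=10240
        # and the last 5MB of the disk, where the tail labels live
        dd if=/dev/zero of=/dev/${dev} bs=512 count=10240 oseek=3907018928
        # start from a fresh partition table, as with ada2 above
        gpart delete -i 1 ${dev}
        gpart delete -i 2 ${dev}
        gpart destroy ${dev}
        gpart create -s GPT ${dev}
        gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l ${label} ${dev}
        # resilver onto the new, larger partition
        zpool replace storage gpt/${label}
}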
4PM Sunday, August 1, 2010: ada3 started.
# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
storage          3.91T   524G  1.75G  /storage
storage/Retored  3.11T   524G  38.4K  /storage/Retored
storage/bacula    815G   524G  3.21T  /storage/bacula
storage/pgsql    1.75G   524G  1.75G  /storage/pgsql
You should notice that I have only 524G available… That should
jump significantly once I am done with this process.
As of 7:15 Monday morning, ada4 has been repartitioned and resilvered, and
ada5 is undergoing resilvering now. ETA: 1:15 PM.
Where’s the space?
It’s now 6:37 PM and the resilvering is complete. All vdevs are the same size.
However, I have far less space than expected:
$ zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
storage          3.92T   510G  1.75G  /storage
storage/Retored  3.11T   510G  38.4K  /storage/Retored
storage/bacula    829G   510G  3.23T  /storage/bacula
storage/pgsql    1.75G   510G  1.75G  /storage/pgsql
I expected to have 8TB or so…
After talking to my friends on IRC, I found myself composing
an email.
Before I got any replies, Wyze suggested exporting and re-importing the pool. So I tried it:
# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    500K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    156M    3.4G     4%    /var
storage               512G    1.7G    510G     0%    /storage
storage/pgsql         512G    1.7G    510G     0%    /storage/pgsql
storage/bacula        3.7T    3.2T    510G    87%    /storage/bacula
storage/Retored       510G     39K    510G     0%    /storage/Retored

# zpool export storage
# zpool import storage

# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    500K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    156M    3.4G     4%    /var
storage               5.0T    1.7G    5.0T     0%    /storage
storage/Retored       5.0T     39K    5.0T     0%    /storage/Retored
storage/bacula        8.2T    3.2T    5.0T    39%    /storage/bacula
storage/pgsql         5.0T    1.7G    5.0T     0%    /storage/pgsql
THERE! And:
$ zfs get used,available storage
NAME     PROPERTY   VALUE  SOURCE
storage  used       3.92T  -
storage  available  4.96T  -
For a total of 8.88TB. That’s good enough!
I’m told that more recent versions of ZFS (pool version 16 and later) include
a new autoexpand property. If this property is set on a pool, the extra space
becomes available automatically as soon as the last disk in a vdev has been
replaced.
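If I understand it correctly, that would remove the need for the export/import dance; you would just set the property once:
# zpool set autoexpand=on storage
# zpool get autoexpand storage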
Any questions? Suggestions?
This right here is why you are my hero. I’ve been wanting ZFS since it came out but was both waiting for the code to mature and for prices to come down enough to make my raid5 array long obsolete. Who would have thunk it, me actually wanting/preferring software raid, but the features are too hard to resist. 1.8 × 7 = just over 12 TB, enough to make a grown geek cry.
I have to admit I only skimmed over the whole thing the first time, but I’m going to have to go back and reread it a few times. Still not sure why you created the sparse files/md drives. I thought ZFS had a simple add-to-pool command.
Anyways, thank you for sharing your Diary all these years.
You can add HDD to a pool, but you cannot go from raidz1 to raidz2 for example.
From <http://en.wikipedia.org/wiki/ZFS#Limitations>:
It is not possible to add a disk as a column to a RAID-Z, RAID-Z2, or RAID-Z3 vdev. This feature depends on the block pointer rewrite functionality due to be added soon. You can however create a new RAID-Z vdev and add it to the zpool.
I should explain this in more detail. In short, I had to create a seven-drive raidz2 array with only five physical HDD, so sparse files stood in for the missing drives until real disks could replace them.
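If I ever want more space without another rebuild, the supported route is to add a whole new raidz2 vdev alongside the existing one, roughly like this (the disk08 through disk11 labels are made up for the example):
# zpool add storage raidz2 gpt/disk08-live gpt/disk09-live gpt/disk10-live gpt/disk11-live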
—
The Man Behind The Curtain