Restoring an INOPERABLE 3Ware unit
I’ve been using a 3Ware 9550SX-8LP since 2006. Over the weekend, I encountered my first problem
with it: a unit became INOPERABLE. Calling the whole controller inoperable would be an overstatement, and the problem was easily fixed.
After a reboot to upgrade the kernel, Nagios alerted me to a problem. I checked via the command line
and found this situation:
# tw_cli info c0

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9550SX-8LP   8         8        3       1       4       1      OK

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    RAID-10   INOPERABLE     -       -       64K     195.548   OFF    ON

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u2     69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010
Here, you can see that u2 has a problem. Looking at the output details, we can also see
that u2 contains a single HDD and is connected to port 0 (p0).
That means it is one of the two spares that have
existed in this array since I set it up.
I will remove that unit, and add it back into the array. See below.
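Reading those tables by eye works, but the same two facts (which unit is not OK, and which port it sits on) can be pulled out with a little awk. This is only a sketch: the function names non_ok_units and ports_for_unit are my own, not part of tw_cli, and the field positions assume the column layout shown in the output above.

```sh
#!/bin/sh
# Both helpers read the text of `tw_cli info c0` on standard input.
# They assume the column layout shown above; other firmware or CLI
# versions may print different columns.

# Print "unit status" for every unit row whose Status is not OK.
# Unit rows are the ones whose first field looks like u0, u1, ...
non_ok_units() {
    awk '$1 ~ /^u[0-9]+$/ && $3 != "OK" { print $1, $3 }'
}

# Print the ports (p0, p1, ...) whose Unit column matches the given unit.
ports_for_unit() {
    awk -v unit="$1" '$1 ~ /^p[0-9]+$/ && $3 == unit { print $1 }'
}

# Usage on the machine with the controller:
#   tw_cli info c0 | non_ok_units        # here: u2 INOPERABLE
#   tw_cli info c0 | ports_for_unit u2   # here: p0
```

Note that non_ok_units reports any status other than OK, so a unit that is merely VERIFYING would also be listed.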
Fixing it
I found help via Google and used that as an example. I also posted to the FreeBSD Forums
before I proceeded. But today, before I received a reply, I went ahead…
First, I removed the defective u2 unit:
# tw_cli maint deleteunit c0 u2
Deleting unit c0/u2 ...Done.

# tw_cli info

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9550SX-8LP   8         8        2       0       4       1      OK

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON
u1    SPARE     OK             -       -       -       69.2404   -      ON

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               -      69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010
That removed the unit from the array. Next, I added the drive back as a spare.
I knew the drive was on p0 because the output above lists it there.
# tw_cli maint createunit c0 p0 rspare
Creating new unit on controller /c0 ... Done. The new unit is /c0/u2.
WARNING: This Spare unit may replace failed drive of same interface type only.
Now the status looks like this:
# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    SPARE     OK             -       -       -       69.2404   -      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u2     69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010
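For next time, the delete-and-recreate step can be written down as a small dry-run helper that only prints the two commands rather than running them. This is a sketch, and respare_commands is a name I made up. It is only appropriate when the unit in question is a single-drive spare holding no data, as it was here; deleting a unit that holds data destroys it.

```sh
#!/bin/sh
# Print (do NOT run) the two tw_cli commands used above to turn an
# INOPERABLE single-drive unit back into a hot spare.
# Arguments: controller (e.g. c0), unit (e.g. u2), port (e.g. p0).
respare_commands() {
    if [ "$#" -ne 3 ]; then
        echo "usage: respare_commands CONTROLLER UNIT PORT" >&2
        return 2
    fi
    ctl=$1
    unit=$2
    port=$3
    # Same command forms as in the transcript above.
    printf 'tw_cli maint deleteunit %s %s\n' "$ctl" "$unit"
    printf 'tw_cli maint createunit %s %s rspare\n' "$ctl" "$port"
}

# Usage:
#   respare_commands c0 u2 p0
```

Printing the commands first gives a chance to check the unit and port against the info output before anything is deleted.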
The next step is to verify that new unit:
# tw_cli
//supernews> maint verify c0 u2
Sending start verify message to /c0/u2 ... Done.

//supernews>
Now that verify has started, you can see that in the output of info:
# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    SPARE     VERIFYING      -       23%     -       69.2404   -      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     VERIFYING        u2     69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010
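To keep an eye on the verify without rereading the whole table, the percentage can be picked out of the %V/I/M column. Again only a sketch: verify_percent is my own name, and it assumes the unit row layout shown above, where the status is the third field and %V/I/M the fifth.

```sh
#!/bin/sh
# Print the %V/I/M value (for example "23%") for one unit while its
# Status is VERIFYING. Reads the text of `tw_cli info c0` on standard
# input and prints nothing if the unit is not currently verifying.
verify_percent() {
    awk -v unit="$1" '$1 == unit && $3 == "VERIFYING" { print $5 }'
}

# Usage on the machine with the controller:
#   tw_cli info c0 | verify_percent u2
```

An empty result means the unit is not verifying, either because the verify has finished or because it never started, so check the Status column before assuming success.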
This was all much easier than I thought it was going to be…