Feb 12 2012
 

Restoring an INOPERABLE 3Ware unit

I’ve been using a 3Ware 9550SX-8LP since 2006. Over the weekend, I encountered the first problem with it: one of its units became INOPERABLE. Calling the whole controller inoperable would be an overstatement, and the problem was easily fixed. After a reboot to upgrade the kernel, Nagios alerted me to a problem. I checked via the command line and found this situation:
# tw_cli info

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9550SX-8LP   8         8        3       1       4       1      OK       

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON     
u1    SPARE     OK             -       -       -       69.2404   -      ON     
u2    RAID-10   INOPERABLE     -       -       64K     195.548   OFF    ON     

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u2     69.25 GB    145226112     WD-WMAKE2379003     
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069     
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066     
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012     
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286     
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019     
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339     
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696     

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010
Here, you can see that u2 has a problem: its status is INOPERABLE. The port listing also shows that u2 contains a single HDD, connected to port 0 (p0). That means it is one of the two spares that have existed in this array since I set it up. I will delete that unit and re-create it as a spare. See below.
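Spotting the bad unit by eye works, but the same check can be scripted, which is roughly what a Nagios check would do. The following is only a sketch: the function name check_units is made up for illustration, it assumes the column layout shown in the output above, and it treats any unit status other than OK or VERIFYING as a problem, which is my assumption rather than anything the 3Ware documentation states.

```shell
#!/bin/sh
# Hypothetical helper: read `tw_cli info c0` output on stdin and print
# every unit whose Status column is neither OK nor VERIFYING.
# Returns 2 (the Nagios CRITICAL exit code) if any such unit is found.
check_units() {
    awk '
        $1 == "Unit" { in_units = 1; next }   # header row of the unit table
        $1 == "Port" { in_units = 0 }         # the port table starts here
        in_units && $1 ~ /^u[0-9]+$/ && $3 != "OK" && $3 != "VERIFYING" {
            print $1, $2, $3                  # unit, type, status
            bad = 1
        }
        END { exit (bad ? 2 : 0) }
    '
}

# Usage, assuming tw_cli is on the PATH and the controller is c0:
#   tw_cli info c0 | check_units
```

Fed the listing above, this would print the u2 line and return 2.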

Fixing it

I found help via Google and used that as an example. I also posted to the FreeBSD Forums, but today, before I received a reply, I went ahead… First, I removed the defective u2 unit:
# tw_cli maint deleteunit c0 u2
Deleting unit c0/u2 ...Done.


# tw_cli info

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9550SX-8LP   8         8        2       0       4       1      OK       

# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON     
u1    SPARE     OK             -       -       -       69.2404   -      ON     

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               -      69.25 GB    145226112     WD-WMAKE2379003     
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069     
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066     
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012     
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286     
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019     
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339     
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696     

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010  
This removed the unit from the controller; p0 now shows no unit assignment. Next, I add the drive back as a spare. I knew it was p0 because the earlier output listed u2 on that port.
# tw_cli maint createunit c0 p0 rspare
Creating new unit on controller /c0 ... Done. The new unit is /c0/u2.
WARNING: This Spare unit may replace failed drive of same interface type only.
Now the status looks like this:
# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON     
u1    SPARE     OK             -       -       -       69.2404   -      ON     
u2    SPARE     OK             -       -       -       69.2404   -      OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u2     69.25 GB    145226112     WD-WMAKE2379003     
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069     
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066     
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012     
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286     
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019     
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339     
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696     

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010  
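The port-to-unit mapping I read off the listings by eye (u2 lives on p0) can also be pulled out with a small filter. This is a sketch only: ports_of_unit is a hypothetical name, and it assumes the port table keeps the column order shown above, with the unit in the third column.

```shell
#!/bin/sh
# Hypothetical helper: read `tw_cli info c0` output on stdin and print
# the ports assigned to the unit named in the first argument.
ports_of_unit() {
    awk -v unit="$1" '
        $1 == "Port" { in_ports = 1; next }   # header row of the port table
        $1 == "Name" { in_ports = 0 }         # the BBU table starts here
        in_ports && $1 ~ /^p[0-9]+$/ && $3 == unit { print $1 }
    '
}

# Usage, assuming tw_cli is on the PATH and the controller is c0:
#   tw_cli info c0 | ports_of_unit u2
```

Knowing the port before deleting a unit matters, because once the unit is gone the port shows only "-" in the Unit column.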
The next step is to verify the new unit:
# tw_cli
//supernews> maint verify c0 u2
Sending start verify message to /c0/u2 ... Done.

//supernews>
Now that the verify has started, you can see its progress in the %V/I/M column of the info output:
# tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     195.548   ON     ON     
u1    SPARE     OK             -       -       -       69.2404   -      ON     
u2    SPARE     VERIFYING      -       23%     -       69.2404   -      OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     VERIFYING        u2     69.25 GB    145226112     WD-WMAKE2379003     
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069     
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066     
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012     
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286     
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019     
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339     
p7     OK               u0     69.25 GB    145226112     WD-WMAKE2378696     

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    02-Sep-2010  
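If you would rather not re-read the whole listing while waiting, the status and verify percentage for one unit can be picked out of the same output. Again, this is a sketch under assumptions: verify_progress is a hypothetical name, and it relies on the status being the third column and %V/I/M the fifth, as in the table above.

```shell
#!/bin/sh
# Hypothetical helper: read `tw_cli info c0` output on stdin and print the
# Status and %V/I/M columns for the unit named in the first argument.
verify_progress() {
    awk -v unit="$1" '
        $1 == "Unit" { in_units = 1; next }   # header row of the unit table
        $1 == "Port" { in_units = 0 }         # the port table starts here
        in_units && $1 == unit { print $3, $5 }
    '
}

# Usage, assuming tw_cli is on the PATH and the controller is c0:
#   tw_cli info c0 | verify_progress u2
```

For the listing above this would print the status VERIFYING followed by 23%; once the verify finishes, the percentage column goes back to "-".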
This was all much easier than I thought it was going to be…