Finding and fixing a corrupt ODM install
Good day. Recently it was discovered that one of the AIX servers has an issue with a multitude of PowerPath devices. When I ran
lsdev | grep hdiskpower | wc -l
I was surprised to see over 3000 matches. Checking what was actually in use with
lspv | grep power
showed only about half a dozen.
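To see how much of that count is stale before changing anything, here is a minimal sketch that tallies hdiskpower devices by state. It assumes the usual `lsdev -Cc disk` layout, where the first column is the device name and the second its state (Available or Defined). The function name `count_hdiskpower_by_state` is hypothetical and is not part of AIX or PowerPath.

```shell
# Hypothetical helper: summarise hdiskpower devices by ODM state.
# Input: the output of "lsdev -Cc disk" on stdin
# (column 1 = device name, column 2 = state).
count_hdiskpower_by_state() {
    awk '$1 ~ /^hdiskpower[0-9]+$/ { n[$2]++; total++ }
         END {
             printf "total=%d available=%d defined=%d\n", total, n["Available"], n["Defined"]
         }'
}
```

Usage on the server would be `lsdev -Cc disk | count_hdiskpower_by_state`. A large `defined=` figure next to a small `available=` one points at leftover ODM entries rather than devices that are actually in use.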
Upgrading the ODM to a newer version didn't help much. It took over 2.5 hours to remove all of the hdiskpower devices, followed by installing 3 additional ones. A reboot of the AIX system didn't help either. After scouring the web, I found a few places which indicate that the following procedure should fix the issue (at the moment this is untested). I'll be validating this information within the next week.
* Shut down the application(s), database(s), etc. and vary off all volume groups (VGs) except rootvg. Confirm that only rootvg is still varied on with –>
lsvg -o
* If EMC Solutions Enabler is running, stop it with –>
stordaemon shutdown all -immediate
* Remove the paths from the PowerPath configuration –>
powermt remove hba=all
* Delete all Symmetrix disks –>
lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
* Delete all hdiskpower devices –>
rmdev -dl powerpath0
* Confirm they're gone with –>
lsdev -Cc disk
(no Symmetrix or hdiskpower devices should exist)
* Remove all fibre channel device instances –>
rmdev -Rdl fscsi0
(repeat for the others, e.g. fscsi1)
* Verify the fibre adapters are gone –>
lsdev -Cc adapter
(no fscsi devices should exist)
* Put the HBA devices into a Defined state –>
rmdev -l fcsX
(replace X with 0, 1, etc.)
* Scan the bus –>
emc_cfgmgr or cfgmgr -vl fcsX
NOTE: emc_cfgmgr is a script downloadable from EMC's website.
* Configure all of the EMC devices into PowerPath –>
powermt config
* Run some final checks –>
powermt display
powermt display dev=all
lsdev -Cc disk
* Finally, save your changes with –>
powermt save
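Several of the steps above have to be repeated for every fscsiX/fcsX instance. As a sketch, the hypothetical helper `fc_adapter_cmds` (not an AIX or PowerPath command) prints those per-adapter commands in the order the procedure uses them, from a list of device names such as the output of `lsdev -C -F name`. It only prints; nothing is executed.

```shell
# Hypothetical helper: print (not run) the per-adapter commands from
# the procedure so fscsiX/fcsX never have to be typed by hand.
# Input: device names, one per line (e.g. from "lsdev -C -F name").
fc_adapter_cmds() {
    awk '
        /^fscsi[0-9]+$/ { fscsi[++nf] = $1 }
        /^fcs[0-9]+$/   { fcs[++nh] = $1 }
        END {
            # Remove each FC SCSI protocol device and its child devices
            for (i = 1; i <= nf; i++) print "rmdev -Rdl " fscsi[i]
            # Put each FC adapter into the Defined state
            for (i = 1; i <= nh; i++) print "rmdev -l " fcs[i]
            # Rediscover the devices behind each adapter
            for (i = 1; i <= nh; i++) print "cfgmgr -vl " fcs[i]
        }'
}
```

For example, `lsdev -C -F name | fc_adapter_cmds > /tmp/fc_steps.sh` lets you review the list before running it with `sh`. Run the lsdev part before the removals start, since the names disappear from the ODM as they are deleted. If you use emc_cfgmgr instead of cfgmgr, drop the `cfgmgr -vl` lines.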
MPIO settings (if applicable) may have to be set again. If so, they can be changed like so:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
(replace X with 0, 1, etc. and repeat for each adapter)
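To cover every adapter in one go, here is a sketch of a hypothetical helper, `fscsi_attr_cmds`, that prints one chdev line per fscsi device. It adds `-P`, which records the change in the ODM only, so it takes effect at the next reboot. This typically avoids a "device busy" error when the adapter already has child devices configured.

```shell
# Hypothetical helper: print one chdev command per fscsi device.
# Input: device names, one per line (e.g. from "lsdev -C -F name").
# -P stores the change in the ODM only; it applies at the next reboot.
fscsi_attr_cmds() {
    awk '/^fscsi[0-9]+$/ {
        print "chdev -l " $1 " -a dyntrk=yes -a fc_err_recov=fast_fail -P"
    }'
}
```

Usage: `lsdev -C -F name | fscsi_attr_cmds`. Review the output, then pipe it to `sh`. If you leave out `-P`, the commands only succeed when the fscsi device is not busy.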
A reboot should NOT be necessary. However, I’ll confirm and update within a week.
= Varying degrees of success =
No issues up until "rmdev -dl powerpath0". That command returned this instead:
rmdev -dl powerpath0
Method error (/etc/methods/ucfgpower):
0514-062 Cannot perform the requested function because the
specified device is busy.
Hence, I ran
lsdev -Cc disk
It listed the two local SAS drives and the 3000+ hdiskpower devices (all of the hdiskpower devices were in a Defined state). So I attempted a manual removal of those with the following one-liner:
lsdev -Cc disk | grep hdiskpower | awk '{print "rmdev -dl " $1}' | sh
This slowly started to delete them one at a time. Time for a coffee break, apparently!
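A slightly more cautious variant of that one-liner only targets hdiskpower devices that are in the Defined state, so anything still Available (and possibly in use) is left alone. It prints the rmdev commands instead of running them. This is a sketch: the helper name `defined_hdiskpower_rmdev` is hypothetical, and it assumes the usual `lsdev -Cc disk` layout with the state in the second column.

```shell
# Hypothetical helper: emit "rmdev -dl" only for Defined hdiskpower devices.
# Input: the output of "lsdev -Cc disk" on stdin
# (column 1 = device name, column 2 = state).
defined_hdiskpower_rmdev() {
    awk '$1 ~ /^hdiskpower[0-9]+$/ && $2 == "Defined" {
        print "rmdev -dl " $1
    }'
}
```

For example, run `lsdev -Cc disk | defined_hdiskpower_rmdev > /tmp/rm_hdiskpower.sh`, check the file, then run it with `sh /tmp/rm_hdiskpower.sh`.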
Once the 3135 hdiskpower devices were deleted, the rmdev -dl powerpath0 command worked as expected. The rest of the procedure worked as planned. Lastly, I set the MPIO settings with:
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
chdev -l fscsi1 -a dyntrk=yes -a fc_err_recov=fast_fail -P
The MPIO settings took effect after a reboot.
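To double-check the result on each adapter, here is a sketch of a hypothetical helper, `fscsi_attrs_ok`, that reads `lsattr -El fscsiN` output and returns success only if both attributes have the expected values. It assumes the attribute name is in column 1 and the current value in column 2. Note that lsattr reports the value stored in the ODM, so after a `chdev -P` it may already show the new value before the reboot.

```shell
# Hypothetical helper: verify dyntrk and fc_err_recov in "lsattr -El fscsiN"
# output (column 1 = attribute, column 2 = value).
# Exit status 0 if both are set as expected, 1 otherwise.
fscsi_attrs_ok() {
    awk '$1 == "dyntrk"       && $2 == "yes"       { d = 1 }
         $1 == "fc_err_recov" && $2 == "fast_fail" { f = 1 }
         END { if (d && f) exit 0; exit 1 }'
}
```

Usage: `lsattr -El fscsi0 | fscsi_attrs_ok && echo "fscsi0 OK"`, repeated for each adapter.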