One of the most common problems we encounter is block-level disk errors. The best place to start the investigation is the log files.
Incident:
Mar 7 18:00:49 sarge scsi: [ID 107833 kern.notice] Requested Block: 27982496 Error Block: 27982496
Mar 7 18:00:47 sarge scsi: [ID 799468 kern.info] ssd127 at scsi_vhci0: name g60060e8005436200000043620000015e, bus address g60060e8005436200000043620000015e
Mar 7 18:00:47 sarge genunix: [ID 834635 kern.info] /scsi_vhci/ssd@g60060e8005436200000043620000015e (ssd127) multipath status: optimal, path /pci@25,700000/SUNW,qlc@0/fp@0,0 (fp1) to target address: w50060e8005436211,37 is online Load balancing: round-robin
Mar 7 18:00:48 sarge scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060e80054362000000436200000045 (ssd78):
Mar 7 18:00:48 sarge scsi: [ID 107833 kern.notice] Requested Block: 6463744 Error Block: 6463744
Mar 7 18:00:48 sarge scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 043620045
Mar 7 18:00:48 sarge scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Mar 7 18:00:48 sarge scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0
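When the log is long, it can help to reduce it to one line per failing block. The following is a minimal sketch, not part of the original investigation: the function name scsi_error_summary is made up for illustration, and it assumes the messages look like the excerpt above (a WARNING line naming the driver instance, followed by a line with "Error Block:"). It needs an awk with gsub; on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk.

```shell
# Summarise SCSI errors from syslog-style lines read on stdin.
# Hypothetical helper: the name and the output format are illustrative only.
scsi_error_summary() {
    awk '
        /scsi:.*WARNING:/ {
            # The driver instance is the last field, e.g. "(ssd78):"
            dev = $NF
            gsub(/[():]/, "", dev)
        }
        /Error Block:/ {
            # Pair the block number with the most recent WARNING device, if any
            name = (dev == "") ? "unknown" : dev
            print name, "error_block=" $NF
        }
    '
}

# Example (assumes the usual Solaris log location):
#   scsi_error_summary < /var/adm/messages
```

An "Error Block" line that appears before any WARNING line is reported against "unknown", since the excerpt does not say which device it belongs to.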
Once we have confirmed that the error exists and persists, we check the disk connection and configuration status.
The cfgadm command provides configuration administration operations on dynamically reconfigurable hardware resources. These operations include displaying status (-l), initiating testing (-t), invoking configuration state changes (-c), invoking hardware-specific functions (-x), and obtaining configuration administration help messages (-h). Configuration administration is performed at attachment points, which are places where system software supports dynamic reconfiguration of hardware resources while the OS continues to run.
sarge:~ # cfgadm
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c1 scsi-bus connected configured unknown
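On a system with many attachment points, the listing can be filtered down to the rows worth a second look. This is a sketch under assumptions: the function name cfgadm_problems is hypothetical, it expects the five-column layout shown above, and it treats anything that is not "connected" and "configured", or whose condition is failing, failed or unusable, as suspicious. An unconfigured attachment point may simply be an unused port, so the output is a hint rather than a diagnosis.

```shell
# Print attachment points that look unhealthy in cfgadm output read on stdin.
# Hypothetical helper; column positions are taken from the listing format above.
cfgadm_problems() {
    awk '
        NR == 1 && $1 == "Ap_Id" { next }   # skip the header row
        NF >= 5 {
            # Count columns from the right so the check does not depend on Type
            receptacle = $(NF-2); occupant = $(NF-1); condition = $NF
            if (receptacle != "connected" || occupant != "configured" ||
                condition ~ /^(failing|failed|unusable)$/)
                print $1, receptacle, occupant, condition
        }
    '
}

# Example:
#   cfgadm -al | cfgadm_problems
```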
The mpathadm command enables multipathing discovery and management. It is implemented as a set of subcommands, many with their own options. Each subcommand operates on a direct object; the direct objects initiator-port, target-port, and logical-unit are consistent with the SCSI standard definitions.
To list available multipathing support:
sarge:~ # mpathadm list mpath-support
mpath-support: libmpscsi_vhci.so
To view the properties of a supported multipathing facility:
sarge:~ # mpathadm show mpath-support libmpscsi_vhci.so
To list the initiator ports:
sarge:~ # mpathadm list initiator-port
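To disable or enable a path (use either disable or enable), give the initiator port, target port, and logical unit:
# mpathadm disable/enable path -i 2000000173018713 -t 20030003ba27d095 \
-l /dev/rdsk/c4t60003BA27D2120004204AC2B000DAB00d0s2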
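Before and after disabling a path, it is useful to know which logical units have fewer working paths than expected. The sketch below is an illustration, not something from the original session: the function name degraded_lus is hypothetical, and it assumes the input is the output of "mpathadm list lu", where each logical unit is followed by "Total Path Count:" and "Operational Path Count:" lines.

```shell
# List logical units whose operational path count is below the total.
# Hypothetical helper; reads "mpathadm list lu" style output on stdin.
degraded_lus() {
    awk '
        $1 ~ /^\/dev\/rdsk\// { lu = $1 }          # remember the current LU
        /Total Path Count:/   { total = $NF }
        /Operational Path Count:/ {
            # Compare numerically; report only LUs that have lost a path
            if ($NF + 0 < total + 0)
                print lu, "operational=" $NF, "total=" total
        }
    '
}

# Example:
#   mpathadm list lu | degraded_lus
```

With failover working, disabling one path should make the affected logical unit appear here with one operational path fewer, while the disk stays reachable.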
Support for many third-party devices is not included in the default version of the configuration file at /kernel/drv/scsi_vhci.conf. The following shows the changes necessary to bring EMC Symmetrix support into the multipathing software:
sarge:~ # less /kernel/drv/scsi_vhci.conf
The problem is that if 'Channel A' or 'Channel B' is disabled, we lose all contact with the disk. What we want is failover: if 'Channel A' fails, all traffic is routed through 'Channel B', and vice versa.
When the disk becomes available again (after modunload/modload), we can see this in the log files.
The problem 'resolved' itself when three things occurred!
These were:
1) The '/kernel/drv/fp.conf' file had 2 entries in it for fibre-channel - as if there was a dual-port card present. In our case we only had the one port, so I commented out one of the entries.
2) The 'mpathadm show lu ...' command showed the 'Current Load Balance' as round-robin. This was changed to 'none'.
3) It seems that Sun recently released a patch fixing some problems with Qlogic cards. I tend to run 'pca' to patch my systems, and wasn't really paying too much attention to it I'm afraid! I think the patch was 113042.
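For the second item, the post does not record how the load-balance setting was changed. One place the setting lives is the load-balance property in /kernel/drv/scsi_vhci.conf, so a quick check of that file is a reasonable first step. The sketch below only reads the value; the function name vhci_load_balance is hypothetical, and it assumes the property is written on one line in the form load-balance="value";.

```shell
# Print the value of the load-balance property from a scsi_vhci.conf-style file.
# Hypothetical helper; $1 is the path to the configuration file.
vhci_load_balance() {
    # Split on double quotes: field 1 is the property name, field 2 its value.
    # Commented-out lines start with "#" and therefore do not match.
    awk -F'"' '$1 ~ /^[ ]*load-balance[ ]*=/ { print $2 }' "$1"
}

# Example:
#   vhci_load_balance /kernel/drv/scsi_vhci.conf
```

If this prints round-robin while 'mpathadm show lu ...' is expected to report none, the file is one candidate for where the difference comes from.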
After rebooting and reconfiguring the system, the FC card seemed to work correctly when one of the channels was disabled. As far as we can tell, running Solaris 10 with 2 FC cards should work pretty much out of the box with respect to failover.