How do you "fix" a faulty path in device-mapper-multipath
There's a subtle bug in your multipath.conf. The vendor and product strings are matched at the regexp level, and the series of extra trailing spaces you added is causing multipathd to fail to match your configuration against the actual devices on the system. If you were to examine the output of

    echo 'show config' | multipathd -k

you would find two device sections for your SAN: one that matches all the extra spaces you added, and the default config (should it exist) provided by the internal database.
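To see the duplication concretely, you can count the device stanzas in the running config. A minimal sketch, with the multipathd output faked by a here-document so it runs anywhere (on the real system you would pipe `echo 'show config' | multipathd -k` into the grep instead):

```shell
# Stand-in for the real 'show config' output: two device sections for the
# same array, one from the (space-laden) config file, one from the
# internal database. The exact spacing here is illustrative.
fake_output() {
cat <<'EOF'
device {
        vendor "XIOTECH "
        product "ISE1400       "
}
device {
        vendor "XIOTECH "
        product "ISE1400"
}
EOF
}
# Count device stanzas; more than one for the same array is the symptom.
fake_output | grep -c '^device {'
```

If the count is higher than the number of arrays you configured, multipathd is not matching your stanza and has fallen back to its built-in entry alongside it.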
Adjust your multipath.conf to look like this:

    vendor  "XIOTECH "
    product "ISE1400.*"
The SCSI INQUIRY standard defines the vendor identification field as exactly 8 bytes of ASCII, left-aligned; if the name uses fewer than 8 characters, the remainder is padded with spaces. Multipathd interprets the spec to the letter of the law, so "XIOTECH" must carry exactly one trailing space. You could also have used "XIOTECH.*" if you really want to be sure.
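The padding rule is easy to reproduce: left-align the vendor name in an 8-character field and the required trailing spaces fall out. A runnable sketch using plain printf:

```shell
# SCSI vendor identification: 8 bytes, left-aligned, space-padded.
# "XIOTECH" is 7 characters, so it gets exactly one trailing space.
v="XIOTECH"
printf -v padded '%-8s' "$v"   # bash: store padded result in $padded
echo "[$padded]"               # prints [XIOTECH ]
```

This is exactly the string your `vendor` line must contain for a literal (non-regexp) match.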
Once you make these changes, stop multipathd using your initscripts, run multipath -F to flush the existing multipath maps, and then start multipathd again. Your config file should be honored now. If you still have problems, reboot.
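The reload sequence above can be sketched as a small shell function. The initscript path assumes a RHEL-style system of that era; adjust for your distribution, and run it as root:

```shell
# Sketch of the reload procedure, assuming /etc/init.d-style initscripts.
reload_multipath() {
    /etc/init.d/multipathd stop    # stop the daemon first
    multipath -F                   # flush all existing multipath maps
    /etc/init.d/multipathd start   # daemon rebuilds maps from the fixed config
}
# Defining the function does not touch the system; call it when ready.
type reload_multipath >/dev/null && echo defined
```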
If there's ever any doubt that your config file is being honored, examine the running config using the echo incantation above and compare what's loaded in the database against your config file.
Lennert
internet/new-media, high-traffic web cluster focused sysadmin
Updated on September 17, 2022

Comments
-
Lennert, almost 2 years ago
I have a multipath config that was working but now shows a "faulty" path:
    [root@nas ~]# multipath -ll
    sdd: checker msg is "readsector0 checker reports path is down"
    mpath1 (36001f93000a63000019f000200000000) dm-2 XIOTECH,ISE1400
    [size=200G][features=0][hwhandler=0][rw]
    \_ round-robin 0 [prio=1][active]
     \_ 1:0:0:1 sdb 8:16  [active][ready]
    \_ round-robin 0 [prio=0][enabled]
     \_ 2:0:0:1 sdd 8:48  [active][faulty]
At the same time I'm seeing these three lines over and over in /var/log/messages:
    Feb  5 12:52:57 nas kernel: sd 2:0:0:1: SCSI error: return code = 0x00010000
    Feb  5 12:52:57 nas kernel: end_request: I/O error, dev sdd, sector 0
    Feb  5 12:52:57 nas kernel: Buffer I/O error on device sdd, logical block 0
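As a side note, that `return code = 0x00010000` is decodable: the kernel packs the SCSI result as `driver_byte<<24 | host_byte<<16 | msg_byte<<8 | status_byte`. A runnable sketch of the arithmetic:

```shell
# Decode the SCSI return code from the log line above.
code=0x00010000
host_byte=$(( (code >> 16) & 0xff ))       # extract bits 16-23
printf 'host_byte=0x%02x\n' "$host_byte"   # prints host_byte=0x01
```

A host byte of 0x01 is DID_NO_CONNECT in the kernel's scsi.h, i.e. the HBA could not reach the target at all, which points at a lost path rather than a media error on the disk.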
And this line shows up fairly often too:
Feb 5 12:52:58 nas multipathd: sdd: readsector0 checker reports path is down
One thing I don't understand is why it's using the readsector0 checking method when my /etc/multipath.conf file says to use tur:
[root@nas ~]# tail -n15 /etc/multipath.conf
    devices {
        device {
            vendor "XIOTECH "
            product "ISE1400        "
            path_grouping_policy multibus
            getuid_callout "/sbin/scsi_id -g -u -d /dev/%n"
            path_checker tur
            prio_callout "none"
            path_selector "round-robin 0"
            failback immediate
            no_path_retry 12
            user_friendly_names yes
        }
    }
Looking at the upstream documentation (http://christophe.varoqui.free.fr/usage.html), this paragraph seems relevant:
For each path:

    \_ host:channel:id:lun devnode major:minor [path_status][dm_status_if_known]

The dm status (dm_status_if_known) is like the path status (path_status), but from the kernel's point of view. The dm status has two states: "failed", which is analogous to "faulty", and "active" which covers all other path states. Occasionally, the path state and the dm state of a device will temporarily not agree.
It's been well over 24 hours for me, so it's not temporary.
So with all that as background, my questions are:
- how can I determine the root cause here?
- how can I manually/command-line perform whatever check its doing
- why is it ignoring my multipath.conf (did I do it wrong?)

Thanks in advance for any ideas; if there's anything else I can provide for info, let me know in a comment and I'll edit it into the post.
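On the second question, the readsector0 checker is essentially "try to read the first 512-byte sector of the device". You can reproduce that by hand with dd; on the real system the target would be /dev/sdd from the output above, but this sketch reads a scratch file so it runs anywhere:

```shell
# Manual equivalent of the readsector0 path check (sketch).
check_sector0() {
    # succeeds iff one 512-byte sector can be read from $1
    dd if="$1" bs=512 count=1 of=/dev/null 2>/dev/null && echo up || echo down
}
target=$(mktemp)                                   # stand-in for /dev/sdd
dd if=/dev/zero of="$target" bs=512 count=1 2>/dev/null
check_sector0 "$target"                            # a healthy path prints "up"
```

The tur checker instead sends a SCSI TEST UNIT READY command; sg_turs from the sg3_utils package can issue one by hand if you want to compare the two checks against the same device.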
-
Admin, over 14 years ago
Hmmm, for IBM arrays I don't attach spaces to vendor/model and they are recognized properly. To show why you don't get the tur path checker, please paste relevant snippets from: multipath -d -v3
-
Admin, over 14 years ago
Thank you kubanskamac, it turns out it is picky about the spaces, and that's why my config snippet was being ignored and multipath was reverting to the defaults. The good news is the defaults actually work better, as that getuid_callout line fails when I clean up the spaces. Chalk this up to me trusting terrible vendor documentation too much.
-
Admin, about 13 years ago
So what about the root cause of your problem? Is there a faulty disk, or have you lost a path to the disk?