"kernel: Buffer I/O error on device" - Does my server have a hardware problem?

This I/O error message warns about a hardware error on sdb. The fault could be in the disk or in the cable, for example.

If a large number of disks all show errors at the same time, I suppose a fault in the disks themselves is less likely :-). It could be an error in the disk controller.
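One way to check whether the errors really cover many disks at once is to count the "Buffer I/O error" lines per device. Below is a minimal Python sketch, assuming the log lines look like the ones in the question. The function name `errors_per_device` and the sample lines are hypothetical, and the "on dev" spelling accepted by the regex is an assumption about the wording newer kernels use.

```python
import re
from collections import Counter

# Captures the device name in lines such as
#   "... kernel: Buffer I/O error on device sdb, logical block 4980"
# "on dev" is also accepted, on the assumption that newer kernels
# print that shorter form.
BUFFER_IO_RE = re.compile(r"Buffer I/O error on (?:device|dev) ([\w-]+)")


def errors_per_device(lines):
    """Count "Buffer I/O error" lines per block device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, shaped like the ones in the question.
    sample = [
        "Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block 4980",
        "Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block 4981",
        "Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdc, logical block 17",
    ]
    counts = errors_per_device(sample)
    print(dict(counts))  # {'sdb': 2, 'sdc': 1}
    if len(counts) > 1:
        # Several disks at once: only a hint that something shared
        # (e.g. the controller) may be involved.
        print("errors on", len(counts), "devices: the controller may be worth checking")
```

To run it against a real log, something like `with open("/var/log/messages") as f: print(errors_per_device(f))` should work. Errors spread across many devices at the same timestamps would point more towards the controller (or something else shared) than towards the individual disks, but this is only a hint, not a diagnosis.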

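If you see "Buffer I/O error" but no specific messages about ATA or SCSI error codes, or about retry attempts in general, that might itself be a hint: for example, that the kernel has an error communicating with the disk controller, rather than the disk reporting a specific fault. But I do not really know :-).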
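To see whether the "Buffer I/O error" lines come alone or together with lower-level messages, a rough Python sketch could look like this. The patterns in `LOW_LEVEL_PATTERNS` are my assumptions about what SCSI, ATA and retry messages contain, not a complete list, and the function names `classify` and `hint` are hypothetical.

```python
import re

# Assumed substrings of lower-level SCSI/ATA error or retry messages.
# Illustrative only: check them against your own kernel's log format.
LOW_LEVEL_PATTERNS = [
    re.compile(r"Sense Key", re.IGNORECASE),                          # SCSI sense data
    re.compile(r"FAILED Result"),                                     # SCSI command failure
    re.compile(r"\bata\d+(\.\d+)?: .*(error|exception)", re.IGNORECASE),  # libata errors
    re.compile(r"\bretry", re.IGNORECASE),                            # retry attempts
]
BUFFER_IO = "Buffer I/O error"


def classify(lines):
    """Return (buffer_io_count, low_level_count) for an iterable of log lines."""
    buffer_io = 0
    low_level = 0
    for line in lines:
        if BUFFER_IO in line:
            buffer_io += 1
        elif any(p.search(line) for p in LOW_LEVEL_PATTERNS):
            low_level += 1
    return buffer_io, low_level


def hint(lines):
    """Summarize whether Buffer I/O errors appear alone or with lower-level errors."""
    buffer_io, low_level = classify(lines)
    if buffer_io == 0:
        return "no Buffer I/O errors"
    if low_level == 0:
        return "Buffer I/O errors only: no SCSI/ATA/retry messages seen"
    return "Buffer I/O errors with SCSI/ATA/retry messages"


if __name__ == "__main__":
    # Hypothetical sample, similar to the question: only Buffer I/O lines.
    sample = [
        "Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block 4980",
        "Mar 27 09:18:08 server_DB smartd[1734]: Monitoring 0 ATA and 26 SCSI devices",
    ]
    print(hint(sample))
```

The keyword approach is deliberately crude; it can match unrelated lines that happen to mention "retry", so feeding it only kernel messages (for example the output of `dmesg`) is probably more reliable than the whole of /var/log/messages.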

Of course, a software error could cause any messages whatsoever :-).

To give an example of a software error (not the same error as yours): I have seen a kernel bug where "Buffer I/O error" was shown without any messages about ATA, SCSI, or retry attempts. See Fedora bug 1553979.


The "Buffer" part just means that it happened during a request for file data which is cacheable in the page cache. For historical reasons, people sometimes call these requests "buffered IO".

Author: yael

Updated on September 18, 2022

Comments

  • yael
    yael over 1 year

    We have a Linux DB server running Red Hat 7.2.

    We see many messages like the ones below, for all of the mounted disks

    (from /var/log/messages).

    We need to understand whether this behavior indicates a hardware problem.

    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4980*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4981*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4982*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4983*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4984*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4985*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4986*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4987*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4988*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4989*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4990*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4991*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4992*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4993*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4994*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4995*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4996*
    Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4997*
    

    We have also seen these messages:

    Mar 27 09:18:08 server_DB smartd[1734]: Monitoring 0 ATA and 26 SCSI devices
    Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
    Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
    Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
    Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
    Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin
    Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin
    

    I also checked the disk:

    smartctl -a -d megaraid,0 /dev/sdb
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.el7.x86_64] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Vendor:               SEAGATE
    Product:              ST600MM0238
    Revision:             BS04
    User Capacity:        600,127,266,816 bytes [600 GB]
    Logical block size:   512 bytes
    Formatted with type 2 protection
    Logical block provisioning type unreported, LBPME=0, LBPRZ=0
    Rotation Rate:        10000 rpm
    Form Factor:          2.5 inches
    Logical Unit id:      0x5000c500a0f28343
    Serial number:        W0M0LYD2
    Device type:          disk
    Transport protocol:   SAS
    Local Time is:        Wed Mar 27 10:51:30 2019 UTC
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Disabled or Not Supported
    
    === START OF READ SMART DATA SECTION ===
    SMART Health Status: OK
    
    Current Drive Temperature:     24 C
    Drive Trip Temperature:        60 C
    
    Manufactured in week 45 of year 2017
    Specified cycle count over device lifetime:  10000
    Accumulated start-stop cycles:  50
    Specified load-unload count over device lifetime:  300000
    Accumulated load-unload cycles:  177
    Elements in grown defect list: 0
    
    Vendor (Seagate) cache information
      Blocks sent to initiator = 412242328
      Blocks received from initiator = 3213595579
      Blocks read from cache and sent to initiator = 312462212
      Number of read and write commands whose size <= segment size = 31915885
      Number of read and write commands whose size > segment size = 0
    
    Vendor (Seagate/Hitachi) factory information
      number of hours powered up = 3178.45
      number of minutes until next internal SMART test = 12
    
    • Admin
      Admin about 5 years
      When you say "many message as below about all disks that are mounted", do you mean you're seeing error messages about not just sdb but other disks as well?
    • Admin
      Admin about 5 years
      Is sdb a hard disk or a DVD?
    • Admin
      Admin about 5 years
      Yes, I mean the other disks as well, and the disk is a hard disk, not a DVD.
    • Admin
      Admin about 5 years
      Be aware, this question is quite broad, because of how few details it includes. It might not work well on this site. The preset reasons for closing questions include both "too broad" and "Primarily opinion-based". You are asking about errors which might be in hardware or drivers, but you have not specified what the hardware is (and what the relevant driver is). It is also good to mention the specific kernel version that you saw the errors with. Also, if you can write a good question about this, ServerFault.com might know more about e.g. the hardware and drivers used on servers.
    • Admin
      Admin about 5 years
      I added details about the disk as well. The output is the same on the other disks; I hope this helps.
    • Admin
      Admin about 5 years
      @yael kernel version? what is the disk controller called? what is the driver for the controller?
    • Admin
      Admin about 5 years
      Two suggestions of ways to find drivers here: unix.stackexchange.com/questions/15274/…
  • yael
    yael about 5 years
    Can I ask a small question? If we reinstall the OS from scratch, and then the application, could that help? I mean, maybe the problem is at the OS level, or can we be sure it is the hardware?
  • user2948306
    user2948306 about 5 years
    @yael It could be a driver error. It could also be another rare bug in the kernel (different from the one I linked to).
  • yael
    yael about 5 years
    Yes, it is strange: most of the disks show the same error as sdb.
  • user2948306
    user2948306 about 5 years
    @yael If the only error you see on these disks is "Buffer I/O error", with no specific error about SCSI (or ATA, or retrying in general), maybe that says something. I really don't know; all I am saying is that in the past I have often seen them together. If you only see "Buffer I/O error", that could mean the kernel has an error communicating with the disk controller.