Multibit error encountered on Dell Server Memory

Solution 1

The event message reference for this alert was 1404. It indicates a faulty DIMM that should be replaced, but from what I read on blogs, the alert often clears and does not return after a reboot. Since it tripped only once for me, I cleared the memory errors using OMSA (dcicfg32.exe), and so far so good.

Solution 2

Cause of error according to Dell: "A memory device correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred. The system continues to function normally (except for a multibit error). Replace the memory module identified in the message during the system's next scheduled maintenance. The memory device status and location are provided."

Try replacing the DIMM with an identical one. If the memory is still under warranty, request a replacement from the same vendor.
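Dell's guidance is to replace the module identified in the message, so it can help to pull the slot name out of the alert text programmatically. The following is a minimal sketch, assuming the alert has the same wording as the OpenManage message quoted in the question ("Memory device status is ... Memory device location: ... Possible memory module event cause: ..."). The function name `parse_memory_alert` is hypothetical, and the exact wording may differ between OpenManage versions.

```python
import re

# Field labels are taken from the alert text quoted in the question.
# Assumption: other OpenManage versions may word the message differently.
_ALERT_PATTERN = re.compile(
    r"Memory device status is (?P<status>\w+)\s+"
    r"Memory device location:\s*(?P<location>\S+)\s+"
    r"Possible memory module event cause:\s*(?P<cause>.+)"
)


def parse_memory_alert(text):
    """Return status, location and cause from an alert string, or None."""
    match = _ALERT_PATTERN.search(text)
    if match is None:
        return None
    # Strip surrounding whitespace from every captured field.
    return {key: value.strip() for key, value in match.groupdict().items()}


if __name__ == "__main__":
    alert = (
        "Memory device status is critical "
        "Memory device location: DIMM_B2 "
        "Possible memory module event cause:Multi bit error encountered"
    )
    print(parse_memory_alert(alert))
```

The `location` field (here `DIMM_B2`) is the slot to check or swap during the next maintenance window.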

Author: AXE Labs

Updated on September 18, 2022

Comments

  • AXE Labs, over 1 year ago

    Dell OpenManage reported the following:

    Memory device status is critical
    Memory device location: DIMM_B2
    Possible memory module event cause: Multi bit error encountered

    What does this mean? How bad is it?

    • Tom O'Connor, over 10 years ago
      Call Dell Support, send it back as faulty.
  • JimNim, over 10 years ago
    This was a good move - replacement typically isn't warranted after a single occurrence, though I'd seriously consider it if the problem ever returns on that particular DIMM.
  • AXE Labs, about 10 years ago
    Similarly, I was seeing "Single bit warning error rate exceeded" and "Single bit failure error rate exceeded" on a Linux host. These can be cleared as well, but with omconfig: 'omconfig system alertlog action=clear' and 'omconfig system esmlog action=clear'. Let's hope they don't come back, or it's the trash for the DIMMs.
  • Peter, almost 10 years ago
    Make sure you've got the latest firmware/BIOS too -- I have seen cases where these sorts of errors were spurious and "fixed" by firmware.
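The two omconfig commands mentioned in the comments can be wrapped in a small script so they run together after an alert has been reviewed. This is only a sketch: the command strings are the ones quoted in the comment, while the function name `clear_openmanage_logs`, the `dry_run` flag, and the `runner` parameter are hypothetical helpers added for illustration. It defaults to a dry run, since clearing the logs discards the error history.

```python
import subprocess

# Commands exactly as quoted in the comment; nothing else is assumed
# about omconfig's options.
CLEAR_COMMANDS = [
    ["omconfig", "system", "alertlog", "action=clear"],
    ["omconfig", "system", "esmlog", "action=clear"],
]


def clear_openmanage_logs(dry_run=True, runner=subprocess.run):
    """Run (or only list) the log-clearing commands.

    Returns the command lines that were handled, as strings.
    """
    handled = []
    for cmd in CLEAR_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError if omconfig fails.
            runner(cmd, check=True)
        handled.append(" ".join(cmd))
    return handled


if __name__ == "__main__":
    # Dry run: print what would be executed without touching the logs.
    for line in clear_openmanage_logs(dry_run=True):
        print(line)
```

Calling `clear_openmanage_logs(dry_run=False)` on a host with OpenManage installed would actually clear both logs. If the same DIMM reports errors again afterwards, the comments above suggest treating it as a candidate for replacement.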