Multibit error encountered on Dell Server Memory
Solution 1
The event message reference for this was 1404. It indicates a faulty DIMM that should be replaced. However, from what I have read on blogs, the alert often clears and does not come back after a reboot. Since it only tripped once for me, I cleared the memory errors using OMSA (dcicfg32.exe), and so far so good.
Solution 2
Cause of error according to Dell: "A memory device correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred. The system continues to function normally (except for a multibit error). Replace the memory module identified in the message during the system's next scheduled maintenance. The memory device status and location are provided."
Try replacing the DIMM with an identical one. If the memory is under warranty, get the replacement from the same vendor.
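Dell's explanation lists three possible triggers but gives one remedy for all of them. The only difference is whether the system keeps running normally. The following is a minimal Python sketch of that guidance. The enum and function names are hypothetical and are not part of any Dell tool.

```python
from enum import Enum


class MemoryEventCause(Enum):
    """The three triggers Dell lists for this alert (member names are hypothetical)."""
    CORRECTION_RATE_EXCEEDED = "memory device correction rate exceeded an acceptable value"
    SPARE_BANK_ACTIVATED = "memory spare bank was activated"
    MULTIBIT_ECC = "multibit ECC error occurred"


def recommended_action(cause: MemoryEventCause, dimm: str) -> str:
    """Summarize Dell's guidance for one cause and one DIMM slot."""
    # Dell's advice is the same in every case: replace the named module at the next maintenance window.
    action = f"Replace {dimm} during the next scheduled maintenance"
    # Dell says the system keeps functioning normally, except after a multibit error.
    if cause is MemoryEventCause.MULTIBIT_ECC:
        return action + "; multibit ECC error, so normal operation is not guaranteed"
    return action + "; the system continues to function normally"
```

For example, `recommended_action(MemoryEventCause.MULTIBIT_ECC, "DIMM_B2")` returns the replacement advice together with the multibit caveat.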
AXE Labs
Updated on September 18, 2022
Comments
-
AXE Labs over 1 year
Dell OpenManage reported the following:
Memory device status is critical
Memory device location: DIMM_B2
Possible memory module event cause: Multi bit error encountered
What does this mean? How bad is it?
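If you want to act on these alerts in a script, you first have to split the message into status, slot and cause. Below is a minimal Python parsing sketch. It assumes the wording is exactly as in the single alert quoted above, and other OMSA versions may phrase it differently. `MemoryAlert` and `parse_memory_alert` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryAlert:
    status: str    # e.g. "critical"
    location: str  # DIMM slot, e.g. "DIMM_B2"
    cause: str     # e.g. "Multi bit error encountered"


# Pattern assumed from the one alert quoted above, not from a Dell specification.
_ALERT_RE = re.compile(
    r"Memory device status is (?P<status>\w+)\s+"
    r"Memory device location:\s*(?P<location>\S+)\s+"
    r"Possible memory module event cause:\s*(?P<cause>.+)"
)


def parse_memory_alert(text: str) -> Optional[MemoryAlert]:
    """Extract status, slot and cause; return None if the text doesn't match."""
    # Collapse newlines and repeated spaces so single-line and multi-line messages both parse.
    match = _ALERT_RE.search(" ".join(text.split()))
    if match is None:
        return None
    return MemoryAlert(match["status"], match["location"], match["cause"].strip())
```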
-
Tom O'Connor over 10 years
Call Dell Support and send it back as faulty.
-
JimNim over 10 years
This was a good move. Replacement typically isn't warranted after a single occurrence, though I'd seriously consider it if the problem ever returns on that particular DIMM.
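This rule of thumb says to tolerate a single event but treat a repeat on the same DIMM as grounds for replacement. Clearing the OMSA logs erases the history, so you would need to keep your own record of which slot each event named. Here is a minimal Python sketch. `dimms_to_replace` is a hypothetical helper, and the threshold of 2 encodes "returns on that particular DIMM".

```python
from collections import Counter
from typing import Iterable, List


def dimms_to_replace(event_locations: Iterable[str], threshold: int = 2) -> List[str]:
    """Return the DIMM slots whose memory errors have recurred.

    event_locations: one entry per logged event, e.g. "DIMM_B2",
    collected by you before each log clear (hypothetical record).
    """
    # Count events per slot: one occurrence is tolerated, a repeat is flagged.
    counts = Counter(event_locations)
    return sorted(slot for slot, n in counts.items() if n >= threshold)
```

With a single event on `DIMM_B2` nothing is flagged. A second event on the same slot returns `["DIMM_B2"]`.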
-
AXE Labs about 10 years
Similarly, I was seeing "Single bit warning error rate exceeded" and "Single bit failure error rate exceeded" on a Linux host. These can be cleared as well, but with omconfig: 'omconfig system alertlog action=clear' and 'omconfig system esmlog action=clear'. Let's hope they don't come back, or it's trash for the DIMMs.
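On a Linux host, the two omconfig commands above can be scripted. Below is a minimal Python sketch that builds those exact invocations and, by default, only returns them (dry run). It assumes `omconfig` is on `PATH` when it actually runs them. `clear_omsa_logs` is a hypothetical name, and the sketch uses no omconfig options beyond the two commands quoted in the comment.

```python
import shutil
import subprocess
from typing import List

# Log names taken from the two omconfig commands quoted above.
LOGS_TO_CLEAR = ("alertlog", "esmlog")


def clear_commands() -> List[List[str]]:
    """Build one 'omconfig system <log> action=clear' invocation per log."""
    return [["omconfig", "system", log, "action=clear"] for log in LOGS_TO_CLEAR]


def clear_omsa_logs(dry_run: bool = True) -> List[List[str]]:
    """Return the clear commands, and execute them only when dry_run is False."""
    cmds = clear_commands()
    if dry_run:
        return cmds
    # Assumes OMSA's CLI is installed and on PATH on this host.
    if shutil.which("omconfig") is None:
        raise FileNotFoundError("omconfig not found on PATH; is OMSA installed?")
    for cmd in cmds:
        # check=True raises CalledProcessError if omconfig reports a failure.
        subprocess.run(cmd, check=True)
    return cmds
```

Clearing erases the record of earlier events. Keep a copy first if you want to tell later whether an error has come back.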
-
Peter almost 10 years
Make sure you have the latest firmware/BIOS too. I have seen cases where these sorts of errors were spurious and were "fixed" by a firmware update.