Ath10k and QCA6174 causing PCIe errors, firmware crashes, and connection drops?

Solution 1

Warning: This is only a partial solution!

While the main issue (the wifi drop and crash) appears to be solved, the AER "Corrected error" messages still spam the logs. At least wifi is more consistent now.

Bernard Wei's comment led to the repository for ath10k firmware, which conveniently included an update for the hw3.0 chain.

Downloading firmware-6.bin_WLAN.RM.4.4.1-00110-QCARMSWP-1 and replacing firmware-6.bin in /lib/firmware/ath10k/QCA6174/hw3.0 followed by a reboot brought a far more stable wireless experience.

cd /lib/firmware/ath10k/QCA6174/hw3.0
# Keep the current firmware as a backup
sudo mv firmware-6.bin firmware-6.bin.old
# Download the 00110 build under the file name ath10k loads
sudo wget https://github.com/kvalo/ath10k-firmware/raw/master/QCA6174/hw3.0/4.4.1/firmware-6.bin_WLAN.RM.4.4.1-00110-QCARMSWP-1 -O firmware-6.bin
sudo reboot
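After rebooting, one way to confirm that ath10k actually loaded the new file is to read the `firmware ver` field the driver logs at start-up (compare the `firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1` line in the question's boot log). The sketch below is a minimal helper under stated assumptions: it is POSIX shell, it expects the log format shown in this thread, and the function names `ath10k_fw_ver` and `ath10k_fw_build` are hypothetical, not part of any package.

```shell
# Hypothetical helpers; they assume ath10k log lines shaped like:
#   ath10k_pci 0000:3c:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 ...

# Print the most recent firmware version string found in kernel log text
# read on stdin (prints nothing if no ath10k "firmware ver" line is present).
ath10k_fw_ver() {
    sed -n 's/.*ath10k_pci .*: firmware ver \([^ ]*\).*/\1/p' | tail -n 1
}

# Print the numeric build field (e.g. 00079, 00110) of a version string
# such as WLAN.RM.4.4.1-00110-QCARMSWP-1 read on stdin.
ath10k_fw_build() {
    sed -n 's/^WLAN\.RM\.[0-9.]*-\([0-9]*\)-.*/\1/p'
}
```

For example, `sudo dmesg | ath10k_fw_ver` (or `journalctl -k -b | ath10k_fw_ver`) should print the running version; after the swap above you would expect it to end in `00110-QCARMSWP-1` rather than `00079-QCARMSWPZ-1`, assuming the version string inside the file matches its file name. The build field is returned as text with leading zeros, so compare it as a fixed-width string rather than in shell arithmetic, which treats a leading 0 as octal.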

Note, however, that the following lines are now in the syslog:

[   21.482256] ath10k_pci 0000:3c:00.0: Unknown eventid: 3
[   21.498398] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
[   21.501401] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118

Now... to wait for this to hit the linux-firmware package for real. And also fix the AER errors...
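The `Unknown eventid` lines quoted above are easier to judge once they are tallied: a handful at firmware start-up reads differently from a steady stream. A minimal sketch, assuming the message format shown above; `ath10k_unknown_events` is a hypothetical name, and the hex column is only a convenience for searching driver sources, not a decoding of what each event means.

```shell
# Hypothetical helper: tally ath10k "Unknown eventid" messages in kernel log
# text read on stdin. Prints one line per ID: count, decimal ID, hex ID.
ath10k_unknown_events() {
    sed -n 's/.*ath10k_pci .*: Unknown eventid: \([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c |
        while read -r count id; do
            printf '%s %s 0x%X\n' "$count" "$id" "$id"
        done
}
```

Run it as `sudo dmesg | ath10k_unknown_events`. Fed only the question's boot-time log, it reports 118809 (0x1D019) and 90118 (0x16006) twice each, which suggests those two IDs were already being logged with the older 00079 firmware and are not new with 00110.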

Solution 2

After way too much frustration, I bought an Intel 3168NGW card and replaced the flawed Qualcomm wifi hardware. Suddenly everything works perfectly. Even Bluetooth, which was always buggy, works great now. I wish I had replaced the card a long time ago. And I wish Dell had never put Linux-incompatible hardware in the XPS...

Author: Kaz Wolfe

Updated on September 18, 2022

Comments

  • Kaz Wolfe (over 1 year ago)

    I recently (re)installed Ubuntu 18.04 on my Razer Blade Pro (2017). My wireless card performs extremely poorly and frequently drops its connection. Inspecting dmesg for Atheros messages yields the following (nasty-looking) crash:

    [ 6709.200017] ath10k_pci 0000:3c:00.0: firmware crashed! (guid 01e29e97-0ee6-4538-8756-764abe49705f)
    [ 6709.200048] ath10k_pci 0000:3c:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
    [ 6709.200056] ath10k_pci 0000:3c:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
    [ 6709.201666] ath10k_pci 0000:3c:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 fd869beb
    [ 6709.202773] ath10k_pci 0000:3c:00.0: board_file api 2 bmi_id N/A crc32 20d869c3
    [ 6709.202784] ath10k_pci 0000:3c:00.0: htt-ver 3.47 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
    [ 6709.204809] ath10k_pci 0000:3c:00.0: firmware register dump:
    [ 6709.204822] ath10k_pci 0000:3c:00.0: [00]: 0x05030000 0x000015B3 0x009E6FD4 0x00955B31
    [ 6709.204830] ath10k_pci 0000:3c:00.0: [04]: 0x009E6FD4 0x00060730 0x0000001D 0x00473AD4
    [ 6709.204838] ath10k_pci 0000:3c:00.0: [08]: 0x0049C59C 0x0044DEB4 0x004290B0 0x00449AB0
    [ 6709.204847] ath10k_pci 0000:3c:00.0: [12]: 0x00000009 0xFFFFFFFF 0x00952F6C 0x00952F77
    [ 6709.204854] ath10k_pci 0000:3c:00.0: [16]: 0x00952CC4 0x0091080D 0x00000000 0x0091080D
    [ 6709.204862] ath10k_pci 0000:3c:00.0: [20]: 0x409E6FD4 0x0040E818 0x00405820 0x0049C464
    [ 6709.204870] ath10k_pci 0000:3c:00.0: [24]: 0x809E9395 0x0040E878 0x0049C6E8 0xC09E6FD4
    [ 6709.204879] ath10k_pci 0000:3c:00.0: [28]: 0x80932EF9 0x0040EA68 0x0040A054 0x00000009
    [ 6709.204887] ath10k_pci 0000:3c:00.0: [32]: 0x809F8C46 0x0040EA98 0x0041201C 0x00000004
    [ 6709.204894] ath10k_pci 0000:3c:00.0: [36]: 0x80911210 0x0040EAC8 0x00000005 0x004040F4
    [ 6709.204902] ath10k_pci 0000:3c:00.0: [40]: 0x80911154 0x0040EB28 0x00400000 0x00000000
    [ 6709.204910] ath10k_pci 0000:3c:00.0: [44]: 0x8091122D 0x0040EB48 0x00000000 0x00400600
    [ 6709.204922] ath10k_pci 0000:3c:00.0: [48]: 0x40910024 0x0040EB78 0x0040AB98 0x0040AB98
    [ 6709.204930] ath10k_pci 0000:3c:00.0: [52]: 0x00000000 0x0040EB98 0x009BB001 0x00040020
    [ 6709.204938] ath10k_pci 0000:3c:00.0: [56]: 0x809EDA21 0x0040E938 0x00499F10 0x00000000
    [ 6709.204944] ath10k_pci 0000:3c:00.0: Copy Engine register dump:
    [ 6709.204967] ath10k_pci 0000:3c:00.0: [00]: 0x00034400  14  14   3   3
    [ 6709.204990] ath10k_pci 0000:3c:00.0: [01]: 0x00034800  17  17 510 511
    [ 6709.205012] ath10k_pci 0000:3c:00.0: [02]: 0x00034c00   5   5  68  69
    [ 6709.205034] ath10k_pci 0000:3c:00.0: [03]: 0x00035000  27  27  29  27
    [ 6709.205057] ath10k_pci 0000:3c:00.0: [04]: 0x00035400 131 131 131  67
    [ 6709.205079] ath10k_pci 0000:3c:00.0: [05]: 0x00035800   0   0  64   0
    [ 6709.205101] ath10k_pci 0000:3c:00.0: [06]: 0x00035c00  26  26  24  24
    [ 6709.205123] ath10k_pci 0000:3c:00.0: [07]: 0x00036000   1   1   1   1
    [ 6710.053042] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
    [ 6710.056101] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118
    [ 6710.153420] ath10k_pci 0000:3c:00.0: device successfully recovered
    

    There are also the following entries related to the wireless card:

    [ 7403.617792] pcieport 0000:00:1c.6: AER: Corrected error received: id=00e6
    [ 7403.617797] pcieport 0000:00:1c.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e6(Transmitter ID)
    [ 7403.617800] pcieport 0000:00:1c.6:   device [8086:a116] error status/mask=00001000/00002000
    [ 7403.617802] pcieport 0000:00:1c.6:    [12] Replay Timer Timeout 
    

    The lspci output of the card is as follows:

    3c:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
        Subsystem: Bigfoot Networks, Inc. QCA6174 802.11ac Wireless Network Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 145
        Memory at dc200000 (64-bit, non-prefetchable) [size=2M]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/8 Maskable+ 64bit-
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Virtual Channel
        Capabilities: [168] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [178] Latency Tolerance Reporting
        Capabilities: [180] L1 PM Substates
        Kernel driver in use: ath10k_pci
        Kernel modules: ath10k_pci
    
    -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
               +- ...
               +-1c.0-[02-3a]--
               +-1c.4-[3b]----00.0  ...
               +-1c.6-[3c]----00.0  Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
               +-1d.0-[3d]----00.0  ...
               +- ...
    

    Loading the card (at boot) shows the following dmesg output:

    [   29.432791] ath10k_pci 0000:3c:00.0: enabling device (0000 -> 0002)
    [   29.433628] ath10k_pci 0000:3c:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
    [   29.721996] ath10k_pci 0000:3c:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3c:00.0.bin failed with error -2
    [   29.722023] ath10k_pci 0000:3c:00.0: Direct firmware load for ath10k/cal-pci-0000:3c:00.0.bin failed with error -2
    [   29.725059] ath10k_pci 0000:3c:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
    [   29.725061] ath10k_pci 0000:3c:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
    [   29.725481] ath10k_pci 0000:3c:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 fd869beb
    [   29.791271] ath10k_pci 0000:3c:00.0: board_file api 2 bmi_id N/A crc32 20d869c3
    [   30.386364] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
    [   30.389342] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118
    [   30.389967] ath10k_pci 0000:3c:00.0: htt-ver 3.47 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
    [   30.471606] ath: EEPROM regdomain: 0x6c
    [   30.471606] ath: EEPROM indicates we should expect a direct regpair map
    [   30.471607] ath: Country alpha2 being used: 00
    [   30.471608] ath: Regpair used: 0x6c
    [   30.475073] ath10k_pci 0000:3c:00.0 wlp60s0: renamed from wlan0
    [   31.698248] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
    [   31.701166] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118
    

    Notably, my system does not have a hw3.2 directory under /lib/firmware/ath10k/QCA6174. I have version 1.173.1 of linux-firmware installed, and no proprietary drivers seem to be available for my wireless card. Mandatory AIO script results are available on Pastebin.

    After a crash, I can generally restore connectivity by toggling WiFi off and back on in the GNOME menu, but this is annoying to do every time the card crashes, which happens anywhere from a few minutes to a few hours after the previous crash. This worked fine in 16.04 HWE before I had to uninstall Linux, so I'm not certain why 18.04 would bring a whole new host of problems, but apparently they exist now.

    I'm assuming this is a kernel-related bug (although I have yet to file a report on this), but I would like to know if there are any workarounds present to make my wireless connection last longer than ten minutes and/or stop the PCIe Bus Error messages from cluttering my syslog.

    Short of replacing my wireless card, and waiting for an official fix, what can I do to improve wireless performance (and stop the crashes)?

    • Admin (almost 6 years ago)
      There doesn't seem to be a hw3.2 folder in the git repo; hw3.0 was only added 3 months ago. See github.com/kvalo/ath10k-firmware. Do you know if you are using firmware for hw3.0 on hw3.2?
    • Admin (almost 6 years ago)
      @BernardWei It appears I'm using WLAN.RM.4.4.1-00079-QCARMSWPZ-1 (api 6), which is in the hw3.0 folder. Not sure why it isn't working now; it was great under 16.04.
    • Admin (almost 6 years ago)
      I don't quite understand this, but following this thread might give you something to try that may or may not help your situation. bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1743279
    • Admin (over 4 years ago)
      I came across this thread after my wifi dropped and dumped a stack trace in my kernel log again. For the record, I am now running 18.04 with the HWE kernel and have updated to firmware 00140 (from 3 months ago), which seems to work just fine.
  • user238607 (over 5 years ago)
    I have the very same problem as described in the question, but my firmware version is WLAN.RM.4.4.1-00079-QCARMSWPZ-1 (notice the Z at the end). The latest file for that variant in the GitHub repo is the 00079 build; there is no 00110 version of it.
  • user238607 (over 5 years ago)
    I have faced this problem for quite some time. The system hangs and a reboot is required. Do you think using firmware-6.bin_WLAN.RM.4.4.1-00110-QCARMSWP-1 instead of my current firmware-6.bin_WLAN.RM.4.4.1-00079-QCARMSWPZ-1 would solve this problem for me? Please help! Thanks
  • Boris Hamanov (over 5 years ago)
    AER (Advanced Error Reporting) messages can be suppressed with a kernel boot parameter, but that won't fix the problem. Examine the syslog around the time of the AER messages and you'll probably find PCIe errors that point to the wi-fi card. It may need re-seating on the motherboard.
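Following that advice, three things can be read straight from the question's logs. The AER `id=00e6` decodes to bus 00, device 0x1c, function 6, i.e. the root port `0000:00:1c.6` that prints the message. The `lspci -tv` tree shows the QCA6174 on bus `3c` directly behind `1c.6`. The status word `00001000` has only bit 12 (Replay Timer Timeout) set, and the mask `00002000` is bit 13 (Advisory Non-Fatal Error). The sketch below automates these checks. It assumes the log and tree formats shown in this thread; the function names are hypothetical, and the bit names follow the PCIe Correctable Error Status register layout (they match what the kernel prints, e.g. `[12] Replay Timer Timeout`).

```shell
# Hypothetical helpers for reading the AER messages and lspci tree in this thread.

# Decode a 16-bit PCIe requester/transmitter ID (e.g. the "id=00e6" in an AER
# message) into bus:device.function: bus = bits 15-8, device = bits 7-3,
# function = bits 2-0.
pcie_id_to_bdf() {
    id=$((0x$1))
    printf '%02x:%02x.%x\n' $((id >> 8)) $(( (id >> 3) & 31 )) $((id & 7))
}

# Name the bits set in a Correctable Error Status (or Mask) value, as printed
# in "error status/mask=STATUS/MASK". Bit positions follow the PCIe
# Correctable Error Status register layout.
aer_correctable_bits() {
    value=$((0x$1))
    for entry in "0:Receiver Error" "6:Bad TLP" "7:Bad DLLP" \
        "8:REPLAY_NUM Rollover" "12:Replay Timer Timeout" \
        "13:Advisory Non-Fatal Error" "14:Corrected Internal Error" \
        "15:Header Log Overflow"; do
        bit=${entry%%:*}
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            printf '[%s] %s\n' "$bit" "${entry#*:}"
        fi
    done
}

# Print the dev.fn of the bridge whose secondary bus is exactly BUS ($1),
# reading `lspci -tv` output on stdin, e.g. "1c.6" for the line
#   +-1c.6-[3c]----00.0  Qualcomm Atheros QCA6174 ...
lspci_tree_parent() {
    sed -n "s/.*\([0-9a-f][0-9a-f]\.[0-7]\)-\[$1].*/\1/p" | head -n 1
}
```

Applied to the question's logs, `pcie_id_to_bdf 00e6` prints `00:1c.6`, `aer_correctable_bits 00001000` prints `[12] Replay Timer Timeout`, and `lspci -tv | lspci_tree_parent 3c` prints `1c.6`, so the corrected errors are on the link between root port 00:1c.6 and the QCA6174. That is consistent with the comment's suggestion, though on its own it does not prove the card is faulty.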