How do I debug a kernel module in which a NULL pointer appears?

Solution 1

First things first: have you tried debugging the module? See if you can load it in gdb; it might point you straight at the line that uses the relevant variable (or close to it).
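The register dump in the oops from the question can be used to cross-check whichever line a debugger reports. The bytes marked `<44> 8b 04 90` on the `Code:` line decode to `mov (%rax,%rdx,4),%r8d`, a 32-bit load from `RAX + RDX*4`. The sketch below is plain user-space C with a hypothetical helper named `effective_address`; it recomputes that address from the dumped register values so it can be compared with `CR2`. That the NULL base is a colour lookup table such as `info->pseudo_palette` is only an inference from the indexing pattern, not something the thread confirms.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of an x86-64 memory operand
 * written as disp(base, index, scale) in AT&T syntax. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int64_t disp)
{
    return base + index * scale + (uint64_t)disp;
}

/* Register values copied from the oops in the question. */
static const uint64_t OOPS_RAX = 0x0;  /* base: the NULL pointer */
static const uint64_t OOPS_RDX = 0x7;  /* index into the table   */
static const uint64_t OOPS_CR2 = 0x1c; /* faulting address       */

/* Returns 1 when the decoded operand reproduces the faulting address. */
int oops_matches(void)
{
    return effective_address(OOPS_RAX, OOPS_RDX, 4, 0) == OOPS_CR2;
}
```

With `RAX = 0` and `RDX = 7`, the operand resolves to `0 + 7 * 4 = 0x1c`, the same value the oops reports in `CR2`. In other words, the fault is an indexed read through a NULL table pointer rather than a read of a NULL struct pointer itself.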

Oh, and you might find this article useful.

Solution 2

I'm one of the authors of that patch, sorry it is so buggy :)

In general, to find NULL pointers like this, I insert printk calls until I find the pointer that is NULL (0), then read the source code until I understand why.
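A minimal user-space sketch of that printk approach follows. The struct `fb_info_sketch`, the macro `TRACE_PTR`, and the function `first_null` are hypothetical names used only for illustration; in the module itself each `fprintf` would be a `printk` call placed just before the suspect dereference.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct fb_info;
 * only the two pointers inspected here are modelled. */
struct fb_info_sketch {
    void *par;            /* driver-private data */
    void *pseudo_palette; /* colour lookup table */
};

/* User-space stand-in for a printk() call that logs one pointer. */
#define TRACE_PTR(p) \
    fprintf(stderr, "%s:%d: %s = %p\n", __func__, __LINE__, #p, (const void *)(p))

enum null_field { NONE_NULL, INFO_NULL, PAR_NULL, PALETTE_NULL };

/* Log each pointer in turn and report the first one that is NULL. */
enum null_field first_null(const struct fb_info_sketch *info)
{
    TRACE_PTR(info);
    if (!info)
        return INFO_NULL;
    TRACE_PTR(info->par);
    if (!info->par)
        return PAR_NULL;
    TRACE_PTR(info->pseudo_palette);
    if (!info->pseudo_palette)
        return PALETTE_NULL;
    return NONE_NULL;
}
```

Logging every pointer before it is used, rather than only the one under suspicion, means the last line printed before the oops identifies the culprit even when the guess about which pointer is NULL turns out to be wrong.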

In this case, however, I know that you have to disable the framebuffer console or you will hit this nasty bug, which is only triggered when the console is visible. Alternatively, it could be the bug that is triggered when you unplug the keyboard and the module still tries to write to the now-invalid buffer.

You should check out the new code on GitHub. I am cleaning it up right now to make it easier to compile against arbitrary kernels, and it has quite a few bug fixes.

Also, drop by our IRC channel, #lg4l on freenode.

Author: Falmarri

Updated on September 17, 2022

Comments

  • Falmarri
    Falmarri over 1 year

    I have a custom kernel module that I compiled from this patch, which adds support for the Logitech G19 keyboard, among other G-series devices. I compiled it just fine against the master branch of Ubuntu's Maverick kernel (2.6.35).

    I can boot and load the module, but I'm running into a really strange situation. As soon as I load the module (either on boot or through modprobe), I get a black screen and my console locks up.

    The weird part is that it doesn't lock my system up, it's just the current console session. I can SSH into my box, and it gives me a terminal and a session. And I can type, and I can even run a command and it gives me the output. It then draws my next prompt and immediately locks up.

    I see in dmesg that there's a null pointer, and I get the following stacktrace:

    [  956.215836] input: Logitech G19 Gaming Keyboard as /devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2.1/1-2.1.2/1-2.1.2:1.1/input/input5
    [  956.216023] hid-g19 0003:046D:C229.0004: input,hiddev97,hidraw3: USB HID v1.11 Keypad [Logitech G19 Gaming Keyboard] on usb-0000:00:1d.7-2.1.2/input1
    [  956.216065] input: Logitech G19 as /devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2.1/1-2.1.2/1-2.1.2:1.1/input/input6
    [  956.216128] Registered led device: g19_97:orange:m1
    [  956.216146] Registered led device: g19_97:orange:m2
    [  956.216178] Registered led device: g19_97:orange:m3
    [  956.216198] Registered led device: g19_97:red:mr
    [  956.216216] Registered led device: g19_97:red:bl
    [  956.216235] Registered led device: g19_97:green:bl
    [  956.216259] Registered led device: g19_97:blue:bl
    [  956.216872] Console: switching to colour frame buffer device 40x30
    [  956.216899] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
    [  956.216903] IP: [<ffffffffa040b21b>] sys_imageblit+0x21b/0x4ec [sysimgblt]
    [  956.216911] PGD 273554067 PUD 2726ca067 PMD 0 
    [  956.216914] Oops: 0000 [#1] SMP 
    [  956.216917] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2.1/1-2.1.2/1-2.1.2:1.1/usb/hiddev1/uevent
    [  956.216921] CPU 5 
    [  956.216922] Modules linked in: hid_g19(+) led_class hid_gfb fb_sys_fops sysimgblt sysfillrect syscopyarea btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device ioatdma snd i5000_edac soundcore snd_page_alloc psmouse edac_core i5k_amb shpchp serio_raw dca ppdev parport_pc lp parport usbhid hid floppy e1000e
    [  956.216953] 
    [  956.216956] Pid: 3147, comm: modprobe Not tainted 2.6.35-26-generic #46 DSBF-DE/System Product Name
    [  956.216959] RIP: 0010:[<ffffffffa040b21b>]  [<ffffffffa040b21b>] sys_imageblit+0x21b/0x4ec [sysimgblt]
    [  956.216963] RSP: 0018:ffff8802766db738  EFLAGS: 00010246
    [  956.216965] RAX: 0000000000000000 RBX: ffff880273e71000 RCX: ffff880272e93b40
    [  956.216968] RDX: 0000000000000007 RSI: 0000000000000010 RDI: ffff880272e93b40
    [  956.216970] RBP: ffff8802766db7d8 R08: 0000000000000000 R09: ffff880272e93b98
    [  956.216972] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    [  956.216974] R13: 0000000000000010 R14: 0000000000000008 R15: ffff8802766db8c8
    [  956.216977] FS:  00007fcae7725700(0000) GS:ffff880001f40000(0000) knlGS:0000000000000000
    [  956.216979] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [  956.216981] CR2: 000000000000001c CR3: 000000026ba26000 CR4: 00000000000006e0
    [  956.216983] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  956.216986] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [  956.216988] Process modprobe (pid: 3147, threadinfo ffff8802766da000, task ffff8802696a16e0)
    [  956.216990] Stack:
    [  956.216991]  ffff8802766db778 ffffffff810746ae ffff8802766db700 ffff88026b2cadc0
    [  956.216994] <0> ffff8802766db778 ffffffff812beef9 ffff8802f66db947 ffff8802766db94f
    [  956.216998] <0> ffff8802766db848 00000000812bf22e ffff880272e93b40 ffffffff812feb40
    [  956.217001] Call Trace:
    [  956.217011]  [<ffffffff810746ae>] ? send_signal+0x3e/0x90
    [  956.217018]  [<ffffffff812beef9>] ? put_dec+0x59/0x60
    [  956.217023]  [<ffffffff812feb40>] ? fbcon_resize+0xd0/0x230
    [  956.217027]  [<ffffffffa04175da>] gfb_fb_imageblit+0x1a/0x30 [hid_gfb]
    [  956.217031]  [<ffffffff813051b9>] soft_cursor+0x1c9/0x270
    [  956.217034]  [<ffffffff81304e8b>] bit_cursor+0x65b/0x6c0
    [  956.217037]  [<ffffffff812c1796>] ? vsnprintf+0x316/0x5a0
    [  956.217043]  [<ffffffff81061045>] ? try_acquire_console_sem+0x15/0x60
    [  956.217046]  [<ffffffff81300ca8>] fbcon_cursor+0x1d8/0x340
    [  956.217049]  [<ffffffff81304830>] ? bit_cursor+0x0/0x6c0
    [  956.217054]  [<ffffffff81368139>] hide_cursor+0x29/0x90
    [  956.217057]  [<ffffffff8136b078>] redraw_screen+0x148/0x240
    [  956.217060]  [<ffffffff8136b42e>] bind_con_driver+0x2be/0x3b0
    [  956.217063]  [<ffffffff8136b569>] take_over_console+0x49/0x70
    [  956.217066]  [<ffffffff812ff7fb>] fbcon_takeover+0x5b/0xb0
    [  956.217069]  [<ffffffff81303ca5>] fbcon_event_notify+0x5c5/0x650
    [  956.217076]  [<ffffffff8158e7f6>] notifier_call_chain+0x56/0x80
    [  956.217080]  [<ffffffff8108510a>] __blocking_notifier_call_chain+0x5a/0x80
    [  956.217084]  [<ffffffff81085146>] blocking_notifier_call_chain+0x16/0x20
    [  956.217089]  [<ffffffff812f366b>] fb_notifier_call_chain+0x1b/0x20
    [  956.217092]  [<ffffffff812f4c8c>] register_framebuffer+0x1ec/0x2e0
    [  956.217098]  [<ffffffff814084f8>] ? usb_init_urb+0x28/0x40
    [  956.217101]  [<ffffffffa041790f>] gfb_probe+0x21f/0x4f0 [hid_gfb]
    [  956.217107]  [<ffffffffa0425778>] g19_probe+0x558/0xedc [hid_g19]
    [  956.217115]  [<ffffffff811c059c>] ? sysfs_do_create_link+0xec/0x210
    [  956.217128]  [<ffffffffa00330c7>] hid_device_probe+0x77/0xf0 [hid]
    [  956.217131]  [<ffffffff81388aa2>] ? driver_sysfs_add+0x62/0x90
    [  956.217134]  [<ffffffff81388bc8>] really_probe+0x68/0x190
    [  956.217138]  [<ffffffff81388d35>] driver_probe_device+0x45/0x70
    [  956.217140]  [<ffffffff81388dfb>] __driver_attach+0x9b/0xa0
    [  956.217143]  [<ffffffff81388d60>] ? __driver_attach+0x0/0xa0
    [  956.217146]  [<ffffffff81388008>] bus_for_each_dev+0x68/0x90
    [  956.217149]  [<ffffffff81388a3e>] driver_attach+0x1e/0x20
    [  956.217151]  [<ffffffff813882fe>] bus_add_driver+0xde/0x280
    [  956.217154]  [<ffffffff81389140>] driver_register+0x80/0x150
    [  956.217157]  [<ffffffff8158e7f6>] ? notifier_call_chain+0x56/0x80
    [  956.217161]  [<ffffffffa042a000>] ? g19_init+0x0/0x20 [hid_g19]
    [  956.217166]  [<ffffffffa0032913>] __hid_register_driver+0x53/0x90 [hid]
    [  956.217169]  [<ffffffff81085115>] ? __blocking_notifier_call_chain+0x65/0x80
    [  956.217173]  [<ffffffffa042a01e>] g19_init+0x1e/0x20 [hid_g19]
    [  956.217178]  [<ffffffff8100204c>] do_one_initcall+0x3c/0x1a0
    [  956.217184]  [<ffffffff8109bd9b>] sys_init_module+0xbb/0x200
    [  956.217192]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
    [  956.217195] Code: 83 e1 fc 48 89 4d c8 eb d3 8b 83 14 01 00 00 83 f8 04 74 09 83 f8 02 0f 85 7b 01 00 00 48 8b 4d b0 48 8b 83 00 04 00 00 8b 51 10 <44> 8b 04 90 8b 51 14 8b 3c 90 44 8b 4d ac 45 85 c9 75 16 41 b9 
    [  956.217218] RIP  [<ffffffffa040b21b>] sys_imageblit+0x21b/0x4ec [sysimgblt]
    [  956.217221]  RSP <ffff8802766db738>
    [  956.217223] CR2: 000000000000001c
    [  956.217227] ---[ end trace 95d6c6d6913ccc79 ]---
    

    Can anyone point me in the right direction as to how to go about debugging this?

    The stack trace leads me to believe that the problem is not in the hid-g19 driver but in the hid-gfb driver, which creates a framebuffer for the LCD on the keyboard. This makes sense, since it's locking up my display/console, but digging into the kernel code isn't really getting me anywhere: so much of it is assembly and macro functions.

    The last function on the stack trace that involves my new code is gfb_fb_imageblit. The entirety of that function is:

       struct gfb_data *par = info->par;
       sys_imageblit(info, image);
       gfb_fb_update(par);
    

    Am I reading the stacktrace wrong? Am I missing something? Any tips on how to debug this?

    • imz -- Ivan Zakharyaschev
      imz -- Ivan Zakharyaschev about 13 years
      Quite a few years ago, I resolved a similar bug in the pl2303 module simply by reading the code carefully and finding the source of the NULL pointer. (This tiny fix was then taken by GregKH, the maintainer.) Perhaps a debugger could help you; ask how to use a debugger with the kernel. Also contact the maintainers of the code, as they might have ideas.
    • imz -- Ivan Zakharyaschev
      imz -- Ivan Zakharyaschev about 13 years
      If you just care about the module being usable (and not necessarily helping yourself and the community by fixing the module to work with your kernel), then try it with the same kernel version and configuration as "other people" use.
    • Falmarri
      Falmarri about 13 years
      @imz: Well I've only seen one mention of it actually being used, and I don't know the exact kernel and config that they used, only that it was built with the meerkat kernel. I'd like to learn though so I'll probably start debugging this when I have some time.
    • imz -- Ivan Zakharyaschev
      imz -- Ivan Zakharyaschev about 13 years
      If you go through the debugging yourself, you'll ultimately be able to post one of the best answers to your question here!
    • imz -- Ivan Zakharyaschev
      imz -- Ivan Zakharyaschev about 13 years
      @Falmarri: "Should I ask that as a new question [about using a debugger] or just wait until I get more answers here?" Well, I thought that at least modifying the title of this post to make clear that you are ready to go into debugging would attract more relevant answers and hints.
  • Falmarri
    Falmarri about 13 years
    Well I went through some debugging steps and followed the stacktrace. But when I rebooted my computer and reloaded the modules it just worked. So, I don't know what was wrong.
  • RobotHumans
    RobotHumans about 13 years
    Good deal. Glad it's working for whatever reason
  • Falmarri
    Falmarri almost 13 years
    Hey, thanks for responding. I didn't expect the patch to be bug-free; in fact, I was hoping I could contribute meaningfully to it. I think I have some good information for you guys, even though it might be outdated since I haven't had a chance to work much on this lately. I'll stop by IRC when I get the chance.
  • vonbrand
    vonbrand over 11 years
    Does it work only with the debugger? Did you change anything else?