CUDA apps time out & fail after several seconds - how to work around this?


Solution 1

I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.

You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious. To disable it, open regedit, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display, create a REG_DWORD named DisableBugCheck, and set it to 1. You may also need to change something in the NVidia control panel. Look for some reference to "VPU Recovery" in the CUDA docs.
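For reference, the same edit as a single command from an elevated command prompt (this is just the key described above; per the later edits, it only applies to older versions of Windows):

    reg add "HKLM\SYSTEM\CurrentControlSet\Control\Watchdog\Display" /v DisableBugCheck /t REG_DWORD /d 1 /f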

Ideally, you should be able to break your kernel operations up into multiple passes over your data, so that each pass runs within the time limit.

Alternatively, you can divide the problem domain up so that each command computes fewer output pixels. That is, instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the GPU to compute 100,000 each.

The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?

You should not have to read all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory. You just need some command buffers to complete occasionally, to prove to the OS that you're not stuck in an infinite loop.
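To make the chunking idea concrete, here is a minimal CUDA sketch; the names (computeSlice, expensivePixel) are hypothetical stand-ins for your real per-pixel work. Each launch covers one slice of the output, cudaDeviceSynchronize() forces the queued commands to be submitted and completed (playing the role CtxFlush() plays in the Stream SDK), and the output buffer stays resident in GPU memory the whole time:

    #include <cuda_runtime.h>

    // Stand-in for the real per-pixel computation.
    __device__ float expensivePixel(int i)
    {
        return sinf((float)i) * 0.5f;
    }

    // Each launch covers only a slice of the output.
    __global__ void computeSlice(float *out, int offset, int count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < count)
            out[offset + i] = expensivePixel(offset + i);
    }

    void computeAllPixels(float *d_out, int totalPixels)
    {
        const int sliceSize = 100000; // sized so one slice finishes well under the watchdog limit
        const int threads = 256;

        for (int offset = 0; offset < totalPixels; offset += sliceSize) {
            int count = (totalPixels - offset < sliceSize) ? totalPixels - offset : sliceSize;
            int blocks = (count + threads - 1) / threads;
            computeSlice<<<blocks, threads>>>(d_out, offset, count);

            // Force the queued work to be submitted and to finish, the CUDA
            // counterpart of CtxFlush(). d_out stays in GPU memory the whole
            // time, so there is no PCIe round trip per slice.
            cudaDeviceSynchronize();
        }
    }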

Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong.

[EDIT Mar 2010 to update:] (outdated again, see the updates below for the most recent information) The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx

[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for CUDA programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click on "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to false. This changes the registry setting for you. Close the monitor and reboot; no change to the TDR registry settings takes effect until you reboot.

[EDIT August 2018 to update:] Although the NVIDIA tools allow disabling the TDR now, the same question is relevant for AMD/OpenCL developers. For those: The current link that documents the TDR settings is at https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys

Solution 2

On Windows, the graphics driver has a watchdog timer that kills any shader programs that run for more than 5 seconds. Note that the Xorg/XFree86 drivers don't do this, so one possible workaround is to run the CUDA apps on Linux.

AFAIK it is not possible to disable the watchdog timer on Windows. The only way to get around this on Windows is to use a second card that has no displayed screens on it. It doesn't have to be a Tesla but it must have no active screens.

Solution 3

Resolve Timeout Detection and Recovery - WINDOWS 7 (32/64 bit)

Create a registry key in Windows to change the TDR settings to a higher value, so that Windows allows a longer delay before the TDR process starts.

Open Regedit from the Run dialog or a command prompt.

In Windows 7 navigate to the correct registry key area, to create the new key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers

There will probably already be one value in there called DxgKrnlVersion, as a DWORD.

Right-click and create a new REG_DWORD value, and name it TdrDelay. The value assigned to it is the number of seconds before TDR kicks in; it currently defaults to 2 in Windows (even though the registry value doesn't exist until you create it). Assign it a new value (I tried 4 seconds), which doubles the time before TDR. Then restart the PC; the new value won't take effect until you reboot.
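Equivalently, the same value can be created from an elevated command prompt with one command (reboot afterwards):

    reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 4 /f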

Source: Win7 TDR (Driver Timeout Detection & Recovery). I have also verified this myself, and it works fine.

Solution 4

The most basic solution is to pick a point in the calculation, some percentage of the way through, that I am sure the GPU I am working with can complete in time; save all the state information and stop, then start again from that point.
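As a rough sketch of that checkpointing pattern (all names here are hypothetical, and the kernel body is a stand-in for real work): run the computation in batches small enough to finish inside the watchdog window, and copy the state back after each batch so it can be saved to disk and the run resumed later.

    #include <cuda_runtime.h>

    // Stand-in for one batch of real simulation work.
    __global__ void stepKernel(float *state, int n, int steps)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        for (int s = 0; s < steps; ++s)
            state[i] = 0.999f * state[i] + 0.001f;
    }

    // Run totalSteps in batches that each finish inside the watchdog window,
    // checkpointing the state back to the host after every batch.
    void runWithCheckpoints(float *h_state, int n, int totalSteps, int stepsPerBatch)
    {
        float *d_state;
        cudaMalloc(&d_state, n * sizeof(float));
        cudaMemcpy(d_state, h_state, n * sizeof(float), cudaMemcpyHostToDevice);

        for (int done = 0; done < totalSteps; done += stepsPerBatch) {
            int steps = (totalSteps - done < stepsPerBatch) ? totalSteps - done
                                                            : stepsPerBatch;
            stepKernel<<<(n + 255) / 256, 256>>>(d_state, n, steps);
            cudaDeviceSynchronize(); // bound each submission's runtime

            // Checkpoint: pull the state back so it can be written to disk
            // together with `done`, allowing a restart from this point.
            cudaMemcpy(h_state, d_state, n * sizeof(float), cudaMemcpyDeviceToHost);
        }
        cudaFree(d_state);
    }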

Update: For Linux: Exiting X will allow you to run CUDA applications for as long as you want. No Tesla required (a GeForce 9600 was used in testing this).

One thing to note, however, is that if X is never entered, the drivers probably won't be loaded, and it won't work.

It also seems that for Linux, simply not having any X displays up at the time will work, so X does not need to be exited as long as you switch to a non-X full-screen virtual terminal.
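If you boot straight to a console without X, the comments below point to a driver-load script in NVIDIA's Getting Started guide; it amounts to something like the following sketch (run as root; 195 is NVIDIA's character-device major number), though the exact script in the guide should be preferred:

    #!/bin/bash
    # Load the NVIDIA kernel module and create the /dev/nvidia* nodes that
    # the X server would otherwise create.
    /sbin/modprobe nvidia || exit 1

    # One device node per GPU, plus the control node.
    NGPU=$(lspci | grep -i nvidia | grep -ci controller)
    for i in $(seq 0 $((NGPU - 1))); do
        [ -e /dev/nvidia$i ] || mknod -m 666 /dev/nvidia$i c 195 $i
    done
    [ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c 195 255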

Solution 5

This isn't possible. The time-out is there to prevent bugs in calculations from taking up the GPU for long periods of time.

If you use a dedicated card for CUDA work, the time limit is lifted. I'm not sure if this requires a Tesla card, or if a GeForce with no monitor connected can be used.


Comments


  • rck
    rck over 15 years
    It would be useful to determine which of these cases it is. I'll have to try a non-tesla card with no monitor attached and find out.
  • rck
    rck over 15 years
I just tried this out. No Tesla card needed. Using Linux, I actually just didn't bother going into X, and the limit was lifted.
  • San Jacinto
    San Jacinto over 13 years
    I'm not a SIMD programmer, nor do I play one on TV, but IMHO it's a bit too general to say that "Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong." In scientific applications (like ones CUDA is often used for), sometimes you just have a lot to compute.
  • Tom
    Tom over 13 years
    If you're not loading X then you can use a script to load the CUDA driver. Check out the Getting Started guide (developer.download.nvidia.com/compute/cuda/3_2_prod/docs/…) for more information.
  • Ade Miller
    Ade Miller about 13 years
    San Jacinto: See Tom's answer below. The timeout is reasonable in the case where the GPU you are computing on is also your display GPU. In the case where it is not used for display then you have more options.
  • Hans Rudel
    Hans Rudel almost 11 years
Hi Tom, I have modified the watchdog timer already (to ~6 days) and have managed to get a single kernel to run for 40 seconds. I've just tried running a significantly larger one, but I keep getting an "ErrorLaunch TimeOut" error. I only have a single GPU, so I was wondering if there is something else which might be forcing the GPU to respond before it has finished the kernel, especially since it should only take about 4-5 minutes to run and the timeout is set to such a large number? Thanks for your time, I really appreciate it.
  • einpoklum
    einpoklum over 10 years
    So, as other answers suggest, it is actually possible... can you rephrase your answer?
  • Die in Sente
    Die in Sente about 10 years
Actually, on Windows any device with a WDDM driver will have the watchdog timer problem, whether it has a display attached or not. The NVIDIA Tesla cards work around this by having a completely different type of driver (the TCC, or Tesla Compute Cluster, driver), which doesn't identify the GPU to the OS as a display adapter. If you just plug in a second video card (Radeon or GeForce) with no displays attached, it will still be recognized by the OS as a WDDM display adapter device, and the watchdog timer will still apply.
  • Glenn Maynard
    Glenn Maynard about 9 years
    It's definitely wrong to say that the watchdog shouldn't be disabled. The watchdog is completely broken: it triggers when single-stepping in the debugger, and it tends to completely freeze the system in multi-monitor/displayport configurations, which isn't any help to anyone.
  • Die in Sente
    Die in Sente about 9 years
    @Glenn. The NSight Cuda debugger has a software preemption mode so that it will not trigger the TDR while you're single-stepping with the debugger. Look for it under the NSight options menu. If you're using a GPU that has a display attached, the debugger will use that mode automatically. If you're using a GPU that doesn't have a display attached, then turning off the TDR or setting it to a really long value is reasonable.
  • Glenn Maynard
    Glenn Maynard about 9 years
    Given that the watchdog hard-crashes my whole system (with the lovely side-effect of making two of my monitors flash spastically, and making my speakers blast DMA loop noise), I think I'll stick with turning it off.
  • Die in Sente
    Die in Sente about 9 years
@Glenn Unless you're still running Windows XP, a TDR should NOT hard-crash your whole system. It should just reset/restart the WDDM driver. The displays should blank out for a second or two and come back. Of course, any apps (CUDA or graphics) that were using the GPU will lose context and probably crash, but the symptoms you're describing should NOT happen with a TDR.
  • Glenn Maynard
    Glenn Maynard about 9 years
    @DieinSente Idealism is nice, but in the real world, it sure does crash Windows 7 for me.
  • Die in Sente
    Die in Sente about 9 years
    @Glenn Whatever works for you. But the WHOLE IDEA of TDR is for the OS to recover from a hung GPU and avoid exactly what you're experiencing. There must be something unusual wrong with your particular system that is causing a kernel-mode driver to crash. Round up the usual suspects.
  • Robadob
    Robadob about 8 years
I can certainly agree with @GlennMaynard that sometimes a TDR timeout will lock up my machine and require a reboot (either it resets, or it hasn't recovered after 2 minutes, as in this case). However, sometimes it also manages to recover. My personal speculation is that increasing TdrDdiDelay may fix this, as it appears to be the time limit for the WDDM driver to reset (particularly demanding work may cause the reset to take longer than the default of 5 seconds?). Details of TdrDdiDelay here: msdn.microsoft.com/en-us/library/windows/hardware/…
  • user3667089
    user3667089 over 7 years
This answer saved my life. I wasn't able to figure out why the kernel was failing randomly at different places.
  • DDRRSS
    DDRRSS over 3 years
Thanks, but how do I turn off the DPC watchdog for all drivers, not just for "Display" (via the registry editor)?