Cancelling boost asio deadline timer safely

The cancellation is safe.

It's just not robust: you didn't account for the case where the timer isn't pending at the moment you cancel. In that case the cancel has nothing to abort, and the completion handler, once invoked, simply starts a new async wait.

What follows are the detailed steps I took to trace the issue.

SUMMARY TL;DR

Cancelling a timer only cancels asynchronous operations that are in flight.

If you want to shut down an asynchronous call chain, you'll have to use additional logic for that. An example is given below.
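As a library-free illustration of what "additional logic" can mean, here is a minimal sketch of a self-rearming call chain guarded by an explicit shutdown flag. It does not use Asio: ToyLoop, Ticker and ticks_until_drained are hypothetical names, and the loop is only a single-threaded stand-in for an io_service.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a single-threaded io_service: a FIFO of handlers (NOT Asio).
struct ToyLoop {
    std::deque<std::function<void()>> handlers;

    void post(std::function<void()> h) { handlers.push_back(std::move(h)); }

    // Runs handlers until the queue is empty or the budget is used up.
    // Returns the number of handlers that ran.
    std::size_t run(std::size_t budget) {
        std::size_t n = 0;
        while (!handlers.empty() && n < budget) {
            auto h = std::move(handlers.front());
            handlers.pop_front();
            h();
            ++n;
        }
        return n;
    }
};

// A self-rearming call chain with an explicit shutdown flag.
struct Ticker {
    ToyLoop& loop;
    bool stopping = false;
    std::size_t ticks = 0;

    explicit Ticker(ToyLoop& l) : loop(l) {}

    void tick() {
        if (stopping) return;           // do not re-arm: the chain ends here
        ++ticks;
        loop.post([this] { tick(); });  // re-arm, like async_wait in the handler
    }

    void stop() { stopping = true; }    // always run on the loop, via post()
};

// Lets the chain run for some steps, posts a stop request, then drains the loop.
std::size_t ticks_until_drained(std::size_t steps_before_stop) {
    ToyLoop loop;
    Ticker t(loop);
    loop.post([&t] { t.tick(); });
    loop.run(steps_before_stop);
    loop.post([&t] { t.stop(); });      // shutdown request queued behind pending work
    loop.run(1000);                     // terminates because the chain stops re-arming
    assert(loop.handlers.empty());
    return t.ticks;
}
```

The point of the flag is that it does not matter whether a handler is already queued when the stop request arrives: the next handler to run sees the flag and declines to re-arm.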

Handler Tracking

Enable handler tracking with:

#define BOOST_ASIO_ENABLE_HANDLER_TRACKING 1

This produces output that can be visualized with boost/libs/asio/tools/handlerviz.pl:

A successful trace

[Image: handler-tracking graph of a successful run]

As you can see, the async_wait is in-flight when the cancellation happens.

A "bad" trace

(truncated because it would run infinitely)

[Image: handler-tracking graph of a "bad" run]

Note how the completion handler sees ec=system:0, not ec=system:125 (operation_aborted). This is a symptom of the fact that the posted cancel did not actually "take". The only logical explanation (not visible in the diagram) is that the timer had already expired before the cancel was invoked, so there was no pending wait left to abort.
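The race can be reproduced in a toy model that does not use Asio at all. The sketch below is an assumption-laden simplification: ToyTimer, Result, Outcome and simulate are hypothetical names, and the "ready" queue stands in for handlers the io_service has yet to run. It mirrors one real property of Asio's timers, namely that cancel() returns the number of asynchronous operations it cancelled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

enum class Result { ok, aborted };

// Toy timer (NOT Asio): at most one pending wait, plus a shared queue of
// completion handlers that are ready to run.
struct ToyTimer {
    std::deque<std::function<void()>>& ready;
    std::function<void(Result)> waiter;  // the pending wait, if any

    explicit ToyTimer(std::deque<std::function<void()>>& r) : ready(r) {}

    void async_wait(std::function<void(Result)> h) { waiter = std::move(h); }

    // Expiry: the pending wait completes successfully and its handler is queued.
    void expire() {
        if (!waiter) return;
        auto h = std::move(waiter);
        waiter = nullptr;
        ready.push_back([h] { h(Result::ok); });
    }

    // Cancel only affects a wait that is still pending.
    // Returns the number of waits that were cancelled.
    std::size_t cancel() {
        if (!waiter) return 0;
        auto h = std::move(waiter);
        waiter = nullptr;
        ready.push_back([h] { h(Result::aborted); });
        return 1;
    }
};

struct Outcome {
    std::size_t cancelled;  // what cancel() reported
    bool rearmed;           // did the handler start a new wait afterwards?
};

Outcome simulate(bool expire_before_cancel) {
    std::deque<std::function<void()>> ready;
    ToyTimer timer(ready);
    bool rearmed = false;

    // Same shape as handle_timeout: re-arm on success, stop on abort.
    std::function<void(Result)> handler = [&](Result r) {
        if (r == Result::ok) {
            rearmed = true;
            timer.async_wait(handler);
        }
    };

    timer.async_wait(handler);
    if (expire_before_cancel) timer.expire();  // handler already queued with ok
    std::size_t n = timer.cancel();

    while (!ready.empty()) {                   // run whatever is queued
        auto f = std::move(ready.front());
        ready.pop_front();
        f();
    }
    return {n, rearmed};
}
```

In this model, simulate(false) cancels one wait and the chain stops, while simulate(true) cancels nothing and the handler re-arms, which is the behaviour seen in the "bad" trace.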

Let's compare the raw traces¹

[Image: side-by-side comparison of the raw handler-tracking traces]

¹ after removing the noisy differences

Detecting It

So, we have a lead. Can we detect it? One way is to check, right before cancelling, whether the expiry time has already passed:

    timer.get_io_service().post([](){
        std::cerr << "tid: " << std::this_thread::get_id() << ", cancelling in post\n";
        if (timer.expires_from_now() >= std::chrono::steady_clock::duration(0)) {
            timer.cancel();
        } else {
            std::cout << "PANIC\n";
            timer.cancel();
        }
    });

Prints:

tid: 140113177143232, i: 0, waiting for thread to join()
tid: 140113177143232, i: 1, waiting for thread to join()
tid: 140113177143232, i: 2, waiting for thread to join()
tid: 140113177143232, i: 3, waiting for thread to join()
tid: 140113177143232, i: 4, waiting for thread to join()
tid: 140113177143232, i: 5, waiting for thread to join()
tid: 140113177143232, i: 6, waiting for thread to join()
tid: 140113177143232, i: 7, waiting for thread to join()
tid: 140113177143232, i: 8, waiting for thread to join()
tid: 140113177143232, i: 9, waiting for thread to join()
tid: 140113177143232, i: 10, waiting for thread to join()
tid: 140113177143232, i: 11, waiting for thread to join()
tid: 140113177143232, i: 12, waiting for thread to join()
tid: 140113177143232, i: 13, waiting for thread to join()
tid: 140113177143232, i: 14, waiting for thread to join()
tid: 140113177143232, i: 15, waiting for thread to join()
tid: 140113177143232, i: 16, waiting for thread to join()
tid: 140113177143232, i: 17, waiting for thread to join()
tid: 140113177143232, i: 18, waiting for thread to join()
tid: 140113177143232, i: 19, waiting for thread to join()
tid: 140113177143232, i: 20, waiting for thread to join()
tid: 140113177143232, i: 21, waiting for thread to join()
tid: 140113177143232, i: 22, waiting for thread to join()
tid: 140113177143232, i: 23, waiting for thread to join()
tid: 140113177143232, i: 24, waiting for thread to join()
tid: 140113177143232, i: 25, waiting for thread to join()
tid: 140113177143232, i: 26, waiting for thread to join()
PANIC

Could we communicate the "super-cancellation" in another, clearer way? We have ... just the timer object to work with, of course:

Signaling Shutdown

The timer object doesn't have a lot of properties to work with. There's no close() or similar, like on a socket, that can be used to put the timer in some kind of invalid state.

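However, there's the expiry timepoint, and we can use a special domain value to signal "invalid" for our application (in the snippets below, Timer is presumably an alias for the question's boost::asio::basic_waitable_timer<std::chrono::steady_clock>):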

timer.get_io_service().post([](){
    std::cerr << "tid: " << std::this_thread::get_id() << ", cancelling in post\n";
    // also cancels:
    timer.expires_at(Timer::clock_type::time_point::min());
});

This "special value" is easy to handle in the completion handler:

void handle_timeout(const boost::system::error_code& ec)
{
    if (!ec) {
        started = true;
        if (timer.expires_at() != Timer::time_point::min()) {
            timer.expires_from_now(std::chrono::milliseconds(10));
            timer.async_wait(&handle_timeout);
        } else {
            std::cerr << "handle_timeout: detected shutdown\n";
        }
    } 
    else if (ec != boost::asio::error::operation_aborted) {
        std::cerr << "tid: " << std::this_thread::get_id() << ", handle_timeout error " << ec.message() << "\n";
    }
}
Author by

hudac

Updated on June 05, 2022

Comments

  • hudac
    hudac almost 2 years

    I'm trying to cancel a boost::asio::basic_waitable_timer<std::chrono::steady_clock> safely.

    According to this answer, this code should do that work:

    timer.get_io_service().post([&]{timer.cancel();});
    

    I'm afraid it doesn't work for me.
    Am I doing something wrong?
    This is my code:

    #include <iostream>
    #include "boost/asio.hpp"
    #include <atomic>
    #include <chrono>
    #include <thread>
    #include <random>
    
    boost::asio::io_service io_service;
    boost::asio::basic_waitable_timer<std::chrono::steady_clock> timer(io_service);
    std::atomic<bool> started;
    
    void handle_timeout(const boost::system::error_code& ec)
    {
        if (!ec) {
            started = true;
            std::cerr << "tid: " << std::this_thread::get_id() << ", handle_timeout\n";
            timer.expires_from_now(std::chrono::milliseconds(10));
            timer.async_wait(&handle_timeout);
        } else if (ec == boost::asio::error::operation_aborted) {
            std::cerr << "tid: " << std::this_thread::get_id() << ", handle_timeout aborted\n";
        } else {
            std::cerr << "tid: " << std::this_thread::get_id() << ", handle_timeout another error\n";
        }
    }
    
    int main() {
    
        std::cout << "tid: " << std::this_thread::get_id() << ", Hello, World!" << std::endl;
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(1, 100);
    
        for (auto i = 0; i < 1000; i++) {
    
            started = false;
            std::thread t([&](){
    
                timer.expires_from_now(std::chrono::milliseconds(0));
                timer.async_wait(&handle_timeout);
    
                io_service.run();
            });
    
            while (!started) {};
            auto sleep = dis(gen);
            std::cout << "tid: " << std::this_thread::get_id() << ", i: " << i << ", sleeps for " << sleep << " [ms]" << std::endl;
            std::this_thread::sleep_for(std::chrono::milliseconds(sleep));
            timer.get_io_service().post([](){
                std::cerr << "tid: " << std::this_thread::get_id() << ", cancelling in post\n";
                timer.cancel();
            });
    //      timer.cancel();
            std::cout << "tid: " << std::this_thread::get_id() << ", i: " << i << ", waiting for thread to join()" << std::endl;
            t.join();
            io_service.reset();
        }
    
        return 0;
    }
    

    This is the output:

    ...
    tid: 140737335076608, handle_timeout
    tid: 140737335076608, handle_timeout
    tid: 140737353967488, i: 2, waiting for thread to join()
    tid: 140737335076608, cancelling in post
    tid: 140737335076608, handle_timeout aborted
    tid: 140737353967488, i: 3, sleeps for 21 [ms]
    tid: 140737335076608, handle_timeout
    tid: 140737353967488, i: 3, waiting for thread to join()
    tid: 140737335076608, handle_timeout
    tid: 140737335076608, cancelling in post
    tid: 140737335076608, handle_timeout
    tid: 140737335076608, handle_timeout
    tid: 140737335076608, handle_timeout
    tid: 140737335076608, handle_timeout
    tid: 140737335076608, handle_timeout
    ...
    continue forever...

    As you can see, the timer.cancel() is being called from the appropriate thread:

    tid: 140737335076608, cancelling in post

    BUT there's no

    tid: 140737335076608, handle_timeout aborted

    Afterwards.

    Main waits forever.

  • hudac
    hudac about 7 years
    Wow, thanks! You wrote The cancellation is safe. - You meant using post(), right? Not ordinary timer.cancel() ?
  • sehe
    sehe about 7 years
    Indeed. It is thread safe, in that it doesn't cause a data race, so the behaviour is defined.
  • Igor R.
    Igor R. about 7 years
    Nice workaround, but... don't you think there should be a better cancellation function that hides all this mess as an implementation detail? Cancellation-related questions arise again and again...
  • sehe
    sehe about 7 years
    @IgorR. I guess I'd love a timer with a close() function like I describe. It shouldn't be hard to write one aggregating Asio's timers. In practice I usually have a separate "shutdown" flag/refcount so that I don't have this issue.
  • hudac
    hudac about 7 years
    @sehe, I want to have a rule of thumb. According to the deadline_timer doc, sharing an object between threads isn't safe. Are you saying cancel() is thread safe specifically because of its implementation? When I'm writing a stop() function which will stop things like deadline_timer/socket/stream descriptor/signal, should I use post() as a rule of thumb, in order to prevent undefined behaviour when different threads are calling this stop(), or should all of these calls be thread safe, as cancel() is? Thanks
  • sehe
    sehe about 7 years
    @hudac I'm just confirming that your use of it is threadsafe, I didn't actually say anything else. Your use of it is safe because you post it to the service and the service runs on a single thread, meaning you get "implicit strand" behaviour (no two handlers ever run at the same time).
  • sehe
    sehe about 7 years
    @hudac More specifically, that's not a rule of thumb once you run the service on more threads! In that case you need a strand to synchronize access to the service objects (like deadline_timer). See stackoverflow.com/questions/12794107/…. I hope this drives home the point that cancel() is not thread safe, as per the documentation (nobody said this).
  • hudac
    hudac about 7 years
    @sehe, do you have a workaround such as this for cancelling boost::asio::signal_set safely? Or should I use some shutdown flag?
  • sehe
    sehe about 7 years
    @hudac I don't think I do (I usually just listen for INT/TERM once). You could of course simply signal_set.clear(...); (now when you get the signal 0, this means you should probably shut down)
  • jean
    jean over 6 years
    How about wrapping handle_timeout in a strand and posting timer.cancel() to the same strand?