Why is cgo's performance so slow? Is there something wrong with my testing code?


Solution 1

As you've discovered, calling C/C++ code via CGo carries fairly high overhead, so in general you are best off minimising the number of CGo calls you make. In the question's example, rather than calling a CGo function repeatedly in a loop, it might make sense to move the loop down into C.

There are a number of aspects of how the Go runtime sets up its threads that can break the expectations of many pieces of C code:

  1. Goroutines run on a relatively small stack, handling stack growth through segmented stacks (old versions) or by copying (new versions).
  2. Threads created by the Go runtime may not interact properly with libpthread's thread local storage implementation.
  3. The Go runtime's UNIX signal handler may interfere with traditional C or C++ code.
  4. Go reuses OS threads to run multiple Goroutines. If the C code called a blocking system call or otherwise monopolised the thread, it could be detrimental to other goroutines.

For these reasons, CGo picks the safe approach of running the C code in a separate thread set up with a traditional stack.

If you are coming from languages like Python, where it isn't uncommon to rewrite hotspots in C to speed up a program, you will be disappointed. At the same time, there is a much smaller performance gap between equivalent C and Go code.

In general I reserve CGo for interfacing with existing libraries, possibly with small C wrapper functions that can reduce the number of calls I need to make from Go.

Solution 2

Update to James's answer: it seems that there is no thread switch in the current implementation.

See this thread on golang-nuts:

There's always going to be some overhead. It's more expensive than a simple function call but significantly less expensive than a context switch (agl is remembering an earlier implementation; we cut out the thread switch before the public release). Right now the expense is basically just having to do a full register set switch (no kernel involvement). I'd guess it's comparable to ten function calls.

See also this answer, which links to the "cgo is not Go" blog post.

C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.

Thus, cgo has overhead because it performs a stack switch, not a thread switch.

It saves and restores all registers when a C function is called, which is not required when a Go function or an assembly function is called.


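Besides that, cgo's pointer-passing rules restrict which Go pointers may be handed to C code: the memory a Go pointer refers to must not itself contain Go pointers, and C must not keep the pointer after the call returns. A common workaround is to allocate with C.malloc, which introduces additional allocations. See this question for details.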

Author: 习明昊

Updated on June 14, 2022

Comments

  • 习明昊
    习明昊 about 2 years

    I'm doing a test: comparing the execution times of a cgo function and a pure Go function, each run 100 million times. The cgo function takes much longer than the Go function, and I am confused by this result. My testing code is:

    package main
    
    import (
        "fmt"
        "time"
    )
    
    /*
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    void show() {
    
    }
    
    */
    // #cgo LDFLAGS: -lstdc++
    import "C"
    
    //import "fmt"
    
    func show() {
    
    }
    
    func main() {
        now := time.Now()
        for i := 0; i < 100000000; i = i + 1 {
            C.show()
        }
        end_time := time.Now()
    
        var dur_time time.Duration = end_time.Sub(now)
        var elapsed_min float64 = dur_time.Minutes()
        var elapsed_sec float64 = dur_time.Seconds()
        var elapsed_nano int64 = dur_time.Nanoseconds()
        fmt.Printf("cgo show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
            elapsed_min, elapsed_sec, elapsed_nano)
    
        now = time.Now()
        for i := 0; i < 100000000; i = i + 1 {
            show()
        }
        end_time = time.Now()
    
        dur_time = end_time.Sub(now)
        elapsed_min = dur_time.Minutes()
        elapsed_sec = dur_time.Seconds()
        elapsed_nano = dur_time.Nanoseconds()
        fmt.Printf("go show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
            elapsed_min, elapsed_sec, elapsed_nano)
    
        var input string
        fmt.Scanln(&input)
    }
    

    and the result is:

    cgo show function elasped 0.368096 minutes or 
    elapsed 22.085756 seconds or 
    elapsed 22085755775 nanoseconds
    
    go show function elasped 0.000654 minutes or 
    elapsed 0.039257 seconds or 
    elapsed 39257120 nanoseconds
    

    The results show that invoking the C function is slower than invoking the Go function. Is there something wrong with my testing code?

    My system is: Mac OS X 10.9.4 (13E28)

  • 习明昊
    习明昊 over 9 years
    Thanks, it does help me a lot!