Why cgo's performance is so slow? is there something wrong with my testing code?

c performance go cgo

11,119

Solution 1

As you've discovered, there is fairly high overhead in calling C/C++ code via CGo. So in general, you are best off trying to minimise the number of CGo calls you make. For the above example, rather than calling a CGo function repeatedly in a loop it might make sense to move the loop down to C.

There are a number of aspects of how the Go runtime sets up its threads that can break the expectations of many pieces of C code:

Goroutines run on a relatively small stack, handling stack growth through segmented stacks (old versions) or by copying (new versions).
Threads created by the Go runtime may not interact properly with libpthread's thread local storage implementation.
The Go runtime's UNIX signal handler may interfere with traditional C or C++ code.
Go reuses OS threads to run multiple Goroutines. If the C code called a blocking system call or otherwise monopolised the thread, it could be detrimental to other goroutines.

For these reasons, CGo picks the safe approach of running the C code in a separate thread set up with a traditional stack.

If you are coming from languages like Python where it isn't uncommon to rewrite code hotspots in C as a way to speed up a program you will be disappointed. But at the same time, there is a much smaller gap in performance between equivalent C and Go code.

In general I reserve CGo for interfacing with existing libraries, possibly with small C wrapper functions that can reduce the number of calls I need to make from Go.

Solution 2

Update for James's answer: it seems that there's no thread switch in current implementation.

See this thread on golang-nuts:

There's always going to be some overhead. It's more expensive than a simple function call but significantly less expensive than a context switch (agl is remembering an earlier implementation; we cut out the thread switch before the public release). Right now the expense is basically just having to do a full register set switch (no kernel involvement). I'd guess it's comparable to ten function calls.

See also this answer which links "cgo is not Go" blog post.

C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.

Thus, cgo has an overhead because it performs a stack switch, not thread switch.

It saves and restores all registers when C function is called, while it's not required when Go function or assembly function is called.

Besides that, cgo's calling conventions forbid passing Go pointers directly to C code, and common workaround is to use C.malloc, and so introduce additional allocations. See this question for details.

11,119

Author by

习明昊

Updated on June 14, 2022

Comments

习明昊 about 2 years

I'm doing a test: compare excecution times of cgo and pure Go functions run 100 million times each. The cgo function takes longer time compared to the Golang function, and I am confused with this result. My testing code is:

package main

import (
    "fmt"
    "time"
)

/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void show() {

}

*/
// #cgo LDFLAGS: -lstdc++
import "C"

//import "fmt"

func show() {

}

func main() {
    now := time.Now()
    for i := 0; i < 100000000; i = i + 1 {
        C.show()
    }
    end_time := time.Now()

    var dur_time time.Duration = end_time.Sub(now)
    var elapsed_min float64 = dur_time.Minutes()
    var elapsed_sec float64 = dur_time.Seconds()
    var elapsed_nano int64 = dur_time.Nanoseconds()
    fmt.Printf("cgo show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
        elapsed_min, elapsed_sec, elapsed_nano)

    now = time.Now()
    for i := 0; i < 100000000; i = i + 1 {
        show()
    }
    end_time = time.Now()

    dur_time = end_time.Sub(now)
    elapsed_min = dur_time.Minutes()
    elapsed_sec = dur_time.Seconds()
    elapsed_nano = dur_time.Nanoseconds()
    fmt.Printf("go show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
        elapsed_min, elapsed_sec, elapsed_nano)

    var input string
    fmt.Scanln(&input)
}

and result is:

cgo show function elasped 0.368096 minutes or 
elapsed 22.085756 seconds or 
elapsed 22085755775 nanoseconds

go show function elasped 0.000654 minutes or 
elapsed 0.039257 seconds or 
elapsed 39257120 nanoseconds

The results show that invoking the C function is slower than the Go function. Is there something wrong with my testing code?

My system is : mac OS X 10.9.4 (13E28)

习明昊 over 9 years

Thanks ,it does help me a lot!
gavv about 8 years

This is probably outdated: groups.google.com/forum/#!topic/golang-nuts/RTtMsgZi88Q