When to use []byte or string in Go?

Solution 1

My advice would be to use string by default when you're working with text. But use []byte instead if one of the following conditions applies:

  • The mutability of a []byte will significantly reduce the number of allocations needed.

  • You are dealing with an API that uses []byte, and avoiding a conversion to string will simplify your code.
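As a minimal sketch of the first condition (the function names here are made up for illustration), compare upper-casing ASCII text in place in a []byte with doing the same to a string, which has to produce a new value:

```go
package main

import "fmt"

// upperASCIIInPlace upper-cases ASCII letters directly in the caller's
// buffer, so it does not allocate.
func upperASCIIInPlace(b []byte) {
	for i, c := range b {
		if c >= 'a' && c <= 'z' {
			b[i] = c - ('a' - 'A')
		}
	}
}

// upperASCIIString does the same for a string. Strings are immutable,
// so the data is copied into a new buffer and then into a new string.
func upperASCIIString(s string) string {
	b := []byte(s) // typically allocates and copies
	upperASCIIInPlace(b)
	return string(b) // typically allocates and copies again
}

func main() {
	buf := []byte("hello, gopher")
	upperASCIIInPlace(buf)
	fmt.Printf("%s\n", buf)
	fmt.Println(upperASCIIString("hello, gopher"))
}
```

Whether the difference matters depends on how hot the code path is; a benchmark is the only reliable way to tell.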

Solution 2

I've gotten the sense that in Go, more than in any other non-ML style language, the type is used to convey meaning and intended use. So, the best way to figure out which type to use is to ask yourself what the data is.

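A string represents text. Just text. For most purposes the encoding is not something you have to worry about, and you can work with it character by character (ranging over a string yields runes), regardless of what a 'character' actually is. Note, though, that len and indexing on a string operate on bytes.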

A slice represents either binary data or a specific encoding of that data. []byte means that the data is either just a byte stream or a stream of single-byte characters. []int16 represents an integer stream or a stream of two-byte characters.

Given the fact that pretty much everything that deals with bytes also has functions to deal with strings and vice versa, I would suggest that instead of asking what you need to do with the data, you ask what that data represents. And then optimize things once you figure out the bottlenecks.
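To make the text-versus-bytes distinction concrete, here is a small sketch (describe is a hypothetical helper, not a standard function) showing the same value viewed as bytes, as runes via range, and as a mutable []rune:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports how many bytes and how many runes (code points) s holds.
func describe(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo"; the é takes two bytes in UTF-8
	b, r := describe(s)
	fmt.Println(b, r) // 6 5

	// Ranging over a string decodes UTF-8 and yields runes together with
	// the byte offset at which each rune starts.
	for i, c := range s {
		fmt.Printf("%d:%c ", i, c)
	}
	fmt.Println()

	// Converting to []rune gives one element per code point, and the
	// result can be modified.
	rs := []rune(s)
	rs[1] = 'e'
	fmt.Println(string(rs))
}
```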

EDIT: This post is where I got the rationale for using type conversion to break up the string.

Solution 3

  1. One difference is that the returned []byte can potentially be reused to hold other, new data (without a new memory allocation), while a string cannot. Another is that, in the gc implementation at least, a string is one word smaller than a []byte. This can save some memory when many such values are live at once.

  2. Converting a []byte to string for logging is not necessary. Typical 'text' verbs such as %s and %q work equally for string and []byte expressions. The same holds in the other direction for, e.g., %x or % 02x.

  3. It depends on why the concatenation is performed and on whether the result will later be combined with something else. If it will, []byte may perform better.
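The three points above can be sketched roughly as follows. The helper names (headerSizes, sameOutput, appendAll) are invented for this illustration, and the header sizes depend on the toolchain and platform:

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes returns the size in bytes of a string value and of a []byte
// value (the headers, not the data they point to). A slice carries an
// extra capacity word.
func headerSizes() (str, slice uintptr) {
	var s string
	var b []byte
	return unsafe.Sizeof(s), unsafe.Sizeof(b)
}

// sameOutput reports whether the text and hex verbs format a string and a
// []byte with the same contents identically.
func sameOutput(s string) bool {
	b := []byte(s)
	return fmt.Sprintf("%s|%q|%x|% x", s, s, s, s) ==
		fmt.Sprintf("%s|%q|%x|% x", b, b, b, b)
}

// appendAll concatenates parts onto dst, reusing dst's backing array
// whenever its capacity is large enough.
func appendAll(dst []byte, parts ...string) []byte {
	for _, p := range parts {
		dst = append(dst, p...)
	}
	return dst
}

func main() {
	str, slice := headerSizes()
	fmt.Println(str, slice) // on 64-bit gc builds: 16 24

	fmt.Println(sameOutput("hi"))

	buf := make([]byte, 0, 64)
	buf = appendAll(buf, "GET ", "/index.html")
	fmt.Printf("%s\n", buf)

	// Truncating to length zero keeps the capacity, so the next
	// concatenation writes into the same memory.
	buf = appendAll(buf[:0], "POST ", "/form")
	fmt.Printf("%s\n", buf)
}
```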

Matt Joiner
Author by

Updated on June 24, 2022

Comments

  • Matt Joiner
    Matt Joiner about 2 years

    Frequently in writing Go applications, I find myself with the choice to use []byte or string. Apart from the obvious mutability of []byte, how do I decide which one to use?

    I have several use cases for examples:

    1. A function returns a new []byte. Since the slice capacity is fixed, what reason is there to not return a string?
    2. []byte are not printed as nicely as string by default, so I often find myself casting to string for logging purposes. Should it always have been a string?
    3. When prepending []byte, a new underlying array is always created. If the data to prepend is constant, why should this not be a string?
    • Asherah
      Asherah about 12 years
      This sounds like it depends on your use. If you plan to be doing string-y ops with them, then call it a string. If it's just opaque data being shuffled around, why not []byte? It comes down to the use cases.
    • Esko Luontola
      Esko Luontola about 12 years
      And if you need to process individual characters, instead of a stream of UTF-8 encoded bytes, then convert it to runes first (32 bit ints IIRC).
    • newacct
      newacct about 12 years
      then there is also []rune, which is the best to represent a mutable string of characters
    • swdunlop
      swdunlop about 12 years
      The mutability really is the key difference between a string and a slice of bytes or runes. There are many subtle nuances when dealing with slices if the original array is modified -- such as cases where a slice of that array was used as a key in a map, or stored elsewhere. Try to avoid falling into the habit of thinking of slices as a fixed tuple -- they are really more like C pointers with length.
    • JAB
      JAB about 12 years
      Go has a type named rune? My opinion of Google has gone up a bit.
  • Matt Joiner
    Matt Joiner about 12 years
    FWIW, the %v verb treats []byte as a slice of integers; this is also the default for the non-f functions (such as Println). Also, what about functions that return slices, rather than taking them as arguments?
  • animaacija
    animaacija over 6 years
    what do you mean by "string is a one word smaller entity than []byte" ?
  • Adirio
    Adirio almost 6 years
    It has been nearly a year since @animaacija's comment, but I figure I'll answer it: as strings are immutable, they are represented internally as a pointer and a length, while a []byte also requires the capacity.
  • pkaramol
    pkaramol over 4 years
    Shouldn't the case of whether we are dealing or not exclusively with ASCII characters be factored in? Cause if not, a byte cannot hold such a character and we need the rune to be used?
  • andybalholm
    andybalholm over 4 years
    A []byte can hold non-ASCII characters if they are encoded as bytes (for example, in UTF-8).
  • pkaramol
    pkaramol over 4 years
    I assume in that case we may not have a 1-1 char-byte mapping (a UTF-8 code point may be represented by 1 to 4 bytes).
  • andybalholm
    andybalholm over 4 years
    That's right. But that's the way it is with a string too.
  • hookenz
    hookenz about 4 years
    I've often found converting to string convenient because some scanning functions are missing for byte slices. But converting to string is unnecessary copying. When working with large files it's better not to convert to a string if possible; avoiding the conversion gives a massive performance boost.
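
As a rough sketch of the last comment's point, the following scans line by line and searches with the bytes package, so no line is converted to a string. countMatches and countMatchesIn are hypothetical names chosen for this example; bufio.Scanner.Bytes and bytes.Contains are standard library calls.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countMatches counts the lines of r that contain needle, working on the
// scanner's byte slices instead of converting each line to a string.
func countMatches(r io.Reader, needle []byte) (int, error) {
	n := 0
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		// Bytes returns a slice into the scanner's internal buffer. It is
		// only valid until the next call to Scan, so copy it if you need
		// to keep it.
		if bytes.Contains(sc.Bytes(), needle) {
			n++
		}
	}
	return n, sc.Err()
}

// countMatchesIn is a convenience wrapper for data already in memory.
func countMatchesIn(data, needle []byte) (int, error) {
	return countMatches(bytes.NewReader(data), needle)
}

func main() {
	data := []byte("alpha\nbeta\nalphabet\n")
	n, err := countMatchesIn(data, []byte("alpha"))
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Println(n) // 2
}
```

Note that the scanner's default buffer limits the length of a single line, so very long lines need a larger buffer set with Scanner.Buffer.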