Fastest way to convert a possibly-null-terminated ASCII byte[] to a string?

Solution 1

Any reason not to use the String(sbyte*, int, int) constructor? If you've worked out which portion of the buffer you need, the rest should be simple:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset, int length)
{
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, length);
       }
    }
}

If you need to look first:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset)
{
    int end = offset;
    while (end < buffer.Length && buffer[end] != 0)
    {
        end++;
    }
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, end - offset);
       }
    }
}

If this truly is an ASCII string (i.e. all bytes are less than 128) then the codepage problem shouldn't be an issue unless you've got a particularly strange default codepage which isn't based on ASCII.
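
To make the point about bytes of 128 and above concrete, here is a small sketch. The class and method names are made up for illustration, and ISO-8859-1 is used only as a stand-in for an ASCII-based ANSI code page; the actual default code page on a given machine may differ.

```csharp
using System;
using System.Text;

static class CodePageSketch
{
    // Hypothetical helper: strict ASCII decoding. Bytes >= 128 come out as '?'.
    public static string DecodeAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    // Hypothetical helper: ISO-8859-1 maps every byte to the code point of the same value.
    public static string DecodeLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

For the bytes 0x63, 0x61, 0x66, 0xE9, both methods agree on the first three characters ("caf"), and only the last byte, which is outside the ASCII range, decodes differently.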

Out of interest, have you actually profiled your application to make sure that this is really the bottleneck? Do you definitely need the absolute fastest conversion, instead of one which is more readable (e.g. using Encoding.GetString for the appropriate encoding)?
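
For comparison, a more readable version along those lines might look like the sketch below. It is not part of the original answer: the class name is made up, and it assumes the bytes really are ASCII and that offset and maxLength describe a valid region of the buffer.

```csharp
using System;
using System.Text;

static class AsciiBufferSketch
{
    // Hypothetical helper: decodes up to maxLength bytes starting at offset,
    // stopping at the first 0 byte if there is one.
    public static string AsciiBytesToString(byte[] buffer, int offset, int maxLength)
    {
        // Array.IndexOf returns -1 when no 0 byte exists in [offset, offset + maxLength).
        int nul = Array.IndexOf(buffer, (byte)0, offset, maxLength);
        int length = nul < 0 ? maxLength : nul - offset;
        return Encoding.ASCII.GetString(buffer, offset, length);
    }
}
```

Because the search is bounded by maxLength, this never reads past the requested region, which matters when the buffer holds several strings back to back.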

Solution 2

One-liner (assuming the buffer actually contains ONE well-formed null-terminated string):

String MyString = Encoding.ASCII.GetString(MyByteBuffer).TrimEnd((Char)0);
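
The "ONE string" assumption matters because TrimEnd only strips trailing nulls. The sketch below (class and method names are hypothetical) puts the one-liner next to a variant that cuts at the first null, so the difference shows up when the buffer holds more data after the terminator.

```csharp
using System;
using System.Text;

static class TrimEndSketch
{
    // The one-liner approach: removes only the '\0' characters at the end.
    public static string TrimTrailingNulls(byte[] buffer)
    {
        return Encoding.ASCII.GetString(buffer).TrimEnd('\0');
    }

    // Hypothetical variant: keeps only what precedes the first '\0'.
    public static string CutAtFirstNull(byte[] buffer)
    {
        string s = Encoding.ASCII.GetString(buffer);
        int nul = s.IndexOf('\0');
        return nul < 0 ? s : s.Substring(0, nul);
    }
}
```

For the bytes 'a', 'b', 0, 'x', 0, the first method returns a four-character string that still contains an embedded null, while the second returns "ab". Both decode the whole buffer first, so neither is the cheapest option.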

Solution 3

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestProject1
{
    class Class1
    {
        public static string cstr_to_string(byte[] data, int code_page)
        {
            Encoding Enc = Encoding.GetEncoding(code_page);
            int inx = Array.FindIndex(data, 0, (x) => x == 0); // search for the first 0 byte
            if (inx >= 0)
                return Enc.GetString(data, 0, inx);
            else
                return Enc.GetString(data);
        }
    }
}

Solution 4

I'm not sure of the speed, but I found it easiest to use LINQ to remove the nulls before encoding:

string s = myEncoding.GetString(bytes.TakeWhile(b => !b.Equals(0)).ToArray());
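
If the intermediate array from ToArray() is a concern, the same idea can be sketched without LINQ. This is only an illustration, not part of the original answer: the names are made up, it assumes a runtime with Span support (.NET Core 2.1 or later), and it assumes an encoding in which a zero byte really marks the end of the string (ASCII or UTF-8, not UTF-16).

```csharp
using System;
using System.Text;

static class SpanSketch
{
    // Hypothetical helper: decodes the bytes before the first 0 byte, or all of them if none.
    public static string CStringToString(ReadOnlySpan<byte> bytes, Encoding encoding)
    {
        int nul = bytes.IndexOf((byte)0);
        if (nul >= 0)
        {
            bytes = bytes.Slice(0, nul); // a view over the same memory, no copy
        }
        return encoding.GetString(bytes);
    }
}
```

A byte[] converts implicitly to ReadOnlySpan<byte>, so a call such as SpanSketch.CStringToString(bytes, Encoding.ASCII) works with an ordinary array.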

Solution 5

int nullIndex = s.IndexOf('\0');
if (nullIndex >= 0) s = s.Substring(0, nullIndex); // IndexOf returns -1 when s contains no null

Author: Wayne Bloss

Updated on July 09, 2022

Comments

  • Wayne Bloss
    Wayne Bloss almost 2 years

    I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#, and the fastest way I've found to do it is by using my UnsafeAsciiBytesToString method shown below. This method uses the String.String(sbyte*) constructor, which contains a warning in its remarks:

    "The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

    Note: * Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. * ...

    * If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation. * "

    Now, I'm positive that the way the string is encoded will never change... but the default codepage on the system that my app is running on might change. So, is there any reason that I shouldn't run screaming from using String.String(sbyte*) for this purpose?

    using System;
    using System.Text;
    
    namespace FastAsciiBytesToString
    {
        static class StringEx
        {
            public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
            {
                int maxIndex = offset + maxLength;
    
                for( int i = offset; i < maxIndex; i++ )
                {
                    // Skip non-nulls.
                    if( buffer[i] != 0 ) continue;
                    // First null we find, return the string.
                    return Encoding.ASCII.GetString(buffer, offset, i - offset);
                }
                // Terminating null not found. Convert the entire section from offset to maxLength.
                return Encoding.ASCII.GetString(buffer, offset, maxLength);
            }
    
            public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
            {
                string result = null;
    
                unsafe
                {
                    fixed( byte* pAscii = &buffer[offset] )
                    { 
                        result = new String((sbyte*)pAscii);
                    }
                }
    
                return result;
            }
        }
    
        class Program
        {
            static void Main(string[] args)
            {
                byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };
    
                string result = asciiBytes.AsciiBytesToString(3, 6);
    
                Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);
    
                result = asciiBytes.UnsafeAsciiBytesToString(3);
    
                Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);
    
                // Non-null-terminated test.
                asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };
    
                result = asciiBytes.UnsafeAsciiBytesToString(3);
    
                Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);
    
                Console.ReadLine();
            }
        }
    }
    
    • Wayne Bloss
      Wayne Bloss over 15 years
      Whoops, just realized something...there's no way for me to specify a max length when using String.String(sbyte*) which basically means death to using the constructor for the purpose of reading out of a ring-buffer since it could keep reading past the max length into the next segment!
  • Wayne Bloss
    Wayne Bloss over 15 years
    Thanks for your reply. I did not use String(sbyte*, int, int) because it does not stop at the first null that it finds; instead, it converts every null to a space, just like Encoding.ASCII.GetString().
  • Wayne Bloss
    Wayne Bloss over 15 years
    Oh, also it's not a bottleneck or anything. I'm just a nerd with nothing better to do on the weekend :)
  • Marlon
    Marlon over 12 years
    This is extremely slow since it's creating a new string instance for every character. Coincidentally, I have written this exact same code before and it happened to be my bottleneck (and the strings were at most 255 characters in length!). This is definitely not what the OP wants in terms of speed.
  • eselk
    eselk almost 11 years
    Thanks, just what I needed. I suspect for a lot of legacy apps like mine, code page will be 1252, and this will be exactly what they need.
  • Jeff
    Jeff over 9 years
    This doesn't handle null termination.
  • DanielHsH
    DanielHsH over 9 years
    private static char[] string2chars(string S)
    {
        S += '\0'; // Add null terminator for C strings.
        byte[] bytes = System.Text.Encoding.UTF8.GetBytes(S); // Since we convert to bytes the '\0' is crucial, otherwise it will be lost
        char[] chars = System.Text.Encoding.UTF8.GetChars(bytes); // Can use ASCII instead
        return chars;
    }
  • DanielHsH
    DanielHsH over 9 years
    Jeff - the code above fixes the null termination issue
  • AaA
    AaA about 9 years
    This only works if buffer only contains one single string starting from index 0 of the array
  • Rick
    Rick almost 9 years
    What happens if there's no null termination? When would Enc.GetString stop?
  • Vladimir Poslavskiy
    Vladimir Poslavskiy almost 9 years
    @Rick it stops at the end of the array "data".
  • user666412
    user666412 about 8 years
    This code gave me an error: "Cannot take the address of, get the size of, or declare a pointer to a managed type 'byte[]' (CS0208)". To fix it, I removed the & from &buffer.
  • Arek
    Arek over 7 years
    This does not make it terminate after the null character. The resulting string has the length of the whole buffer and contains the \0 character and the bytes after it.
  • Jon Skeet
    Jon Skeet over 7 years
    @Arek: I was assuming the OP would be doing that. Will edit to clarify.
  • Jon Skeet
    Jon Skeet over 7 years
    @Arek: Actually, there's more to it... looking now.
  • Timeless
    Timeless over 6 years
    while (offset < buffer.Length...: should that be offset, or end?
  • BoiseBaked
    BoiseBaked almost 5 years
    Best answer! To complete the answer some, don't forget "using System.Linq;" and without myEncoding: "String s = Encoding.UTF8.GetString(rbuf.TakeWhile(b => !b.Equals(0)).ToArray());" where rbuf is a Byte[].
  • hongxu
    hongxu over 2 years
    .TrimEnd makes another copy so this is not very fast.