Java: StringBuffer to byte[] without toString

Solution 1

As many have already suggested, you can use the CharBuffer class, but allocating a new CharBuffer would only make your problem worse.

Instead, you can directly wrap your StringBuilder in a CharBuffer, since StringBuilder implements CharSequence:

Charset charset = StandardCharsets.UTF_8;
CharsetEncoder encoder = charset.newEncoder();

// No characters are copied; this just wraps the StringBuilder.
CharBuffer buffer = CharBuffer.wrap(stringBuilder);

ByteBuffer bytes = encoder.encode(buffer);

EDIT: Duarte correctly points out that the CharsetEncoder.encode method may return a buffer whose backing array is larger than the actual data, meaning its capacity is larger than its limit. You must either read from the ByteBuffer itself or copy its contents into a byte array of exactly the right size. In the latter case, there's no avoiding having two copies of the bytes in memory, albeit briefly:

ByteBuffer byteBuffer = encoder.encode(buffer);

byte[] array;
int arrayLen = byteBuffer.remaining();
if (byteBuffer.hasArray()
        && byteBuffer.arrayOffset() == 0
        && arrayLen == byteBuffer.array().length) {
    // The backing array holds exactly the encoded bytes, so reuse it.
    array = byteBuffer.array();
} else {
    // This will place two copies of the byte sequence in memory,
    // until byteBuffer gets garbage-collected (which should happen
    // pretty quickly once the reference to it is null'd).
    array = new byte[arrayLen];
    byteBuffer.get(array);
}

byteBuffer = null;
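
If the bytes are headed for a stream anyway, the first option mentioned above (reading from the ByteBuffer itself) avoids the second copy entirely. The following is only a sketch under that assumption; `EncodedWriter` and `writeUtf8` are hypothetical names, not part of the JDK or of the original answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class EncodedWriter {
    // Hypothetical helper: encodes the builder as UTF-8 and passes only the
    // valid region of the resulting buffer to the stream.
    static void writeUtf8(StringBuilder sb, OutputStream out) throws IOException {
        ByteBuffer bytes = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(sb));
        if (bytes.hasArray()) {
            // Write straight from the backing array, bounded by position and
            // remaining(), so any unused tail of the array is never touched.
            out.write(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining());
        } else {
            // Fallback for buffers without an accessible array (copies once).
            byte[] chunk = new byte[bytes.remaining()];
            bytes.get(chunk);
            out.write(chunk);
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder("h\u00e9llo");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(sb, out);
        System.out.println(out.size() + " bytes written");
    }
}
```

This still encodes the whole builder on every call; it only removes the extra trimmed copy.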

Solution 2

If you're willing to replace the StringBuilder with something else, yet another possibility would be a Writer backed by a ByteArrayOutputStream:

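ByteArrayOutputStream bout = new ByteArrayOutputStream();
// Name the charset explicitly instead of relying on the platform default.
Writer writer = new OutputStreamWriter(bout, StandardCharsets.UTF_8);
try {
    writer.write("String A");
    writer.write("String B");
    // OutputStreamWriter buffers internally, so flush before reading the bytes.
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
// Arrays.toString prints the contents; printing the array itself would not.
System.out.println(Arrays.toString(bout.toByteArray()));

try {
    writer.write("String C");
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(Arrays.toString(bout.toByteArray()));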

As always, your mileage may vary.

Solution 3

For starters, you should probably be using StringBuilder, since StringBuffer has synchronization overhead that's usually unnecessary.

Unfortunately, StringBuilder has no method that returns bytes directly, but you can copy the chars into an array with getChars(), or iterate from 0 to length() and read each charAt(), and then encode the chars yourself.
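
As a rough illustration of the charAt() loop, here is a sketch that assumes every character fits in ISO-8859-1 (code points 0 to 255), so each char maps to exactly one byte. That restriction is an assumption made for this example, not something the question states, and `Latin1Bytes.toLatin1` is a hypothetical helper rather than a JDK method. For UTF-8 or other multi-byte charsets a CharsetEncoder is the safer route.

```java
final class Latin1Bytes {
    // Hypothetical helper: converts chars to bytes one by one without ever
    // building an intermediate String. Only valid for ISO-8859-1 content.
    static byte[] toLatin1(CharSequence chars) {
        byte[] out = new byte[chars.length()];
        for (int i = 0; i < out.length; i++) {
            char c = chars.charAt(i);
            if (c > 0xFF) {
                // Assumption violated: this char has no single-byte form.
                throw new IllegalArgumentException(
                        "Not representable in ISO-8859-1 at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("caf\u00e9");
        System.out.println(toLatin1(sb).length + " bytes");
    }
}
```

Because StringBuilder implements CharSequence, it can be passed in directly.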

Solution 4

Unfortunately, the answers above that rely on ByteBuffer's array() method are a bit buggy. The trouble is that the backing byte[] is likely to be larger than the encoded data, so it ends in trailing zero bytes that are hard to get rid of, since you can't resize arrays in Java.

Here is an article that explains this in more detail: http://worldmodscode.wordpress.com/2012/12/14/the-java-bytebuffer-a-crash-course/
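
One way to work around this, sketched here with a hypothetical helper named `exactBytes`, is to copy only the region between the buffer's position and limit into a new array of exactly that size. The exact capacity the encoder picks is an implementation detail, so the example prints it rather than assuming a particular value.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TrimmedEncode {
    // Hypothetical helper: returns a right-sized copy of the buffer's
    // readable bytes without changing the buffer's position.
    static byte[] exactBytes(ByteBuffer buf) {
        if (buf.hasArray()) {
            int from = buf.arrayOffset() + buf.position();
            return Arrays.copyOfRange(buf.array(), from, from + buf.remaining());
        }
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() leaves buf's own position alone
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        StringBuilder sb = new StringBuilder("h\u00e9llo");
        ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(sb));
        System.out.println("capacity:       " + buf.capacity());
        System.out.println("limit:          " + buf.limit());
        System.out.println("trimmed length: " + exactBytes(buf).length);
    }
}
```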

Author: mmascosta

Updated on October 22, 2022

Comments

  • mmascosta over 1 year

    The title says it all. Is there any way to convert from StringBuilder to byte[] without using a String in the middle?

    The problem is that I'm managing REALLY large strings (millions of chars), and I have a loop that appends a char at the end and then obtains the byte[]. Converting the StringBuffer to a String on every iteration makes this loop very, very slow.

    Is there any way to accomplish this? Thanks in advance!

    • tolitius over 10 years
      Why not use CharBuffer instead, and then call charBuffer.array()?
    • Vidya over 10 years
      Can you clarify why you need to store all these big strings in memory? Is this something a user is waiting on? Could this instead become a MapReduce or Spark job? I just wonder if maybe this question is a symptom of an architectural design smell.
  • Vishy over 10 years
    +1. And the Javadoc for StringBuffer has said you should use StringBuilder for nearly ten years now.