How do I create a Java string from the contents of a file?


Solution 1

Read all text from a file

Java 11 added the readString() method to read small files as a String, preserving line terminators:

String content = Files.readString(path, StandardCharsets.US_ASCII);
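A minimal runnable sketch, assuming Java 11+, showing that readString() hands back the line terminators exactly as stored. It uses UTF-8 instead of US_ASCII, and the class name ReadStringDemo and the temp-file setup are illustrative only:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReadStringDemo {
    // Reads the whole file into one String; line terminators are kept as stored.
    static String readWhole(Path path) throws IOException {
        return Files.readString(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        try {
            // Mixed terminators on purpose: CRLF after the first line, LF after the second.
            Files.writeString(tmp, "first\r\nsecond\n", StandardCharsets.UTF_8);
            System.out.println(readWhole(tmp).equals("first\r\nsecond\n")); // prints true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```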

For Java 7 through 10, here's a compact, robust idiom, wrapped up in a utility method:

static String readFile(String path, Charset encoding)
  throws IOException
{
  byte[] encoded = Files.readAllBytes(Paths.get(path));
  return new String(encoded, encoding);
}
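Here is the same utility as a self-contained, Java 7-compatible class with the imports it needs, plus a small round trip through a temporary file. The class name ReadFileUtil and the sample text are made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ReadFileUtil {
    // Same idiom as the utility method above: read all bytes, then decode once.
    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readfile", ".txt");
        try {
            // "cafe" with an e-acute, then CRLF, written as UTF-8 (6 chars, 7 bytes).
            Files.write(tmp, "caf\u00e9\r\n".getBytes(StandardCharsets.UTF_8));
            String text = readFile(tmp.toString(), StandardCharsets.UTF_8);
            System.out.println(text.length()); // prints 6
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```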

Read lines of text from a file

Java 7 added a convenience method to read a file as lines of text, represented as a List<String>. This approach is "lossy" because the line separators are stripped from the end of each line.

List<String> lines = Files.readAllLines(Paths.get(path), encoding);
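To make the "lossy" point concrete, here is a small sketch (assuming Java 8+ for String.join; ReadLinesDemo is a hypothetical name). A file containing a CRLF, an LF and a trailing terminator comes back as two bare strings, and joining them again cannot recover the original bytes:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class ReadLinesDemo {
    static List<String> readLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, "alpha\r\nbeta\n".getBytes(StandardCharsets.UTF_8));
            List<String> lines = readLines(tmp);
            System.out.println(lines); // prints [alpha, beta]
            // The CRLF/LF mix and the final terminator are gone for good.
            System.out.println(String.join("\n", lines).equals("alpha\r\nbeta\n")); // prints false
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```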

Java 8 added the Files.lines() method to produce a Stream<String>. Again, this method is lossy because line separators are stripped. If an IOException is encountered while reading the file, it is wrapped in an UncheckedIOException, since Stream doesn't accept lambdas that throw checked exceptions.

try (Stream<String> lines = Files.lines(path, encoding)) {
  lines.forEach(System.out::println);
}

This Stream does need a close() call; this is poorly documented in the API, and I suspect many people don't even notice Stream has a close() method. Be sure to use an ARM block (try-with-resources) as shown.
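A sketch of a slightly fuller pipeline, assuming Java 8+: it counts lines containing a word, closes the stream via try-with-resources, and unwraps the UncheckedIOException mentioned above back into a checked IOException. The class and method names are illustrative:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class StreamLinesDemo {
    // The stream holds an open file handle, so try-with-resources closes it
    // even if the pipeline throws part-way through.
    static long countLinesContaining(Path path, String word) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(word)).count();
        } catch (UncheckedIOException e) {
            // Read errors during traversal arrive wrapped; rethrow the original IOException.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stream", ".txt");
        try {
            Files.write(tmp, "apple pie\nbanana\napple tart\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(countLinesContaining(tmp, "apple")); // prints 2
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```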

If you are working with a source other than a file, you can use the lines() method in BufferedReader instead.
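For example, a sketch assuming Java 8+ that reads lines from an arbitrary InputStream (a socket, a classpath resource, here just a byte array) and joins them with "\n". LineJoiner is a hypothetical name, and, like Files.lines(), this drops the original terminators:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

final class LineJoiner {
    // Decodes the stream as UTF-8 and joins its lines with "\n".
    static String joinLines(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("one\r\ntwo\r\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(joinLines(in).equals("one\ntwo")); // prints true
    }
}
```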

Memory utilization

The first method, which preserves line breaks, can temporarily require memory several times the size of the file, because for a short time the raw file contents (a byte array) and the decoded characters (each of which is 16 bits even if encoded as 8 bits in the file) reside in memory at once. It is safest to use it on files that you know to be small relative to the available memory.

The second method, reading lines, is usually more memory efficient, because the input byte buffer for decoding doesn't need to contain the entire file. However, it's still not suitable for files that are very large relative to available memory.

For reading large files, you need a different design for your program, one that reads a chunk of text from a stream, processes it, and then moves on to the next, reusing the same fixed-size memory block. Here, "large" depends on the machine; on modern hardware, the threshold might be many gigabytes. The third method, using a Stream<String>, is one way to do this if your input "records" happen to be individual lines. (Using the readLine() method of BufferedReader is the procedural equivalent of this approach.)
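A minimal sketch of the fixed-buffer design, assuming Java 7+. The per-chunk "processing" here is just counting one character (newlines in the usage), a placeholder for real work; the class name and the 8 KiB buffer size are arbitrary choices. Note that records spanning a chunk boundary need extra bookkeeping, which is what line-based reading does for you:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedReadDemo {
    // Holds at most one 8 KiB buffer of decoded text, regardless of file size.
    static long countChar(Path path, char target) throws IOException {
        char[] buffer = new char[8192]; // reused for every chunk
        long count = 0;
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                // Process only the n chars actually read in this chunk.
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".txt");
        try {
            Files.write(tmp, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(countChar(tmp, '\n')); // prints 3
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```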

Character encoding

One thing that is missing from the sample in the original post is the character encoding. There are some special cases where the platform default is what you want, but they are rare, and you should be able to justify your choice.

The StandardCharsets class defines some constants for the encodings required of all Java runtimes:

String content = readFile("test.txt", StandardCharsets.UTF_8);

The platform default is available from the Charset class itself:

String content = readFile("test.txt", Charset.defaultCharset());
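To see why the choice matters, here is a small sketch (CharsetDemo and roundTrip are hypothetical names) that encodes text as UTF-8 and decodes the same bytes once as UTF-8 and once as ISO-8859-1, which simulates reading a file with the wrong charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    // Encodes with one charset and decodes with another, like a file
    // written in one encoding and read back in another.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // e-acute is 2 bytes in UTF-8: 0xC3 0xA9
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(original)); // prints true
        // Each UTF-8 byte becomes its own Latin-1 character: U+00C3 and U+00A9.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                .equals("caf\u00c3\u00a9")); // prints true
    }
}
```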

Note: This answer largely replaces my Java 6 version. Java 7's utility methods simplify the code safely, and the old answer, which used a mapped byte buffer, prevented the file that was read from being deleted until the mapped buffer was garbage collected. You can view the old version via the "edited" link on this answer.

Solution 2

If you're willing to use an external library, check out Apache Commons IO (200KB JAR). It contains an org.apache.commons.io.FileUtils.readFileToString() method that allows you to read an entire File into a String with one line of code.

Example:

import java.io.*;
import java.nio.charset.*;
import org.apache.commons.io.*;

public String readFile() throws IOException {
    File file = new File("data.txt");
    return FileUtils.readFileToString(file, StandardCharsets.UTF_8);
}

Solution 3

A very lean solution based on Scanner:

Scanner scanner = new Scanner( new File("poem.txt") );
String text = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block

Or, if you want to set the charset:

Scanner scanner = new Scanner( new File("poem.txt"), "UTF-8" );
String text = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block

Or, with a try-with-resources block, which will call scanner.close() for you:

try (Scanner scanner = new Scanner( new File("poem.txt"), "UTF-8" )) {
    String text = scanner.useDelimiter("\\A").next();
}

Remember that the Scanner constructor can throw a FileNotFoundException (a subclass of IOException). And don't forget to import java.io.File and java.util.Scanner.
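A sketch of a slightly more defensive version of the Scanner trick, assuming Java 7+ (ScannerSlurp and readAll are made-up names). It guards against empty files, where next() would throw NoSuchElementException, and it checks Scanner.ioException(), since Scanner treats a read error as end of input instead of throwing:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class ScannerSlurp {
    static String readAll(File file) throws IOException {
        // Closing the Scanner also closes the underlying file.
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
            // "\\A" matches only at the start of input, so the single token is the whole file.
            scanner.useDelimiter("\\A");
            String text = scanner.hasNext() ? scanner.next() : "";
            // Surface a swallowed read error instead of returning partial text.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
            return text;
        }
    }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("empty", ".txt");
        try {
            System.out.println(readAll(empty).isEmpty()); // prints true
        } finally {
            empty.delete();
        }
    }
}
```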

Source: Pat Niemeyer's blog

Solution 4

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

Java 7

String content = new String(Files.readAllBytes(Paths.get("readMe.txt")), StandardCharsets.UTF_8);

Java 11

String content = Files.readString(Paths.get("readMe.txt"));
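The two one-liners are not quite equivalent. new String(bytes, charset) silently replaces malformed input with U+FFFD, while Files.readString() (UTF-8 by default) reports it as an IOException. A sketch assuming Java 11+; the class and method names are illustrative:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StrictVsLenientDemo {
    // Java 7 style: malformed bytes become the replacement character U+FFFD.
    static String lenient(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Java 11 style: malformed input is reported as an IOException.
    static String strict(Path path) throws IOException {
        return Files.readString(path);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bad", ".txt");
        try {
            Files.write(tmp, new byte[] {(byte) 0xFF}); // 0xFF never occurs in valid UTF-8
            System.out.println(lenient(tmp).equals("\uFFFD")); // prints true
            try {
                strict(tmp);
                System.out.println("decoded");
            } catch (IOException e) {
                System.out.println("rejected: " + e);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```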

Solution 5

If you're looking for an alternative that doesn't involve a third-party library (e.g. Commons IO), you can use the Scanner class:

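import java.io.File;
import java.io.IOException;
import java.util.Scanner;

private String readFile(String pathname) throws IOException {

    File file = new File(pathname);
    // Presize the builder to the file length; it still grows if needed.
    StringBuilder fileContents = new StringBuilder((int) file.length());

    try (Scanner scanner = new Scanner(file)) {
        while (scanner.hasNextLine()) {
            // Chain the appends instead of concatenating a temporary String.
            fileContents.append(scanner.nextLine()).append(System.lineSeparator());
        }
        return fileContents.toString();
    }
}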
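A variant of the method above, sketched under the assumption that you want an explicit charset rather than the platform default that new Scanner(file) uses (ScannerLinesDemo is a hypothetical name). Like the original, it normalizes every terminator (\n, \r\n, \r) to System.lineSeparator() and adds one after the last line even if the file had none:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class ScannerLinesDemo {
    static String readFile(File file) throws FileNotFoundException {
        StringBuilder contents = new StringBuilder();
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                // nextLine() strips the terminator; the platform separator replaces it.
                contents.append(scanner.nextLine()).append(System.lineSeparator());
            }
        }
        return contents.toString();
    }
}
```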
Author by

OscarRyz

Software Developer who happens to like writing code. Here are some interesting answers you might like to upvote :") Why java people frequently consume exception silently ? Coding in Other (Spoken) Languages How to create an string from the contents of a file History of Objective-C square brackets (as I remember it) ( visible only to >10k users )

Updated on February 13, 2022

Comments

  • OscarRyz
    OscarRyz about 2 years

    I've been using the idiom below for some time now. And it seems to be the most wide-spread, at least on the sites I've visited.

    Is there a better/different way to read a file into a string in Java?

    private String readFile(String file) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader (file));
        String         line = null;
        StringBuilder  stringBuilder = new StringBuilder();
        String         ls = System.getProperty("line.separator");
    
        try {
            while((line = reader.readLine()) != null) {
                stringBuilder.append(line);
                stringBuilder.append(ls);
            }
    
            return stringBuilder.toString();
        } finally {
            reader.close();
        }
    }
    
    • OscarRyz
      OscarRyz over 15 years
      Can anyone explain to me, in a very simple way, what's with NIO? Each time I read about it I get lost in the nth mention of channel :(
    • Henrik Paul
      Henrik Paul over 15 years
      Do remember that the line separator in the file isn't necessarily the same as the system's line separator.
    • Deep
      Deep over 12 years
      The code above has a bug: it adds an extra newline character after the last line. It should be something like the following: if ((line = reader.readLine()) != null) { stringBuilder.append(line); } while ((line = reader.readLine()) != null) { stringBuilder.append(ls); stringBuilder.append(line); }
    • Val
      Val over 12 years
      Java 7 introduces byte[] Files.readAllBytes(file). To those who suggest the 'one-line' Scanner solution: don't you need to close it?
    • Bill K
      Bill K about 9 years
      @OscarRyz The biggest change for me is that NIO allows you to listen to many ports without allocating a thread for each. Not a problem unless you want to send a packet to every machine in a class B network address space (65k addresses) to see what exists, Windows runs out of threads at around 20k (Found this out solving exactly this problem--discovery of a class A/B network, before NIO it was tough).
    • Rajesh Goel
      Rajesh Goel almost 7 years
      If you look at the Files.readAllBytes() implementation, you will notice it uses a channel that it closes itself, so there is no need to close anything explicitly.
    • Piko
      Piko almost 7 years
      With the advent of Groovy, you can read the file thus: return new File(file).text
    • Love Bisaria
      Love Bisaria over 6 years
      Here is a link to another Stack Overflow question, which I find well explained: stackoverflow.com/questions/14169661/…
    • user207421
      user207421 about 5 years
      @Deep The last line in a text file is usually line-terminated, so what you describe as a bug isn't one, and your code has the bug of removing all the line terminators.
    • Alan
      Alan almost 3 years
      Please accept an answer to your question and help put this to rest.
    • Franz D.
      Franz D. over 2 years
      To all those poor souls who recommend using byte-based methods when obviously text should be handled: Our world will be hell as long as you persist in your ignorance. (I mean I'm lenient with 90s legacy code in this respect, but Goddammit we're in 2021, and globalization and non-ASCII characters are something.)
    • OscarRyz
      OscarRyz over 2 years
      @FranzD. What do you think is used to store that text in a file?
    • Franz D.
      Franz D. over 2 years
      @OscarRyz: Well, bytes, my dear Oscar. But byte-based methods tend not to handle the intricacies of byte <-> character conversions appropriately. And while that might work if you test your code with some ASCII or maybe even Latin-1, it will fail horribly and cause hours of work and frustration as soon as someone tries to read/write Chinese or some other "minor" (in THEIR world) language. Most of my former colleagues who proudly called themselves "software engineers" neither knew nor cared about UTF-16 surrogates, and yes, I do call that ignorant, because that's what it is.
    • OscarRyz
      OscarRyz over 2 years
      @Franz D. Good, then you read bytes and decode using the appropriate character encoding. You're wrongly assuming the file would be encoded using UTF-16, but it could be literally anything else. It's strongly recommended to use UTF-8 for anything nowadays. Read the accepted answer; it has very useful information.
  • OscarRyz
    OscarRyz over 15 years
    Yeap. It makes the "high" level language take a different meaning. Java is high level compared with C but low compared with Python or Ruby
  • OscarRyz
    OscarRyz about 14 years
    I think this has the inconvenience of using the platform default encoding. +1 anyway :)
  • Vishy
    Vishy about 14 years
    This will change the newlines to the default newline choice. This may be desirable, or unintended.
  • Sébastien Nussbaumer
    Sébastien Nussbaumer over 13 years
    Note: after exercising that code a bit, I found out that you can't reliably delete the file right after reading it with this method, which may be a non-issue in some cases, but not mine. Could it be related to this issue: bugs.sun.com/bugdatabase/view_bug.do?bug_id=4715154? I finally went with Jon Skeet's proposal, which doesn't suffer from this bug. Anyway, I just wanted to share the info with other people, just in case...
  • Dónal
    Dónal almost 13 years
    Agree that Java is long on high-level abstractions but short on convenience methods
  • Pablo Grisafi
    Pablo Grisafi over 12 years
    \\A works because there is no "other beginning of file", so you are in fact reading the last token...which is also the first. Never tried with \\Z. Also note you can read anything that is Readable, like Files, InputStreams, channels...I sometimes use this code to read from the display window of Eclipse, when I'm not sure if I'm reading one file or another...yes, classpath confuses me.
  • ceving
    ceving almost 12 years
    It seems to me that the finally block does not know variables defined in the try block. javac 1.6.0_21 throws the error cannot find symbol.
  • earcam
    earcam over 11 years
    Scanner implements Closeable (it invokes close on the source) - so while elegant it shouldn't really be a one-liner. The default size of the buffer is 1024, but Scanner will increase the size as necessary (see Scanner#makeSpace())
  • wau
    wau about 11 years
    This code may give unpredictable results. According to the documentation of the available() method, there is no guarantee that the end of file is reached in the event that the method returns 0. In that case you might end up with an incomplete file. What's worse, the number of bytes actually read can be smaller than the value returned by available(), in which case you get corrupted output.
  • assafmo
    assafmo about 11 years
    or new String(Files.readAllBytes(Paths.get(filename))); :-)
  • Thorn
    Thorn about 11 years
    True, Java has an insane number of ways of dealing with Files and many of them seem complicated. But this is fairly close to what we have in higher level languages: byte[] bytes = Files.readAllBytes(someFile.toPath());
  • Mohamed Taher Alrefaie
    Mohamed Taher Alrefaie about 11 years
    This code casts from long to int, which could cause some crazy behaviour with big files. It also has extra spaces, and where do you close the InputStream?
  • Bryan Larson
    Bryan Larson almost 11 years
    Forgive me for reviving a comment this old, but did you mean to pass in a String object called "file", or should that be a File object instead?
  • Dan Dyer
    Dan Dyer almost 11 years
    Rolled back the edit to this answer because the point was to narrow the scope of the line variable. The edit declared it twice, which would be a compile error.
  • Jonik
    Jonik over 10 years
    @M-T-A: The stream is closed, note the use of Closer in CharSource. The code in the answer isn't the actual, current Guava source.
  • Ari
    Ari almost 7 years
    if the lines inside the library are not counted.
  • Patrick Parker
    Patrick Parker about 6 years
    In the first case you might be adding an extra newline at the end. in the second case you might be omitting one. So both are equally wrong. See this article
  • mryan
    mryan over 5 years
    Why, oh why, introduce new methods that rely on the default charset in 2018 ?
  • leventov
    leventov over 5 years
    @mryan this method doesn't rely on the default system charset. It defaults to UTF-8, which is fine.
  • mryan
    mryan over 5 years
    @leventov you're right ! so does Files.readAllLines ! that makes the files API not very consistent with older methods but it's for the better :)
  • mauron85
    mauron85 over 5 years
    Have you even tried your own code? You've defined reader in try/catch block, so it won't be accessible in finally block.
  • Jean-Christophe Blanchard
    Jean-Christophe Blanchard over 5 years
    Duplicate of Moritz Petersen answer who wrote:String content = new String(Files.readAllBytes(Paths.get(filename)), "UTF-8");
  • Thufir
    Thufir over 5 years
    example charset to invoke?
  • Harshal Parekh
    Harshal Parekh over 3 years
    Great answer. +1. But this answer is 12 years old. Java now has try-with-resources.
  • tmoschou
    tmoschou over 2 years
    The stream returned by Files.lines(Paths.get("file.txt")) is not closed and is a resource leak. You should wrap in a try-with-resources block.