How to read a file as a byte array in Scala
Solution 1
Java 7:
import java.nio.file.{Files, Paths}
val byteArray = Files.readAllBytes(Paths.get("/path/to/file"))
I believe this is the simplest way possible. Just leveraging existing tools here. NIO.2 is wonderful.
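As a rough sketch of how the one-liner above might be wrapped and sanity-checked: the object name FileBytes and the helpers readBytes and demo are illustrative names, not part of any library. The demo writes three bytes to a temporary file and reads them back.

```scala
import java.nio.file.{Files, Path}

object FileBytes {
  // Hypothetical helper: reads the whole file into memory in one call.
  // Files.readAllBytes opens and closes the file itself.
  def readBytes(path: Path): Array[Byte] = Files.readAllBytes(path)

  // Round trip through a temporary file, which is deleted afterwards.
  def demo(): Array[Byte] = {
    val tmp = Files.createTempFile("bytes-demo", ".bin")
    try {
      Files.write(tmp, Array[Byte](1, 2, 0xea.toByte))
      readBytes(tmp)
    } finally Files.deleteIfExists(tmp)
  }
}
```

Since the whole file ends up in a single array, this is best suited to files that comfortably fit in memory.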
Solution 2
This should work (Scala 2.8):
import java.io.{BufferedInputStream, FileInputStream}
val bis = new BufferedInputStream(new FileInputStream(fileName))
val bArray = Stream.continually(bis.read).takeWhile(_ != -1).map(_.toByte).toArray
bis.close()
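The snippet above pulls one byte at a time through a Stream. A block-based variant, sketched here under the assumption that an 8192-byte buffer is acceptable (the object name BlockRead, the helper names, and the buffer size are all illustrative choices), copies chunks into a ByteArrayOutputStream instead.

```scala
import java.io.{ByteArrayOutputStream, FileInputStream, InputStream}

object BlockRead {
  // Copies the stream chunk by chunk; bufferSize is an arbitrary illustrative default.
  def readAll(in: InputStream, bufferSize: Int = 8192): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buffer = new Array[Byte](bufferSize)
    var n = in.read(buffer) // number of bytes read, or -1 at end of stream
    while (n != -1) {
      out.write(buffer, 0, n)
      n = in.read(buffer)
    }
    out.toByteArray
  }

  // Opens the file, reads everything, and always closes the stream.
  def readFile(fileName: String): Array[Byte] = {
    val in = new FileInputStream(fileName)
    try readAll(in) finally in.close()
  }
}
```

Usage would look like BlockRead.readFile(fileName), with fileName being the same variable as in the answer.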
Solution 3
The library scala.io.Source is problematic for binary data: DON'T USE IT to read binary files.
The error can be reproduced as instructed here: https://github.com/liufengyun/scala-bug
The file data.bin contains the hexadecimal value 0xea, which is 11101010 in binary and should be converted to 234 in decimal.
The file main.scala contains two ways to read the file:
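import scala.io._
import java.io._
object Main {
  def main(args: Array[String]): Unit = {
    // Source decodes the file into characters, so this prints a character code
    val ss = Source.fromFile("data.bin")
    println("Scala:" + ss.next().toInt)
    ss.close()
    // InputStream.read returns the raw byte as an Int in the range 0 to 255
    val bis = new BufferedInputStream(new FileInputStream("data.bin"))
    println("Java:" + bis.read())
    bis.close()
  }
}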
When I run scala main.scala, the program outputs the following:
Scala:205
Java:234
The Java library produces the correct output, while the Scala library does not.
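The difference comes from Source decoding bytes into characters with a charset. As a hedged sketch, assuming you really want to stay with scala.io.Source, passing the ISO-8859-1 encoding explicitly maps every byte to the character with the same numeric value, so the bytes can be recovered. The object name SourceLatin1 and the helper readViaSource are illustrative.

```scala
import scala.io.Source

object SourceLatin1 {
  // ISO-8859-1 maps each byte 0x00..0xFF to the character with the same code,
  // so converting each Char back with toByte restores the original byte.
  def readViaSource(fileName: String): Array[Byte] = {
    val src = Source.fromFile(fileName, "ISO-8859-1")
    try src.map(_.toByte).toArray finally src.close()
  }
}
```

This still goes through a character decoder for every byte, so a plain InputStream or Files.readAllBytes remains the more direct choice for binary data.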
Solution 4
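import java.io.FileInputStream
val is = new FileInputStream(fileName)
// Caveat: available() is only an estimate, and read() may fill less than the whole array
val cnt = is.available
val bytes = Array.ofDim[Byte](cnt)
is.read(bytes)
is.close()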
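A variant that does not depend on available() could size the array from File.length and fill it with DataInputStream.readFully, which keeps reading until the array is full or throws EOFException. This is a sketch under the assumptions that the file is smaller than 2 GB and does not change size while being read; the object name ReadFully and its readFile helper are illustrative.

```scala
import java.io.{DataInputStream, File, FileInputStream}

object ReadFully {
  def readFile(fileName: String): Array[Byte] = {
    val file = new File(fileName)
    // Assumption: the length fits in an Int, i.e. the file is under 2 GB.
    val bytes = new Array[Byte](file.length.toInt)
    val in = new DataInputStream(new FileInputStream(file))
    // readFully blocks until the whole array is filled, unlike a single read(bytes).
    try { in.readFully(bytes); bytes } finally in.close()
  }
}
```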
Solution 5
You might also consider using scalax.io:
scalax.io.Resource.fromFile(fileName).byteArray
fgysin
Updated on March 13, 2021
Comments
-
fgysin about 3 years
I can find tons of examples but they seem to either rely mostly on Java libraries or just read characters/lines/etc.
I just want to read in some file and get a byte array with scala libraries - can someone help me with that?
-
Philippe over 12 years
I think relying on Java libraries is what (almost?) everyone would do, the Scala library included. See for instance the source code of scala.io.Source.
-
fgysin over 12 years
I know Scala relies on Java. But what is the point of a language where I can not even do simple file i/o without using a different language?
-
Duncan McGregor over 12 years
You're not using a different language, just a standard JVM API that has proved good enough not to need replacing!
-
fgysin over 12 years
Hm yeah, you are probably right... Still, it feels like cheating. :)
-
Philippe over 12 years
Well, how do you think the Java classes are implemented? Deep down, somewhere, there is a native method: it has just a signature, no Java implementation, and relies on an OS-specific C implementation. Isn't that cheating too? :)
-
Duncan McGregor over 12 years
It should be said that Scala on .Net does make this a more pressing issue.
-
fgysin over 12 years
@Duncan McGregor: Good point, guess the transition isn't as smooth there...
-
fgysin over 12 years
@Philippe: Sure, and using C is only cheating on assembly :P... What I meant is just, that the border between languages is usually rather clearly defined, Scala and Java sort of melt into each other.
-
Suma about 9 years
possible duplicate of What is the proper way to code a read-while loop in Scala?
-
qu1j0t3 over 11 years
I think this is a great example of wrapping a Java API function to get Stream semantics. Much appreciated.
-
BeniBela over 10 years
val bis = new java.io.BufferedInputStream(new java.io.FileInputStream(fileName));
if you do not have the java paths imported
-
Max over 10 years
Using this approach, is closing the file also needed or is it implicit?
-
fengyun liu over 10 years
If I set the encoding to Source.fromFile("data.bin", "ISO8859-1"), it works well.
-
Tony K. about 10 years
You need to close it yourself
-
Dibbeke over 9 years
This approach is slow, since it needs to process each and every byte. Ideally, I/O operations should be block-based.
-
akauppi over 9 years
Noticed that the last actions on that repository are 6 years ago - is it still relevant?
-
fedesilva over 8 years
I think that anyone not bound to jvm < 7 should use this.
-
Benjamin almost 7 years
Maybe it's helpful, but really, this isn't an answer. Introducing a new problem in an answer is not constructive and belongs somewhere else.
-
morfizm over 6 years
I benchmarked it comparing to buffered approach, it's about 500 times slower on my test. (test config: compute CRC32 of a 14 MB file, which is repeatedly re-read from SSD in RAID-0 - so it's in system file cache; Intel Core i7 2nd gen; 16GB RAM).
-
m.bemowski over 5 years
It is not a valid solution. From javadoc of InputStream.available:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
-
niid about 4 years
I had to use import org.apache.commons.io.IOUtils instead of the suggested import.
-
jwvh over 3 years
With a question this old (asked over 9 years ago), and with so many answers already submitted, it is helpful to point out how your new answer is different from the previous answers. (And including code that's been commented out just looks sloppy.)
-
Alistair McIntyre about 3 years
yeah.. the other answers clearly show a byte array being returned. this is really not clear