How to read a file as a byte array in Scala

Solution 1

Java 7:

import java.nio.file.{Files, Paths}

val byteArray = Files.readAllBytes(Paths.get("/path/to/file"))

I believe this is the simplest way possible. It relies only on the standard java.nio.file API (NIO.2, available since Java 7), and readAllBytes closes the file for you once the bytes have been read.
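As a rough, self-contained sketch of the call above: the object name ReadAllBytesExample, the helper readBytes, and the temporary file are all made up for illustration and are not part of the original answer.

```scala
import java.nio.file.{Files, Path}

object ReadAllBytesExample {
  // Hypothetical helper: reads the whole file at `path` into memory.
  def readBytes(path: Path): Array[Byte] = Files.readAllBytes(path)

  def main(args: Array[String]): Unit = {
    // Write a few known bytes to a temporary file, then read them back.
    val tmp = Files.createTempFile("bytes-demo", ".bin")
    try {
      Files.write(tmp, Array[Byte](1, 2, 0xea.toByte))
      val bytes = readBytes(tmp)
      // Bytes are signed on the JVM; mask with 0xff to see the unsigned value.
      println(bytes.map(_ & 0xff).mkString(", "))
    } finally Files.delete(tmp)
  }
}
```

Because the whole file ends up in one array, this suits files that comfortably fit in memory.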

Solution 2

This should work (Scala 2.8):

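import java.io.{BufferedInputStream, FileInputStream}

val bis = new BufferedInputStream(new FileInputStream(fileName))
val bArray = Stream.continually(bis.read).takeWhile(_ != -1).map(_.toByte).toArray
bis.close() // toArray has already consumed the stream, so it is safe to close here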
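The snippet above handles one byte at a time, which can be slow on large files. A block-based variant might look like the sketch below. The object name BlockRead, the helper readBytes, and the 8192-byte default buffer size are assumptions chosen for illustration, not something taken from the answer.

```scala
import java.io.{BufferedInputStream, ByteArrayOutputStream, FileInputStream}

object BlockRead {
  // Hypothetical helper; the default buffer size is an arbitrary choice.
  def readBytes(fileName: String, bufferSize: Int = 8192): Array[Byte] = {
    val in = new BufferedInputStream(new FileInputStream(fileName))
    try {
      val out = new ByteArrayOutputStream()
      val buffer = new Array[Byte](bufferSize)
      // read returns the number of bytes copied into the buffer, or -1 at end of stream.
      var n = in.read(buffer)
      while (n != -1) {
        out.write(buffer, 0, n)
        n = in.read(buffer)
      }
      out.toByteArray
    } finally in.close()
  }
}
```

The try/finally closes the stream even if a read fails, which the one-liner does not do on its own.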

Solution 3

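The library scala.io.Source is problematic for this purpose: DON'T USE IT to read binary files. It is meant for text and decodes the bytes into characters using a character encoding, so the values it returns need not match the raw bytes.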

The error can be reproduced as instructed here: https://github.com/liufengyun/scala-bug

The file data.bin contains the single hexadecimal byte 0xea, which is 11101010 in binary and 234 in decimal.

The main.scala file contains two ways to read the file:

import scala.io._
import java.io._

object Main {
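  def main(args: Array[String]): Unit = {
    // scala.io.Source: decodes the file as text and yields characters
    val ss = Source.fromFile("data.bin")
    println("Scala:" + ss.next().toInt)
    ss.close()

    // java.io: yields the raw byte as an Int in the range 0 to 255
    val bis = new BufferedInputStream(new FileInputStream("data.bin"))
    println("Java:" + bis.read())
    bis.close()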
  }
}

When I run scala main.scala, the program outputs the following:

Scala:205
Java:234

The Java library produces the correct output, while the Scala library does not.
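A commenter in this thread reports that passing an explicit single-byte encoding to Source.fromFile makes the value come out right. The sketch below checks that with ISO-8859-1, which maps every byte to the character with the same code. The object name SourceEncodingCheck, the helper firstCharAsInt, and the temporary file are hypothetical and stand in for data.bin.

```scala
import java.nio.file.Files
import scala.io.Source

object SourceEncodingCheck {
  // Hypothetical helper: returns the first character of the file as an Int,
  // decoding the file with the given charset name.
  def firstCharAsInt(path: String, encoding: String): Int = {
    val src = Source.fromFile(path, encoding)
    try src.next().toInt finally src.close()
  }

  def main(args: Array[String]): Unit = {
    // Temporary stand-in for data.bin: a file holding the single byte 0xea.
    val tmp = Files.createTempFile("data", ".bin")
    try {
      Files.write(tmp, Array(0xea.toByte))
      println("ISO-8859-1: " + firstCharAsInt(tmp.toString, "ISO-8859-1"))
    } finally Files.delete(tmp)
  }
}
```

Even so, this still goes through character decoding, so for binary data a byte-oriented API such as the one in Solution 1 is the more direct choice.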

Solution 4

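import java.io.FileInputStream

val is = new FileInputStream(fileName)
// Caveat: available() is only an estimate of the bytes readable without blocking,
// so this is not guaranteed to cover the whole file.
val cnt = is.available
val bytes = Array.ofDim[Byte](cnt)
is.read(bytes) // may also read fewer than cnt bytes
is.close()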
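The InputStream documentation describes available as an estimate and warns against using it to size a buffer for the whole stream. One possible alternative, sketched here under the assumption that the file is smaller than 2 GB and does not change while it is read, sizes the array from the file length and uses DataInputStream.readFully. The object name ReadFully and the helper readBytes are made up for illustration.

```scala
import java.io.{DataInputStream, File, FileInputStream}

object ReadFully {
  // Hypothetical helper: sizes the array from the file length rather than available().
  def readBytes(fileName: String): Array[Byte] = {
    val file = new File(fileName)
    val bytes = new Array[Byte](file.length.toInt)
    val in = new DataInputStream(new FileInputStream(file))
    try {
      // readFully keeps reading until the array is full, or throws EOFException.
      in.readFully(bytes)
      bytes
    } finally in.close()
  }
}
```

Unlike a single read call, readFully does not return early with a partially filled array.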

Solution 5

You might also consider using scalax.io:

scalax.io.Resource.fromFile(fileName).byteArray

Author: fgysin

Updated on March 13, 2021

Comments

  • fgysin about 3 years

    I can find tons of examples but they seem to either rely mostly on Java libraries or just read characters/lines/etc.

    I just want to read in some file and get a byte array with scala libraries - can someone help me with that?

    • Philippe over 12 years
      I think relying on Java libraries is what (almost?) everyone would do, the Scala library included. See for instance the source code of scala.io.Source.
    • fgysin over 12 years
      I know Scala relies on Java. But what is the point of a language where I can not even do simple file i/o without using a different language?
    • Duncan McGregor over 12 years
      You're not using a different language, just a standard JVM API that has proved good enough not to need replacing!
    • fgysin over 12 years
      Hm yeah, you are probably right... Still, it feels like cheating. :)
    • Philippe over 12 years
      Well, how do you think the Java classes are implemented? Deep down, somewhere, there is a native method: it has just a signature, no Java implementation, and relies on an OS-specific C implementation. Isn't that cheating too? :)
    • Duncan McGregor over 12 years
      It should be said that Scala on .Net does make this a more pressing issue.
    • fgysin over 12 years
      @Duncan McGregor: Good point, guess the transition isn't as smooth there...
    • fgysin over 12 years
      @Philippe: Sure, and using C is only cheating on assembly :P... What I meant is just, that the border between languages is usually rather clearly defined, Scala and Java sort of melt into each other.
  • qu1j0t3 over 11 years
    I think this is a great example of wrapping a Java API function to get Stream semantics. Much appreciated.
  • BeniBela over 10 years
    val bis = new java.io.BufferedInputStream(new java.io.FileInputStream(fileName)); if you do not have the java paths imported
  • Max over 10 years
    Using this approach, is closing the file also needed or is it implicit?
  • fengyun liu over 10 years
    If I set the encoding to Source.fromFile("data.bin", "ISO8859-1"), it works well.
  • Tony K. about 10 years
    You need to close it yourself
  • Dibbeke over 9 years
    This approach is slow, since it needs to process each and every byte. Ideally, I/O operations should be block-based.
  • akauppi over 9 years
    Noticed that the last actions on that repository are 6 years ago - is it still relevant?
  • fedesilva over 8 years
    I think that anyone not bound to jvm < 7 should use this.
  • Benjamin almost 7 years
    Maybe it's helpful, but really, this isn't an answer. Introducing a new problem in an answer is not constructive and belongs somewhere else.
  • morfizm over 6 years
    I benchmarked it comparing to buffered approach, it's about 500 times slower on my test. (test config: compute CRC32 of a 14 MB file, which is repeatedly re-read from SSD in RAID-0 - so it's in system file cache; Intel Core i7 2nd gen; 16GB RAM).
  • m.bemowski over 5 years
    It is not a valid solution. From javadoc of InputStream.available: Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
  • niid about 4 years
    I had to import import org.apache.commons.io.IOUtils instead of the suggested import.
  • jwvh over 3 years
    With a question this old (asked over 9 years ago), and with so many answers already submitted, it is helpful to point out how your new answer is different from the previous answers. (And including code that's been commented out just looks sloppy.)
  • Alistair McIntyre about 3 years
    yeah.. the other answers clearly show a byte array being returned. this is really not clear