How can I read a CSV file and put its content in a Map in Scala?


Solution 1

This worked for me:

import scala.io.Source
Source.fromFile("some_very_big_file").getLines.map(_.split(";")).count(_ => true)

The split breaks each line of the CSV file into its individual fields, using ";" as the separator. The count is only there to check that the file is really read to the end.

So now we can use this to read in a real CSV file (although I only tested it with a small file):

scala> val content=Source.fromFile("test.csv").getLines.map(_.split(";"))
content: Iterator[Array[java.lang.String]] = non-empty iterator

scala> val header=content.next
header: Array[java.lang.String] = Array(Elements, Duration)

scala> content.map(header.zip(_).toMap)
res40: Iterator[scala.collection.immutable.Map[java.lang.String,java.lang.String]] = non-empty iterator
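The REPL session above can be condensed into a small helper. This is only a sketch of the same header-zip idea: the names CsvRows and toMaps and the sample data are made up for illustration, and the separator is assumed to be ";" as in the snippets above.

```scala
import scala.io.Source

object CsvRows {
  // Turns the lines of a separated-values file into one Map per data row,
  // keyed by the column names found in the first line.
  // Note: `sep` is passed to String.split, so it is treated as a regex.
  def toMaps(lines: Iterator[String], sep: String = ";"): Iterator[Map[String, String]] = {
    val rows = lines.map(_.split(sep))
    if (!rows.hasNext) Iterator.empty          // empty input: no header, no rows
    else {
      val header = rows.next()                 // first line holds the column names
      rows.map(row => header.zip(row).toMap)   // pair each column name with its value
    }
  }

  def main(args: Array[String]): Unit = {
    // In-memory sample instead of a real file (hypothetical content).
    val sample = "Elements;Duration\nfoo;10\nbar;20"
    toMaps(Source.fromString(sample).getLines()).foreach(println)
  }
}
```

Because zip stops at the shorter of the two arrays, a row with fewer fields than the header simply yields a smaller Map rather than an error.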

This works quite well with simple CSV files. If you have more complex ones (e.g. entries split over several lines), you might have to use a more complete CSV parser (e.g. Apache Commons CSV). But usually such a parser will also give you some kind of iterator, and you can use the same map(... zip ...) function on it.
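To see why a plain split is not enough for more complex files, here is a minimal sketch with a quoted field that contains the separator. NaiveSplit and the sample line are hypothetical names and data used only for this illustration.

```scala
object NaiveSplit {
  // Splits a line on every separator and ignores any quoting rules.
  def fields(line: String, sep: String = ";"): Array[String] = line.split(sep)

  def main(args: Array[String]): Unit = {
    // Intended as two fields: the quoted value a;b and the value c.
    val line = "\"a;b\";c"
    // The naive split cuts inside the quotes and yields three pieces.
    println(fields(line).length)
  }
}
```

A real CSV parser keeps the quoted value together as one field, which is the main reason to switch to one for anything beyond simple files.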

Solution 2

You could skip the intermediate sequence of tuples and build the map directly, like this:

val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head,e.tail))(collection.breakOut)

I'm not sure this will fix your issue, but you did ask whether there is another way to build the map. You can read more about collection.breakOut here:

Scala: List[Tuple3] to Map[String,String]
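Note that collection.breakOut was removed in Scala 2.13. A version-independent way to get the same effect is to go through an iterator, which also avoids building an intermediate collection of tuples. The following is only a sketch; LabelMap and build are illustrative names, and the rows are assumed to have the label in the first cell, as described in the question.

```scala
object LabelMap {
  // Builds label -> remaining values from rows whose first cell is the label.
  // Working on the iterator means no intermediate collection of tuples
  // is materialised before the Map is built.
  def build(data: Seq[Array[String]]): Map[String, Array[String]] =
    data.iterator
      .filter(_.nonEmpty)                 // skip empty rows, as in the original snippet
      .map(row => row.head -> row.tail)   // label -> rest of the cells
      .toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows.
    val data = Seq(Array("a", "1", "2"), Array[String](), Array("b", "3"))
    build(data).foreach { case (label, values) =>
      println(label + " -> " + values.mkString(","))
    }
  }
}
```

If the same label appears in several rows, toMap keeps only the last one, so duplicate labels need separate handling.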

Author: bam098

Updated on July 11, 2022

Comments

  • bam098
    bam098 almost 2 years

    I have a CSV file that contains a data matrix. The first column of this matrix contains a label, and the other columns contain the values associated with that label. I want to read this CSV file and put the data into a Map[String,Array[String]] in Scala. The key of the Map should be the label (the first column), and the Map value should be the remaining values (the rest of the columns). To read the CSV file I use opencsv.

    val isr: InputStreamReader = new InputStreamReader(getClass.getResourceAsStream("test.csv"))
    val data: IndexedSeq[Array[String]] = new CSVReader(isr).readAll.asScala.toIndexedSeq
    

    Now I have all the data in an IndexedSeq[Array[String]]. Can I use this functional approach here, or would an iterative approach be better, because reading all the data at once can get complicated? Next, I need to create the Map from this IndexedSeq. To do that, I map the IndexedSeq to an IndexedSeq of tuples (String, Array[String]) to separate the label from the rest of the values, and then I create the Map from that.

    val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head,e.tail)).toMap
    

    This works for small examples, but when I use it to read the content of my CSV file it throws a java.lang.RuntimeException. I also tried to create the map with groupBy, and to create several Maps (one per line) and reduce them afterwards into one big Map, but without success. I also read another post on Stack Overflow in which somebody suspects that toMap has a complexity of O(n²). This is the end of my stack trace (the whole stack trace is quite long).

    Exception in thread "main" java.lang.reflect.InvocationTargetException      
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.runSingleTest(JavaSpecs2Runner.java:130)  
        at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.main(JavaSpecs2Runner.java:76)  
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
        at java.lang.reflect.Method.invoke(Method.java:601)  
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)  
        Caused by: java.lang.RuntimeException: can not create specification: com.test.MyClassSpec  
        at scala.sys.package$.error(package.scala:27)  
        at org.specs2.specification.SpecificationStructure$.createSpecification(BaseSpecification.scala:96)   
        at org.specs2.runner.ClassRunner.createSpecification(ClassRunner.scala:64)  
        at org.specs2.runner.ClassRunner.start(ClassRunner.scala:35)  
        at org.specs2.runner.ClassRunner.main(ClassRunner.scala:28)  
        at org.specs2.runner.NotifierRunner.main(NotifierRunner.scala:24)  
        ... 11 more  
        Process finished with exit code 1
    

    Does anybody know another way to create a Map from the data in a CSV file?