What is the difference between toString and mkString in Scala?


Solution 1

Let's look at the types, shall we?

scala> import scala.io._
import scala.io._

scala> val foo = Source.fromFile("foo.txt")
foo: scala.io.BufferedSource = non-empty iterator

scala> 

The value you get back from Source.fromFile is not the contents of foo.txt but an iterator over its characters. Calling toString on it does not return the contents of the file; it returns the String representation of the iterator object itself. mkString, on the other hand, consumes the iterator (that is, iterates over it) and builds one long String from the values it reads.

For more info, look at this console session:

scala> foo.toString
res4: java.lang.String = non-empty iterator

scala> res4.foreach(print)
non-empty iterator
scala> foo.mkString
res6: String = 
"foo
bar
baz
quux
dooo
"

scala> 
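The same contrast can be reproduced without a file on disk. The following is a minimal sketch, assuming Source.fromString as a stand-in for Source.fromFile; the object name, the method names and the sample contents are made up for illustration. The exact text returned by toString depends on the Scala version (older versions print something like "non-empty iterator", newer ones "<iterator>"), so the sketch does not rely on it.

```scala
import scala.io.Source

object ToStringVsMkString {
  // Hypothetical stand-in for the contents of foo.txt.
  val contents: String = "foo\nbar\nbaz\n"

  // toString describes the iterator object; it does not read from it.
  def describe(): String = Source.fromString(contents).toString

  // mkString consumes the iterator and concatenates every character it yields.
  def readAll(): String = Source.fromString(contents).mkString

  def main(args: Array[String]): Unit = {
    println(describe()) // a description of the iterator, not the text
    print(readAll())    // the text itself
  }
}
```

Note that a Source can only be consumed once, which is why each method above builds a fresh one.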

Solution 2

The toString method is supposed to return the string representation of an object. It is often overridden to provide a meaningful representation. The mkString method is defined on collections and is a method which joins the elements of the collection with the provided string. For instance, try something like:

val a = List("a", "b", "c")
println(a.mkString(" : "))

and you will get "a : b : c" as the output. The mkString method has created a string from your collection by joining its elements with the string you provided. In the particular case you posted, the mkString call joined the elements returned by the BufferedSource iterator with the empty string (because you called mkString with no arguments). A BufferedSource iterates over the characters of the file, so the result is simply all of those characters concatenated together, which is the full text of the file.

On the other hand, calling toString here doesn't really make sense: what you get (when you don't get an error) is the string representation of the BufferedSource iterator, which just tells you that the iterator is non-empty.
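For comparison, here is a small sketch of the mkString overloads next to toString on the same list. The object and value names are made up for illustration; the methods themselves are the standard collection ones.

```scala
object MkStringOverloads {
  val letters: List[String] = List("a", "b", "c")

  // No arguments: elements are joined with the empty string.
  val concatenated: String = letters.mkString

  // One argument: elements are joined with the given separator.
  val separated: String = letters.mkString(" : ")

  // Three arguments: prefix, separator, suffix.
  val bracketed: String = letters.mkString("[", ", ", "]")

  // toString is the collection's own debugging representation.
  val debugForm: String = letters.toString

  def main(args: Array[String]): Unit = {
    println(concatenated) // abc
    println(separated)    // a : b : c
    println(bracketed)    // [a, b, c]
    println(debugForm)    // List(a, b, c)
  }
}
```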

Solution 3

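They're different methods in different classes. In this case, mkString is a method in the trait GenTraversableOnce (in the Scala version current when this was written; in Scala 2.13 and later it is defined on IterableOnceOps). toString is defined on Any (and is very often overridden).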

The easiest way (or at least the way I usually use) to find this out is to use the documentation at http://www.scala-lang.org/api/current/index.html. Start with the type of your variable:

val data = io.Source.fromFile("file.txt")

is of type

scala.io.BufferedSource

Go to the doc for BufferedSource, and look for mkString. In the doc for mkString (hit the down arrow over to the left) you'll see that it comes from

Definition Classes TraversableOnce → GenTraversableOnce

And do the same thing with toString.
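To illustrate the point that toString exists on every object and is often overridden, here is a minimal sketch. The classes Plain, Labelled and Point are hypothetical and exist only for this example.

```scala
object ToStringOverride {
  // No override: inherits the default toString (class name plus a hash code).
  class Plain(val name: String)

  // Explicit override that gives a meaningful representation.
  class Labelled(val name: String) {
    override def toString: String = s"Labelled($name)"
  }

  // Case classes get a generated toString for free.
  case class Point(x: Int, y: Int)

  def main(args: Array[String]): Unit = {
    println(new Plain("x"))    // class name followed by @ and a hash code
    println(new Labelled("x")) // Labelled(x)
    println(Point(1, 2))       // Point(1,2)
  }
}
```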

Solution 4

I think the real issue is understanding what the Source class does. Your code suggests you expect Source.fromFile to return the contents of a file, when what it actually does is open the file and point at its start.

This is typical of I/O operations: you open a "connection" to a resource (in this case, to your filesystem), read or write several times, and then close that "connection". In your example you open a connection to a file, and you then have to read its contents line by line until you reach the end. Keep in mind that whatever you read is loaded into memory, so in most scenarios it is not a good idea to load the whole file at once (which is what mkString does).

mkString, on the other hand, is made to iterate over all the elements of a collection, so in this case it reads the entire file and builds a single String in memory. Be careful: if the file is big, your code may run out of memory. When working with I/O you should normally use a buffer: read some content, process or save it, and then load more content (into the same buffer), which avoids memory problems. For example: read 5 lines --> parse --> save parsed lines --> read next 5 lines --> etc.

As for toString, it retrieves nothing from the file; it just tells you that the iterator is non-empty, that is, that there is still something to read.
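The "read 5 lines, parse, save, repeat" idea can be sketched with getLines and grouped. This is only an illustration under stated assumptions: parse is a hypothetical placeholder for real processing, the "save" step is left as a comment, and processInBatches is a made-up name. The source is closed in a finally block so the file handle is released even if processing fails.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source

object BatchedRead {
  // Hypothetical per-line step; stands in for whatever parsing you need.
  def parse(line: String): String = line.trim.toUpperCase

  // Reads the file lazily, batchSize lines at a time, and returns
  // the total number of lines handled.
  def processInBatches(path: Path, batchSize: Int = 5): Int = {
    val source = Source.fromFile(path.toFile)
    try {
      source.getLines().grouped(batchSize).map { batch =>
        val parsed = batch.map(parse)
        // A real program would save `parsed` here before the next batch is read.
        parsed.size
      }.sum // forces the iteration while the source is still open
    } finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Temporary file created only so the example is self-contained.
    val tmp = Files.createTempFile("lines", ".txt")
    Files.write(tmp, "a\nb\nc\nd\ne\nf\ng\n".getBytes("UTF-8"))
    println(processInBatches(tmp))
    Files.delete(tmp)
  }
}
```

Only one batch of lines is held by this code at a time, in contrast to mkString, which materialises the whole file as one String.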

Author: alan

Updated on July 27, 2020

Comments

  • alan
    alan almost 4 years

    I have a file that contains 10 lines - I want to retrieve it, and then split them with a newline("\n") delimiter.

    here's what I did

    val data = io.Source.fromFile("file.txt").toString;
    

    But this causes an error when I try to split the file on newlines.

    I then tried

    val data = io.Source.fromFile("file.txt").mkString;
    

    And it worked.

    What the heck? Can someone tell me what the difference between the two methods is?

    • James Moore
      James Moore about 12 years
      FYI, no one writes those semicolons at the end of the line.
    • Luigi Plinge
      Luigi Plinge about 12 years
      Did you have a problem locating the relevant docs? They tell you exactly what the difference is. First step is to locate them on your local filesystem and bookmark them in your browser.
    • Daniel C. Sobral
      Daniel C. Sobral about 12 years
      Truthfully, toString is a debugging method. Its true purpose is to make all objects printable, so that debugging messages/debuggers will be able to display something.
  • user unknown
    user unknown about 12 years
    "the string representation of an object" sounds a little as if there were one objective way in which each object is represented. In fact, there is a method ".toString ()" defined in java.lang.Object, which is used if the class or an intermediate parent did not override it. In contrast, mkString is only defined in a few collection classes of Scala. And they don't produce the same result, as the question points out. mkString is defined in a useful way for data: scala.io.BufferedSource = non-empty iterator; toString () not that much.