How can I parse the IO String in Haskell?

string parsing haskell io monads

36,340

Solution 1

I think you have a fundamental misunderstanding about IO in Haskell. In particular, you say this:

Maybe there is a function which can convert from 'IO String' to [Char]?

No, there isn't¹, and the fact that there is no such function is one of the most important things about Haskell.

Haskell is a very principled language. It tries to maintain a distinction between "pure" functions (which don't have any side effects, and always return the same result when given the same input) and "impure" functions (which have side effects like reading from files, printing to the screen, writing to disk, etc.). The rules are:

You can use a pure function anywhere (in other pure functions, or in impure functions)
You can only use impure functions inside other impure functions.
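A minimal sketch of the two rules, using a made-up pure function `shout`: the impure `main` may call it freely, while the commented-out definition at the end shows the reverse direction being rejected by the type checker.

```haskell
import Data.Char (toUpper)

-- Pure: no side effects, and the same input always gives the same output.
-- ('shout' is a hypothetical example function.)
shout :: String -> String
shout = map toUpper

-- Impure: main reads from stdin, so it lives in IO.
-- Calling the pure 'shout' from inside it is allowed.
main :: IO ()
main = do
  name <- getLine
  putStrLn (shout name)

-- The reverse does not type-check. A definition such as
--   shoutInput :: String
--   shoutInput = shout getLine
-- is rejected, because getLine has type IO String, not String.
```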

The way that code is marked as pure or impure is using the type system. When you see a function signature like

digitToInt :: Char -> Int

you know that this function is pure. If you give it a Char it will return an Int, and moreover it will always return the same Int if you give it the same Char. On the other hand, a function signature like

getLine :: IO String

is impure, because its result type, String, is wrapped in IO. Obviously getLine (which reads a line of user input) will not always return the same String, because it depends on what the user types in. You can't use this function in pure code, because adding even the smallest bit of impurity will pollute the pure code. Once you go IO you can never go back.

You can think of IO as a wrapper. When you see a particular type, for example, x :: IO String, you should interpret that to mean "x is an action that, when performed, does some arbitrary I/O and then returns something of type String" (note that in Haskell, String and [Char] are exactly the same thing).

So how do you ever get access to the values from an IO action? Fortunately, main has type IO () (it's an action that does some I/O and returns (), which is the same as returning nothing useful). So you can always use your IO actions inside main. When you execute a Haskell program, what you are doing is running main, which causes all the I/O in the program definition to actually be executed: for example, you can read and write files, ask the user for input, write to stdout, etc.

You can think of structuring a Haskell program like this:

All code that does I/O gets the IO tag (basically, you put it in a do block)
Code that doesn't need to perform I/O doesn't need to be in a do block - these are the "pure" functions.
Your main function sequences together the I/O actions you've defined in an order that makes the program do what you want it to do (interspersed with the pure functions wherever you like).
When you run main, you cause all of those I/O actions to be executed.
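A minimal sketch of that structure, assuming a hypothetical file `input.txt` and a made-up pure function `summarize`: the pure function does all the real work, and `main` only sequences the I/O around it.

```haskell
-- Pure part: no IO in the type, so it can be tested on any String.
-- ('summarize' is a hypothetical example.)
summarize :: String -> String
summarize contents =
  "lines: " ++ show (length (lines contents))
    ++ ", words: " ++ show (length (words contents))

-- Impure part: main reads the file, hands the String to the pure
-- function, and prints the result. "input.txt" is an assumed file name.
main :: IO ()
main = do
  contents <- readFile "input.txt"
  putStrLn (summarize contents)
```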

So, given all that, how do you write your program? Well, the function

readFile :: FilePath -> IO String

reads a file as a String. So we can use that to get the contents of the file. The function

lines :: String -> [String]

splits a String on newlines, so now you have a list of Strings, each corresponding to one line of the file. The function

init :: [a] -> [a]

drops the last element from a list (applied to each line, this gets rid of the final .). The function

read :: (Read a) => String -> a

takes a String and turns it into an arbitrary Haskell data type, such as Int or Bool. Combining these functions sensibly will give you your program.

Note that the only time you actually need to do any I/O is when you are reading the file. Therefore that is the only part of the program that needs to use the IO tag. The rest of the program can be written "purely".
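To make that split concrete, here is a sketch of the parsing step as a pure function, under the same assumption the program below makes (every line ends in exactly one `.` with nothing after it). The names `parseContents` and `run` are illustrative; only `run` touches IO.

```haskell
-- Pure parser for the three-line format from the question, e.g.
--   5.
--   7.
--   [(1,2,3),(4,5,6),(7,8,9),(10,11,12)].
-- Assumes each line ends with exactly one '.'; 'read' throws otherwise.
parseContents :: String -> (Int, Int, [(Int, Int, Int)])
parseContents contents =
  case lines contents of
    [a, b, c] -> (read (init a), read (init b), read (init c))
    _         -> error "expected exactly three lines"

-- The only impure step: reading the file. The parsing happens in the
-- pure function, mapped over the IO action with fmap.
run :: FilePath -> IO (Int, Int, [(Int, Int, Int)])
run path = fmap parseContents (readFile path)
```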

It sounds like what you need is the article The IO Monad For People Who Simply Don't Care, which should explain a lot of your questions. Don't be scared by the term "monad" - you don't need to understand what a monad is to write Haskell programs (notice that this paragraph is the only one in my answer that uses the word "monad", although admittedly I have used it four times now...)

Here's the program that (I think) you want to write:

run :: IO (Int, Int, [(Int,Int,Int)])
run = do
  contents <- readFile "text.txt"   -- use '<-' here so that 'contents' is a String
  let [a,b,c] = lines contents      -- split on newlines
  let firstLine  = read (init a)    -- 'init' drops the trailing period
  let secondLine = read (init b)    
  let thirdLine  = read (init c)    -- this reads a list of Int-tuples
  return (firstLine, secondLine, thirdLine)
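One caveat with init: in the question's file the second line appears as "7. " with a trailing space, in which case init removes the space rather than the dot and read fails at runtime. A more defensive sketch (the names parseLine and parseContentsMaybe are hypothetical) trims whitespace, strips the dot with Data.List.stripSuffix, and uses Text.Read.readMaybe so that malformed input yields Nothing instead of an exception:

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripSuffix)
import Text.Read (readMaybe)

-- Parse one line such as "7. ": trim surrounding whitespace, require a
-- trailing '.', then read what is left. Nothing if any step fails.
parseLine :: Read a => String -> Maybe a
parseLine line = do
    body <- stripSuffix "." (trim line)
    readMaybe body
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Parse the whole file contents; Nothing if there are not exactly
-- three lines or any line is malformed.
parseContentsMaybe :: String -> Maybe (Int, Int, [(Int, Int, Int)])
parseContentsMaybe contents =
  case lines contents of
    [a, b, c] -> (,,) <$> parseLine a <*> parseLine b <*> parseLine c
    _         -> Nothing
```

In IO this becomes, for example, parseContentsMaybe <$> readFile "text.txt", which has type IO (Maybe (Int, Int, [(Int, Int, Int)])).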

To answer npfedwards' comment about applying lines to the output of readFile "text.txt": readFile "text.txt" gives you an IO String, and only when you bind it to a variable (using contents <-) do you get access to the underlying String, so that you can apply lines to it.

Remember: once you go IO, you never go back.

¹ I am deliberately ignoring unsafePerformIO because, as implied by the name, it is very unsafe! Don't ever use it unless you really know what you are doing.

Solution 2

As a programming noob, I too was confused by IO. Just remember that if you go IO you never come out. Chris wrote a great explanation of why. I just thought it might help to give some examples of how to use an IO String in a monad. I'll use getLine, which reads user input and returns an IO String.

line <- getLine

All this does is bind the user input from getLine to a value named line. If you type this in ghci and then type :type line, it will show:

:type line
line :: String

But wait! getLine returns an IO String

:type getLine
getLine :: IO String

So what happened to the IOness from getLine? <- is what happened. <- is your IO friend: it takes the value produced by an IO action and binds it to a name, so you can use it with your normal functions. You use <- inside a do block (the GHCi prompt behaves like one). Like so:

main :: IO ()
main = do
    putStrLn "How much do you love Haskell?"
    amount <- getLine
    putStrLn ("You love Haskell this much: " ++ amount)
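Under the hood, <- in a do block is sugar for (>>=). A sketch of the program above written both ways, with a hypothetical pure helper reply pulled out so the string-building is ordinary, testable code:

```haskell
-- Hypothetical pure helper that builds the reply text.
reply :: String -> String
reply amount = "You love Haskell this much: " ++ amount

-- do-notation version: '<-' binds the String produced by getLine.
mainDo :: IO ()
mainDo = do
  putStrLn "How much do you love Haskell?"
  amount <- getLine
  putStrLn (reply amount)

-- The same program without do-notation: '<-' becomes (>>=) with a
-- lambda, and sequencing a statement whose result is unused becomes (>>).
mainBind :: IO ()
mainBind =
  putStrLn "How much do you love Haskell?" >>
  getLine >>= \amount ->
  putStrLn $ reply amount

main :: IO ()
main = mainDo
```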

If you're like me, you'll soon discover that liftIO (for running IO actions inside other monads, such as monad transformer stacks) is your next best monad friend, and that $ helps reduce the number of parentheses you need to write.

So how do you get the information from readFile? Well if readFile's output is IO String like so:

:type readFile
readFile :: FilePath -> IO String

Then all you need is your friendly <-:

 yourdata <- readFile "samplefile.txt"

Now if you type that in ghci and check the type of yourdata, you'll notice it's a plain String.

:type yourdata
yourdata :: String

Solution 3

As others have already said, if you have two functions, readStringFromFile :: IO String (for example, readFile applied to some path) and doTheRightThingWithString :: String -> Something, then you don't really need to escape the string from IO, since you can combine these two functions in various ways:

With fmap for IO (IO is Functor):

fmap doTheRightThingWithString readStringFromFile

With (<$>) for IO ((<$>) is just an infix fmap):

import Control.Applicative  -- only needed on GHC before 7.10; (<$>) is in the Prelude since then

...

doTheRightThingWithString <$> readStringFromFile

With liftM for IO (liftM == fmap):

import Control.Monad

...

liftM doTheRightThingWithString readStringFromFile

With (>>=) for IO (IO is Monad, fmap == (<$>) == liftM == \f m -> m >>= return . f):

readStringFromFile >>= \string -> return (doTheRightThingWithString string)
readStringFromFile >>= \string -> return $ doTheRightThingWithString string
readStringFromFile >>= return . doTheRightThingWithString
return . doTheRightThingWithString =<< readStringFromFile

With do notation:

do
  ...
  string <- readStringFromFile
  -- ^ you escape String from IO but only inside this do-block
  let result = doTheRightThingWithString string
  ...
  return result

In every case, the result has type IO Something.
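A compilable sketch of the variants above, assuming readStringFromFile :: IO String is readFile on a hypothetical input.txt and doTheRightThingWithString is a stand-in that counts lines. The shared type signature is the point: every variant is IO Int.

```haskell
import Control.Monad (liftM)

-- Hypothetical stand-ins for the two functions in the answer.
readStringFromFile :: IO String
readStringFromFile = readFile "input.txt"   -- assumed file name

doTheRightThingWithString :: String -> Int
doTheRightThingWithString = length . lines

-- All variants have the same type: the String never leaves IO,
-- but the pure function still gets applied to it.
viaFmap, viaApplicative, viaLiftM, viaBind, viaDo :: IO Int
viaFmap        = fmap doTheRightThingWithString readStringFromFile
viaApplicative = doTheRightThingWithString <$> readStringFromFile
viaLiftM       = liftM doTheRightThingWithString readStringFromFile
viaBind        = readStringFromFile >>= return . doTheRightThingWithString
viaDo = do
  string <- readStringFromFile
  let result = doTheRightThingWithString string
  return result

main :: IO ()
main = viaDo >>= print
```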

Why would you want to do it like that? Well, this way you keep pure and referentially transparent programs (functions) in your language. Every function whose type is IO-free is pure and referentially transparent, so for the same arguments it will return the same values. For example, doTheRightThingWithString would return the same Something for the same String. However, readStringFromFile, which is not IO-free, can return different strings every time (because the file can change), so you can't escape such an impure value from IO.

Solution 4

If you have a parser of this type:

myParser :: String -> Foo

and you read the file using

readFile "thisfile.txt"

then you can read and parse the file using

fmap myParser (readFile "thisfile.txt")

The result of that will have type IO Foo.

The fmap means myParser runs "inside" the IO.

Another way to think of it is that whereas myParser :: String -> Foo, fmap myParser :: IO String -> IO Foo.
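A small end-to-end sketch, with a hypothetical Foo (a list of Ints) and a toy myParser that reads whitespace-separated numbers; substitute your real parser and result type:

```haskell
-- Hypothetical result type; substitute your own.
newtype Foo = Foo [Int]
  deriving (Show, Eq)

-- Toy pure parser: one Int per whitespace-separated word
-- ('read' throws on non-numeric input).
myParser :: String -> Foo
myParser = Foo . map read . words

-- fmap lifts the pure parser over the IO action:
--   myParser      :: String    -> Foo
--   fmap myParser :: IO String -> IO Foo
parseFile :: FilePath -> IO Foo
parseFile path = fmap myParser (readFile path)

main :: IO ()
main = parseFile "thisfile.txt" >>= print
```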



Author: Simon

Updated on December 21, 2020

Comments

Simon over 3 years
I've got a problem with Haskell. I have a text file that looks like this:
```
5.
7. 
[(1,2,3),(4,5,6),(7,8,9),(10,11,12)].
```
I have no idea how to get the first two numbers (5 and 7 above) and the list from the last line. There is a dot at the end of each line.

I tried to build a parser, but the function readFile returns a value of type IO String, and I don't know how to get the information out of that type.

I'd prefer to work on an array of chars. Maybe there is a function which can convert from 'IO String' to [Char]?