String parsing in Haskell
Solution 1
Since Strings are simply lists of Chars in Haskell, Data.List would be a good place to start looking (in the interest of learning Haskell).
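As a rough illustration of that starting point (not a full solution), here are a few Data.List and Prelude functions applied directly to a String. The helper names unquote, trim, and firstField are made up for this sketch.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- A String is just [Char], so ordinary list functions work on it.

-- Drop every double-quote character (hypothetical helper name).
unquote :: String -> String
unquote = filter (/= '"')

-- Strip leading and trailing whitespace (hypothetical helper name).
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split off everything before the first comma; the second component
-- still starts with the comma, if there was one.
firstField :: String -> (String, String)
firstField = break (== ',')
```

In GHCi, firstField "a,b" evaluates to ("a",",b"), so the caller still has to drop the leading comma before continuing.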
For more complex cases (where commas may be nested inside quotes and should be ignored, for example), parsec (as Daniel mentioned) would be a better solution.
Also, if you're looking to parse CSVs you may try Text.CSV, though I've not tried it, so I can't say how helpful it'll be.
Solution 2
I finally decided to roll my own parsing functions since this is such a simple situation. I have learned a lot about Haskell since I first posted this question and want to document my solution here:
import Data.List (sort)
import System.IO

-- Split a string on a delimiter character.
split :: Char -> String -> [String]
split _ "" = []
split c s = firstWord : (split c rest)
    where firstWord = takeWhile (/=c) s
          rest = drop (length firstWord + 1) s

-- Remove every occurrence of a character from a string.
removeChar :: Char -> String -> String
removeChar _ [] = []
removeChar ch (c:cs)
    | c == ch   = removeChar ch cs
    | otherwise = c:(removeChar ch cs)

main = do
    handle <- openFile "input/names.txt" ReadMode
    contents <- hGetContents handle
    let names = sort (map (removeChar '"') (split ',' contents))
    print names
    hClose handle
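For anyone who wants to try the same pipeline without a file, here is a pure sketch that should behave the same way on a string literal. It rewrites split with break and removeChar with filter, which I believe are equivalent to the versions above. The name sortedNames is hypothetical.

```haskell
import Data.List (sort)

-- Split on a delimiter; an empty input yields no fields.
split :: Char -> String -> [String]
split _ "" = []
split c s = case break (== c) s of
  (word, [])       -> [word]
  (word, _ : rest) -> word : split c rest

-- Remove every occurrence of a character.
removeChar :: Char -> String -> String
removeChar ch = filter (/= ch)

-- Hypothetical helper: split on commas, strip the quotes, sort.
sortedNames :: String -> [String]
sortedNames = sort . map (removeChar '"') . split ','
```

One thing to watch for: if the file ends with a newline, that newline stays attached to the last name, so you may want to strip whitespace first.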
Solution 3
The most powerful solution is a parser combinator library. Haskell has several of these, but the foremost that come to my mind are:
- parsec: a very good general-purpose parsing library
- attoparsec: a faster parsec-like library, which sacrifices the quality of error messages and some other features for extra speed
- uu-parsinglib: a very powerful parsing library
The big advantage of parser combinators is that it is very easy to define parsers using do notation (or Applicative style, if you prefer).
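To make the two styles concrete, here is a minimal sketch using parsec for a comma-separated list of quoted words. It assumes the parsec package is available and that quoted words contain no escaped quotes. All the parser names (quotedWord, quotedWordsDo, quotedWordsAp, parseQuotedWords) are hypothetical.

```haskell
import Text.Parsec (ParseError, between, char, eof, many, noneOf, parse, sepBy, spaces)
import Text.Parsec.String (Parser)

-- A word wrapped in double quotes; escaped quotes are not handled.
quotedWord :: Parser String
quotedWord = between (char '"') (char '"') (many (noneOf "\""))

-- do-notation style.
quotedWordsDo :: Parser [String]
quotedWordsDo = do
  spaces
  ws <- (quotedWord <* spaces) `sepBy` (char ',' <* spaces)
  eof
  return ws

-- The same parser in Applicative style.
quotedWordsAp :: Parser [String]
quotedWordsAp =
  spaces *> ((quotedWord <* spaces) `sepBy` (char ',' <* spaces)) <* eof

-- Run the parser; the source name is only used in error messages.
parseQuotedWords :: String -> Either ParseError [String]
parseQuotedWords = parse quotedWordsDo "<input>"
```

On malformed input this returns a Left with a position and a description of what was expected, which is the main thing you give up when rolling your own.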
If you just want some quick and simple string manipulation capabilities, then consult the text library (for high-performance packed Unicode strings), or Data.List (for ordinary list-encoded strings), which provide the necessary functions to manipulate strings.
Solution 4
Here's a particularly cheeky way to proceed:
parseCommaSepQuotedWords :: String -> [String]
parseCommaSepQuotedWords s = read ("[" ++ s ++ "]")
This might work but it's very fragile and rather silly. Essentially you are using the fact that the Haskell way of writing lists of strings almost coincides with your way, and hence the built-in Read instance is almost the thing you want. You could use reads for better error-reporting but in reality you probably want to do something else entirely.
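A minimal sketch of the reads variant, assuming you would rather get a Maybe back than a runtime exception on bad input. The name parseCommaSepQuotedWordsSafe is hypothetical.

```haskell
import Data.Char (isSpace)

-- Like the read-based version, but returns Nothing instead of crashing
-- when the input is not a valid Haskell list of string literals.
parseCommaSepQuotedWordsSafe :: String -> Maybe [String]
parseCommaSepQuotedWordsSafe s =
  case reads ("[" ++ s ++ "]") of
    -- Accept exactly one parse with nothing but whitespace left over.
    [(ws, rest)] | all isSpace rest -> Just ws
    _                               -> Nothing
```

This is still the same trick, so it inherits the fragility: the words have to be valid Haskell string literals, escapes and all.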
In general, parsec is really worth taking a look at - it's a joy to use, and one of the things that originally really got me excited about Haskell. But if you want a homegrown solution, I often write simple things using case statements on the result of span and break. Suppose you are looking for the next semicolon in the input. Then break (== ';') inp will return (before, after), where:
- before is the content of inp up to (and not including) the first semicolon (or all of it if there is none)
- after is the rest of the string:
  - if after is not empty, the first element is a semicolon
  - regardless of what else happens, before ++ after == inp
So to parse a list of statements separated by semicolons, I might do this:
parseStmts :: String -> Maybe [Stmt]
parseStmts inp = case break (== ';') inp of
    (before, _ : after) -> -- ...
    -- ^ before is the first statement
    --       ^ ignore the semicolon
    --           ^ after is the rest of the string
    (_, []) -> -- inp doesn't contain any semicolons
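Here is one way the skeleton above could be filled in, purely as a sketch. Stmt and parseStmt are stand-ins I made up (a statement is just any non-blank text), and I am assuming every statement must be terminated by a semicolon. Your real Stmt type and rules will differ.

```haskell
import Data.Char (isSpace)

-- Hypothetical statement type: just wraps the raw text.
newtype Stmt = Stmt String deriving (Eq, Show)

-- Hypothetical single-statement parser: any non-blank text is accepted.
parseStmt :: String -> Maybe Stmt
parseStmt s
  | all isSpace s = Nothing
  | otherwise     = Just (Stmt s)

parseStmts :: String -> Maybe [Stmt]
parseStmts inp = case break (== ';') inp of
  -- Found a semicolon: parse what came before it, then recurse on the rest.
  (before, _ : after) -> (:) <$> parseStmt before <*> parseStmts after
  -- No semicolon left: only trailing whitespace is allowed (assumption).
  (rest, [])
    | all isSpace rest -> Just []
    | otherwise        -> Nothing
```

With these definitions, parseStmts "a;b;" gives Just [Stmt "a",Stmt "b"], while parseStmts "a;b" gives Nothing because the last statement is unterminated.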
Solution 5
In the interest of having a complete answer for those who happen upon this question, Data.Text has some good functions as well.
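As a rough idea of what that looks like, here is a sketch using splitOn, strip, and dropAround from Data.Text. It assumes the text package is available and that no word contains a comma. The name parseNames is hypothetical.

```haskell
import qualified Data.Text as T

-- Split on commas, trim whitespace, then drop the surrounding quotes.
-- Commas inside quotes are not handled.
parseNames :: T.Text -> [T.Text]
parseNames = map (T.dropAround (== '"') . T.strip) . T.splitOn (T.pack ",")
```

Note that T.splitOn returns a single empty field for empty input, so parseNames on an empty Text gives a one-element list rather than an empty one.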
Code-Apprentice
Updated on June 04, 2022
Comments
-
Code-Apprentice almost 2 years
I am very new to Haskell and am currently trying to solve a problem that requires some string parsing. My input String contains a comma-delimited list of words in quotes. I want to parse this single string into a list of the words as Strings. Where should I start learning about parsing such a String? Is there a particular module and/or functions that will be helpful?
p.s. Please don't post a full solution. I am just asking for a pointer to a starting place so I can learn how to do it.
-
Ben Millwood almost 12 years
When I was a noob I could not make heads nor tails of uu-parsinglib. I haven't tried it since then, but I wouldn't exactly call it friendly.
-
Richard Careaga over 8 years
This link is now at therning.org/magnus/posts/…; see wiki.haskell.org/Parsec Sec. 5.2 for other links in the series and additional resources