How do I write effectively to a file in F#?

f#
11,816

Solution 1

You observed a quadratic behaviour O(n^2) on manipulating sequences. When you call Seq.skip, a brand new sequence will be created, so you implicitly traverse the remaining part. More detailed explanation could be found at https://stackoverflow.com/a/1306267.

In this example, you don't need to decompose sequences. Replacing your inner function by:

let internalWriter (seq: string seq) (sw:StreamWriter) i (endTag:string) =
    for node in Seq.take i seq do
        sw.WriteLine(node)
    sw.WriteLine(endTag)

I can write 10000 rows in fraction of a second.

You can refactor further by remove this inner function and copy its body to the parent function.

As the link above mentioned, if you ever need decomposing sequences, LazyList should be better to use.

Solution 2

pad in his answer has pointed to the cause of slowdown. Another idiomatic approach might be instead of infinite sequence generating sequence of needed length with Seq.unfold, which makes the code really trivial:

let xmlSeq n = Seq.unfold (fun i ->
    if i = 0 then None
    else Some((sprintf "<author><name>name%d</name><age>%d</age><books><book>book%d</book></books></author>" i i i), i - 1)) n

let createFile seqLen fileName =
    use streamWriter = new StreamWriter("C:\\tmp\\FSharpXmlTest\\" + fileName, false)
    streamWriter.WriteLine("<startTag>")
    seqLen |> xmlSeq |> Seq.iter streamWriter.WriteLine
    streamWriter.WriteLine("</startTag>")

(funcTimer (fun () -> createFile  10000 "file10000.xml"))

Generating 10000 elements takes around 500ms on my laptop.

Share:
11,816
Tomas Jansson
Author by

Tomas Jansson

C#, .NET, ASP.NET, ASP.NET MVC, ADO.NET...

Updated on June 04, 2022

Comments

  • Tomas Jansson
    Tomas Jansson almost 2 years

    I want to generate large xml files for testing purpose but the code I ended up with is really slow, the time grows exponentially with the number of rows I write to the file. Th example below shows that it takes milliseconds to write 100 rows, but it takes over 20 seconds to write 1000 rows (on my machine). I really can't figured out what is making this slow, since I think that writing 1000 rows shouldn't take that long. Also, writing 200 rows takes about 4 times as long as writing 100 rows which is not good. To run the code you might want to change the path for the StreamWriter.

    open System.IO
    open System.Diagnostics
    
    let xmlSeq = Seq.initInfinite (fun index -> sprintf "<author><name>name%d</name><age>%d</age><books><book>book%d</book></books></author>" index index index)
    
    let createFile (seq: string seq) numberToTake fileName =
        use streamWriter = new StreamWriter("C:\\tmp\\FSharpXmlTest\\FSharpXmlTest\\" + fileName, false)
        streamWriter.WriteLine("<startTag>")
        let rec internalWriter (seq: string seq) (sw:StreamWriter) i (endTag:string) =
            match i with
            | 0 -> (sw.WriteLine(Seq.head seq);
                sw.WriteLine(endTag))
            | _ -> (sw.WriteLine(Seq.head seq);
                internalWriter (Seq.skip 1 seq) sw (i-1) endTag)
        internalWriter seq streamWriter numberToTake "</startTag>"
    
    let funcTimer fn =
        let stopWatch = Stopwatch.StartNew()
        printfn "Timing started"
        fn()
        stopWatch.Stop()
        printfn "Time elased: %A" stopWatch.Elapsed
    
    
    (funcTimer (fun () -> createFile xmlSeq 100 "file100.xml"))
    (funcTimer (fun () -> createFile xmlSeq 1000 "file1000.xml"))
    
  • Tomas Jansson
    Tomas Jansson over 10 years
    Thank you for clarifying that I was creating a new sequence each time. However, I should replace the inner function, I could just remove the inner function and add the for loop to the outer function.
  • Tomas Jansson
    Tomas Jansson over 10 years
    Thank you for filling in other useful information. The other answers is more targeted my question but you presented some really useful extra information.