How to read a csv file one line at a time and replace/edit certain lines as you go?

20,591

Solution 1

The process would be something like this:

  1. Open a StreamWriter to a temporary file.
  2. Open a StreamReader to the target file.
  3. For each line:
    1. Split the text into columns based on a delimiter.
    2. Check the columns for the values you want to replace, and replace them.
    3. Join the column values back together using your delimiter.
    4. Write the line to the temporary file.
  4. When you are finished, delete the target file, and move the temporary file to the target file path.

Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.


Basic example just using StreamReader and StreamWriter:

var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;
var tempPath = Path.GetTempFileName();
var lineNumber = 0;

var splitExpression = new Regex(@"(" + delimiter + @")(?=(?:[^""]|""[^""]*"")*$)");

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;
    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;

        if (string.IsNullOrEmpty(line)) return; // file is empty;

        headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        writer.WriteLine(line); // write the original header to the temp file.
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
        if (headers == null) headers = new string[columns.Length];

        if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/': if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }

}

File.Delete(sourcePath);
File.Move(tempPath, sourcePath);

Solution 2

memory-mapped files is a new feature in .NET Framework 4 which can be used to edit large files. read here http://msdn.microsoft.com/en-us/library/dd997372.aspx or google Memory-mapped files

Solution 3

Just read the file, line by line, with streamreader, and then use REGEX! The most amazing tool in the world.

using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
        {
            var line = sr.ReadLine();
            while (!sr.EndOfStream)
            {
                // do stuff

                line = sr.ReadLine();
            }

        }
Share:
20,591
richard
Author by

richard

I love this site. I am here to learn, and try to contribute back as much as possible. As an aside, I can't thank everyone on this site enough for their answers. This site (and it's sister sites through SE) are INVALUABLE.

Updated on January 03, 2020

Comments

  • richard
    richard over 4 years

    I have a 60GB csv file I need to make some modifications to. The customer wants some changes to the files data, but I don't want to regenerate the data in that file because it took 4 days to do.

    How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?

  • richard
    richard over 11 years
    I like it! I will check it out. Thanks.
  • richard
    richard over 11 years
    That's definitely the simple and most straight forward way to go.
  • moribvndvs
    moribvndvs over 11 years
    One thing, I didn't think of the size. The final File.Move will probably be very slow. So instead, you might just create the temp file in the same folder as the source file, then delete the source and just rename the temp (rather than use GetTempFileName and File.Move).
  • richard
    richard over 11 years
    This is what I ended up doing. It was fast and great. Thanks!