How to edit the header of a huge CSV file in-place?

8,766

Solution 1

If you do not know the length of the header, head -n1 seems like a reasonable way to get the first line.

To write it in-place back to the head of the file, you can use dd:

head -n1 file.csv | ./do-some-processing | dd of=file.csv bs=1 conv=notrunc

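The conv=notrunc option is critical: it leaves the rest of the file intact. bs=1 makes dd copy one byte at a time, so exactly the bytes that arrive on the pipe are written. Note that ./do-some-processing must preserve the length of the header; otherwise the new header either overwrites the start of the data or leaves stale bytes of the old header behind.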
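As a minimal sketch of this approach, assume a hypothetical file data.csv whose header contains the placeholder column names col1 and col3. The script below swaps the two names and refuses to write if the byte length of the header changed. The length check is an addition of this sketch, not part of the original answer.

```shell
#!/bin/sh
# Sketch: swap two equal-length column names in a CSV header, in place.
# data.csv, col1 and col3 are hypothetical placeholders.
file=data.csv

# Create a small sample file so the sketch is self-contained.
printf 'col1,col2,col3\n1,2,3\n3,2,1\n' > "$file"

old=$(head -n1 "$file")
# Swap the two names via a temporary token.
new=$(printf '%s\n' "$old" | sed -e 's/col1/tmp/' -e 's/col3/col1/' -e 's/tmp/col3/')

# Compare byte lengths: a longer header would clobber the data,
# a shorter one would leave stale bytes of the old header behind.
old_len=$(printf '%s' "$old" | wc -c)
new_len=$(printf '%s' "$new" | wc -c)

if [ "$old_len" -eq "$new_len" ]; then
    # conv=notrunc keeps everything after the bytes we overwrite.
    printf '%s\n' "$new" | dd of="$file" bs=1 conv=notrunc 2>/dev/null
else
    echo "header length changed; not writing" >&2
    exit 1
fi

cat "$file"
```

Only the header bytes are written, so the cost should not depend on how large the rest of the file is.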

Solution 2

I would suggest sed for this. You can restrict a substitution to the first line by prefixing it with the line address, as in 1s/foo/bar/:

$ cat file
col1,col2,col3
1,2,3
3,2,1
...

$ sed -e '1s/col1/tmp/' -e '1s/col3/col1/'  -e '1s/tmp/col3/' file
col3,col2,col1
1,2,3
3,2,1
...

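Use -i to store the change back to the file. Note that sed still reads and rewrites the entire file, so the running time grows with the file size: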

$ sed -i -e '1s/col1/tmp/' -e '1s/col3/col1/'  -e '1s/tmp/col3/' file
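One way to check whether -i patches the file where it sits is to compare inode numbers before and after. This is a sketch assuming GNU sed, which writes a temporary copy and renames it over the original; sample.csv is a hypothetical file created only for the demonstration. BSD/macOS sed expects `-i ''` rather than a bare `-i`.

```shell
#!/bin/sh
# Sketch (assumes GNU sed): show that `sed -i` replaces the file
# with a new one rather than overwriting bytes in the existing one.
# sample.csv is a hypothetical file created for this demonstration.
file=sample.csv
printf 'col1,col2,col3\n1,2,3\n3,2,1\n' > "$file"

# Record the inode number before and after the edit.
inode_before=$(ls -i "$file" | awk '{print $1}')
sed -i -e '1s/col1/tmp/' -e '1s/col3/col1/' -e '1s/tmp/col3/' "$file"
inode_after=$(ls -i "$file" | awk '{print $1}')

head -n1 "$file"
if [ "$inode_before" != "$inode_after" ]; then
    echo "inode changed: sed wrote a new file"
else
    echo "inode unchanged"
fi
```

A changed inode means the data lines were copied along with the header, which is why this approach takes time proportional to the file size.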

Author: sds

Math, Data Science, History...

Updated on September 18, 2022

Comments

  • sds
    sds over 1 year

    I have several huge CSV files in which I want to swap two column names.

    I do not want to modify/copy/rewrite the data.

    The operation is very cheap in C: fopen the file, fgets the header, fseek or rewind, manipulate the header (preserving its length), fputs the new header, fclose the file.

    This can also be done in ANSI Common Lisp (CLISP, SBCL or GCL):

     (with-open-file (csv "foo.csv" :direction :io
                          :if-exists :overwrite)
       (let ((header (read-line csv)))
         (print header)
         (file-position csv 0)
         (write-line (string-upcase header) csv)
         (file-position csv 0)
         (read-line csv)))
    

    and takes a fraction of a second (sed takes a few minutes because it reads and rewrites the whole file even if you tell it to modify just the first line, ignoring the crucial information that the size of the header did not change).

    How do I do that with the "standard unix tools" (e.g., perl)?

  • sds
    sds over 11 years
    This copies the data, i.e., the time spent is proportional to the data size.
  • sds
    sds over 11 years
    This does modify the file as required, but it rewrites the data, i.e., the time command shows that sed takes about the same time (a few minutes) as cp.
  • Julian Knight
    Julian Knight almost 10 years
    So why mark this down and accept the same answer above?
  • sds
    sds almost 10 years
    Because the answer above runs in constant time and this answer does not.
  • b0fh
    b0fh over 9 years
    In all fairness, you didn't explain how to merge the files together. Using dd was the non-trivial insight.
  • Pavel Berdnikov
    Pavel Berdnikov about 9 years
    Nice, I never knew about notrunc. But note that ./do-some-processing must preserve the length of the header (as specified by the OP.) Just a warning for tl;dr folks (like me :)
  • sds
    sds almost 7 years
    The idea was to use a batch, not an interactive, process.