How to edit the header of a huge CSV file in-place?
Solution 1
If you do not know the length of the header, head -n1 seems like a reasonable way to get the first line.
To write it back in place at the start of the file, you can use dd:
head -n1 file.csv | ./do-some-processing | dd of=file.csv bs=1 conv=notrunc
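The conv=notrunc option is critical: it stops dd from truncating the file, so the rest of the data stays intact. bs=1 makes dd copy one byte at a time, so the write ends exactly at the last byte of the new header. Note that ./do-some-processing must produce a header of exactly the same length as the original; a shorter one leaves stale bytes behind, and a longer one overwrites the first data row.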
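Since the whole approach depends on the header keeping its length, it may be worth checking that before writing. Below is a minimal sketch, assuming a POSIX shell; replace_header and demo.csv are hypothetical names used only for illustration, and the new header is passed as an argument instead of coming from ./do-some-processing.

```shell
#!/bin/sh
# replace_header is a hypothetical helper, not part of the original answer.
# It overwrites the first line of FILE in place, but only when the new
# header has exactly as many bytes as the old one.
replace_header() {
    file=$1
    new_header=$2
    old_header=$(head -n1 "$file")
    # wc -c counts bytes; $(( )) strips any padding some wc versions add
    old_len=$(( $(printf '%s' "$old_header" | wc -c) ))
    new_len=$(( $(printf '%s' "$new_header" | wc -c) ))
    if [ "$old_len" -ne "$new_len" ]; then
        echo "header length differs ($old_len vs $new_len bytes), not writing" >&2
        return 1
    fi
    # conv=notrunc keeps everything after the header untouched
    printf '%s\n' "$new_header" | dd of="$file" bs=1 conv=notrunc 2>/dev/null
}

# Small demonstration on a throwaway file
printf 'col1,col2,col3\n1,2,3\n3,2,1\n' > demo.csv
replace_header demo.csv 'col3,col2,col1'
cat demo.csv
```

The length check compares bytes, not characters, because dd overwrites bytes; with a multi-byte encoding the two counts can differ.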
Solution 2
I would suggest sed for this. You can restrict a substitution to the first line with an address, such as 1s/foo/bar/:
$ cat file
col1,col2,col3
1,2,3
3,2,1
...
$ sed -e '1s/col1/tmp/' -e '1s/col3/col1/' -e '1s/tmp/col3/' file
col3,col2,col1
1,2,3
3,2,1
...
Use -i to store the change back to the file:
$ sed -i -e '1s/col1/tmp/' -e '1s/col3/col1/' -e '1s/tmp/col3/' file
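One way to see whether a tool patches the file or writes a new one is to compare inode numbers before and after. The sketch below assumes GNU sed (BSD/macOS sed takes -i with a mandatory suffix argument and would need a different invocation); inode_demo.csv is a hypothetical throwaway file.

```shell
#!/bin/sh
# Throwaway file for the comparison (hypothetical name)
printf 'col1,col2,col3\n1,2,3\n3,2,1\n' > inode_demo.csv

# Inode number of the original file
before=$(ls -i inode_demo.csv | awk '{print $1}')

# GNU sed -i writes a temporary file and renames it over the original,
# so the name ends up pointing at a different inode
sed -i -e '1s/col1/tmp/' -e '1s/col3/col1/' -e '1s/tmp/col3/' inode_demo.csv
after_sed=$(ls -i inode_demo.csv | awk '{print $1}')

# dd with conv=notrunc overwrites bytes inside the existing file,
# so the inode stays the same
printf 'col1,col2,col3\n' | dd of=inode_demo.csv bs=1 conv=notrunc 2>/dev/null
after_dd=$(ls -i inode_demo.csv | awk '{print $1}')

echo "before=$before after_sed=$after_sed after_dd=$after_dd"
```

A changed inode after sed -i means the whole file was written out again, which is why its run time grows with the size of the data.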
Comments
-
sds over 1 year
I have several huge CSV files in which I want to swap two column names.
I do not want to modify/copy/rewrite the data.
The operation is very cheap in C: fopen the file, fgets the header, fseek or rewind, manipulate the header (preserving its length), fputs the new header, fclose the file.
This can also be done in ANSI Common Lisp (CLISP, SBCL or GCL):
(with-open-file (csv "foo.csv" :direction :io :if-exists :overwrite)
  (let ((header (read-line csv)))
    (print header)
    (file-position csv 0)
    (write-line (string-upcase header) csv)
    (file-position csv 0)
    (read-line csv)))
and takes a fraction of a second (sed takes a few minutes because it reads and rewrites the whole file even if you tell it to modify just the first line, ignoring the crucial information that the size of the header did not change).
How do I do that with the "standard unix tools" (e.g., perl)?
-
sds over 11 years
This copies the data, i.e., the time spent is proportional to the data size.
-
sds over 11 years
This does modify the file as required, but it rewrites the data, i.e., time shows that the sed command takes about the same time (a few minutes) as cp.
-
Julian Knight almost 10 years
So why mark this down and accept the same answer above?
-
sds almost 10 years
Because the answer above is constant in time and this answer is not.
-
b0fh over 9 years
In all fairness, you didn't explain how to merge the files together. Using dd was the non-trivial insight.
-
Pavel Berdnikov about 9 years
Nice, I never knew about notrunc. But note that ./do-some-processing must preserve the length of the header (as specified by the OP). Just a warning for tl;dr folks (like me :)
-
sds almost 7 years
The idea was to use a batch, not an interactive, process.