How to parse a CSV file in Bash?
Solution 1
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
    echo "I got:$col1|$col2"
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool that can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
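To make the limitation above concrete, here is a small sketch (the sample line is made up for illustration) showing how plain IFS splitting breaks a quoted field that contains a comma:

```shell
#!/bin/bash
# Illustrative input only: the second field is quoted and contains a comma.
line='12,"Hello, world",42'

# IFS=, splits on every comma and knows nothing about quotes.
IFS=, read -r a b c <<< "$line"

printf 'a=[%s]\n' "$a"
printf 'b=[%s]\n' "$b"   # only the first half of the quoted field, quote included
printf 'c=[%s]\n' "$c"   # the rest of the line, commas and all
```

The quoted field ends up split across b and c, which is why a CSV-aware tool is the safer choice for anything beyond simple files.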
Solution 2
From the man page:
-d delim The first character of delim is used to terminate the input line, rather than newline.
You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
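A minimal sketch of the difference, using throwaway variable names chosen for this example: with -d, the comma ends the input, whereas with IFS=, the comma separates fields within one line.

```shell
#!/bin/bash
# -d, makes the comma the end-of-input marker: read stops after "a".
read -r -d, x y <<< 'a,b,'
echo "with -d:  x=[$x] y=[$y]"

# IFS=, keeps newline as the terminator and splits the line on commas.
IFS=, read -r x2 y2 <<< 'a,b'
echo "with IFS: x2=[$x2] y2=[$y2]"
```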
Solution 3
I'm coming late to this question, but bash now offers new features, the question is specifically about bash, and none of the answers already posted shows this powerful and standards-compliant way of doing precisely this.
Parsing CSV files under bash, using a loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be split as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
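For readers who want to see the quoting rules spelled out, here is a rough pure-bash sketch of the split shown above. It is not the loadable module this answer is about: split_csv_record and the fields array are hypothetical names made up for this example, it handles only a single-line record, and it will be far slower than a compiled builtin.

```shell
#!/bin/bash
# Hypothetical helper (not part of bash): split one CSV record into the
# global array "fields", honouring double quotes and doubled "" escapes.
split_csv_record() {
    local record=$1 field='' in_quotes=0 i ch
    fields=()
    for ((i = 0; i < ${#record}; i++)); do
        ch=${record:i:1}
        if ((in_quotes)); then
            if [[ $ch == '"' ]]; then
                if [[ ${record:i+1:1} == '"' ]]; then
                    field+='"'          # doubled quote: one literal quote
                    i=$((i + 1))        # skip the second quote
                else
                    in_quotes=0         # closing quote
                fi
            else
                field+=$ch              # commas inside quotes are data
            fi
        elif [[ $ch == '"' ]]; then
            in_quotes=1                 # opening quote
        elif [[ $ch == ',' ]]; then
            fields+=("$field")          # unquoted comma ends the field
            field=''
        else
            field+=$ch
        fi
    done
    fields+=("$field")                  # last field has no trailing comma
}

split_csv_record '12,22.45,"Hello, ""man"".","A, b.",42'
printf '%s\n' "${fields[@]}" | cat -n
```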
Bash loadable C-compiled modules
Under bash, you can create, edit, and use loadable modules compiled from C. Once loaded, they work like any other builtin. (You can find more information in the source tree.)
The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is a fully working CSV parser ready to use in the examples/loadables directory: csv.c.
On Debian GNU/Linux based systems, you may have to install the bash-builtins package with:
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you can use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line;do
    csv -a aVar "$line"
    printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv
This approach is clearly quicker and more robust than any other combination of bash builtins or forking to any binary.
Unfortunately, depending on your system, if your version of bash was compiled without loadable builtin support, this may not work.
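Since the loadable may be missing, a script can probe for it before relying on it. The following is only a sketch: the path is the Debian one used in this answer and may differ elsewhere, and have_csv is a variable name invented for this example.

```shell
#!/bin/bash
# Assumption: Debian's bash-builtins location; adjust for your system.
csv_path=/usr/lib/bash/csv

# enable -f fails if the file is absent or if this bash was built
# without support for loadable builtins.
if enable -f "$csv_path" csv 2>/dev/null; then
    have_csv=1
    echo "csv builtin loaded from $csv_path"
else
    have_csv=0
    echo "csv builtin not available; use another parsing method" >&2
fi
```

A script can then branch on have_csv and fall back to an external CSV tool when the builtin cannot be loaded.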
Complete sample with multiline CSV fields.
Here is a small sample file with 1 header line, 4 columns and 3 rows. Because two fields contain a newline, the file is 6 lines long.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
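#!/bin/bash
# Load the csv builtin (path used after installing bash-builtins on Debian).
enable -f /usr/lib/bash/csv csv
file="sample.csv"
# Open the file on a dynamically allocated file descriptor.
exec {FD}<"$file"
# The first record is the header line.
IFS= read -ru $FD line
csv -a headline "$line"
# Build one 'Name: "%q"\n' format entry per header field.
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"
while IFS= read -ru $FD line;do
    # A record with too few fields continues on the next physical line.
    while csv -a row "$line" ; ((${#row[@]}<${#headline[@]})) ;do
        IFS= read -ru $FD sline || break
        line+=$'\n'"$sline"
    done
    printf "$fieldfmt\\n" "${row[@]}"
done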
This may render (I've used printf "%q" to represent non-printable characters like newlines as $'\n'):
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You can find a full working sample here: csvsample.sh.txt or csvsample.sh.
Warning:
Of course, parsing CSV this way is not perfect! It works for many simple CSV files, but take care with encoding and security! For example, this module won't be able to handle binary fields.
Read the csv.c source code comments and RFC 4180 carefully!
Solution 4
We can parse CSV files with quoted strings, delimited by, say, |, with the following code:
while read -r line
do
field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
echo "$field1 $field2"
done < "$csvFile"
awk parses the string fields into variables and tr removes the quotes.
This is slightly slower, as awk is executed for each field.
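To avoid starting awk once per field, the whole file can be handed to a single awk process, as one of the comments on this page suggests. A small sketch with made-up sample data written to a temporary file:

```shell
#!/bin/bash
# Illustrative data only: pipe-delimited fields wrapped in double quotes.
csvFile=$(mktemp)
printf '%s\n' '"alpha"|"beta"|"x"' '"gamma"|"delta"|"y"' > "$csvFile"

# One awk process for the whole file: strip the quotes from each line,
# then print the first two fields.
result=$(awk -F'|' '{ gsub(/"/, ""); print $1, $2 }' "$csvFile")
printf '%s\n' "$result"

rm -f "$csvFile"
```

Note that this removes every double quote, so it is only suitable when quotes never appear as data and the delimiter never appears inside a field.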
Solution 5
In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
    read
    while IFS=, read -r col1 col2
    do
        echo "I got:$col1|$col2"
    done
} < myfile.csv
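Instead of discarding the header, the first read can also capture the column names. The sketch below uses made-up data in a temporary file; h1, h2 and rows are names chosen for this example.

```shell
#!/bin/bash
# Illustrative two-column file with a header line.
tmp=$(mktemp)
printf '%s\n' 'id,name' '1,foo' '2,bar' > "$tmp"

rows=()
{
    # The first read consumes the header and keeps the column names.
    IFS=, read -r h1 h2
    # The remaining reads see only the data lines.
    while IFS=, read -r col1 col2
    do
        rows+=("$h1=$col1 $h2=$col2")
    done
} < "$tmp"
rm -f "$tmp"

printf '%s\n' "${rows[@]}"
```

Because the group is fed by a redirection rather than a pipe, the loop runs in the current shell and the rows array is still set afterwards.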
Sousou
Updated on February 18, 2022
Comments
-
Sousou about 2 years
I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv|while read line
do
    read -d, col1 col2 < <(echo $line)
    echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
-
BeemerGuy over 13 years
Have you considered awk, to use $1, $2, etc.?
-
tokland over 13 years
As a side note: command < <(echo "string") ---> command <<< "string"
-
Jay over 7 years
The 'cut' command line program was designed for that: ss64.com/bash/cut.html
-
tripleee over 3 years
Possible duplicate of stackoverflow.com/questions/36287982/…
-
tripleee almost 3 years
You want to lose the useless use of cat
-
Jatin Chauhan about 2 years
I’ll suggest awk if that helps
-
-
peak over 8 years
The proposed solution is fine for very simple CSV files, that is, if the headers and values are free of commas and embedded quotation marks. It is actually quite tricky to write a generic CSV parser (especially since there are several CSV "standards"). One approach to making CSV files more amenable to *nix tools is to convert them to TSV (tab-separated values), e.g. using Excel.
-
Zsolt over 7 years
It is interesting that I cannot do mkdir in the body. I'm getting command not found. Only the echo works.
-
SourceSeeker over 7 years
@Zsolt: There's no reason that should be the case. You must have a typo or a stray non-printing character.
-
Zsolt over 7 years
I figured it out. I called one of the variables PATH. Rookie mistake
-
SourceSeeker over 7 years
@Zsolt I recommend always using lowercase or mixed case variable names for that very reason.
-
thomas.mc.work over 7 years
@DennisWilliamson You should enclose the separator in quotes, e.g. when using ;: while IFS=";" read col1 col2; do ...
-
SourceSeeker over 7 years
@thomas.mc.work: That's true in the case of semicolons and other characters that are special to the shell. In the case of a comma, it's not necessary and I tend to prefer to omit characters that are unnecessary. For example, you could always specify variables for expansion using curly braces (e.g. ${var}), but I omit them when they're not necessary. To me, it looks cleaner.
-
Michal aka Miki over 7 years
I extended the case for the long row here stackoverflow.com/q/40331348/54964
-
pkarc about 5 years
Good, you can also use comma (,)
-
tripleee over 3 years
Processing a line at a time with Awk is a gross antipattern.
awk -F'|' '{ gsub(/"/, ""); print $1, $2 }' "$csvFile"
-
F. Hauri - Give Up GitHub over 2 years
@DennisWilliamson, for some time now, the bash source tree has offered a loadable builtin CSV parser! Have a look at my answer! Of course, there are some limitations...