Replacing the values in one file with the values in another file in bash
Solution 1
You can use awk for this:
awk -F',|, ' 'NR==FNR{a[$2]=$3} NR>FNR{$8=a[$8];print}' OFS=',' "$file2" "$file1"
This reads model-list.csv first, storing each model value and its description in a string-indexed array (e.g. a["Z800"] == "HP Z800 Workstation"). It then reads through List.csv, replacing each model value with the description string from the array.
Explanation:
- -F',|, ' sets the field separator using a regex pattern. Here the field separator is either a single comma, or a single comma followed by a single space.
- NR==FNR{a[$2]=$3}: NR is an awk built-in variable that counts the total number of records (lines) read since the program began. FNR is similar, but counts only the records read from the current file. So NR==FNR is an awk idiom meaning "while reading the first file", and the associated action, a[$2]=$3, saves the value of field 3 in the array a, with the string index set to the value of field 2.
- NR>FNR{$8=a[$8];print}: similar to the previous rule, but it applies only to files after the first one. For each line, the value of field 8 is used as the index to look up the value in the array, and field 8 is reassigned to that value. Finally, the whole line is printed.
- OFS=',' "$file2" "$file1" sets the output field separator to a comma (the default is a space), then reads the two files in the given order.
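One thing to watch for with the command above: any field 8 value that has no entry in the array, including the Model header itself, is replaced by an empty string. A minimal sketch of a variant that leaves such values untouched is shown here. It assumes the same two file layouts as in the question; the function name replace_models, the temporary directory and the trimmed sample rows are only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, trimmed from the files shown in the question.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'Manufacturer,Value,Model Name' \
    'Lenovo, 4089AZ8, ThinkCentre' \
    'HP, Z800, HP Z800 Workstation' > "$tmpdir/model-list.csv"
printf '%s\n' \
    'Location,IP Address,Host Name,Domain,Domain Name, User Name,Manufacturer,Model,System Type' \
    'H1,xx.xx.xx.xx,PC1,domain.com,DOMAIN,User1,LENOVO,4089AZ8,X86-based PC' \
    'H2,xx.xx.xx.xx,PC4,domain.com,DOMAIN,User4,Hewlett-Packard,Z800,x64-based PC' \
    'H5,xx.xx.xx.xx,PC5,domain.com,DOMAIN,User5,ACME,UNKNOWN1,x64-based PC' > "$tmpdir/List.csv"

# replace_models MODEL_LIST LIST
# Illustrative helper: same idea as the one-liner, but field 8 is only
# rewritten when the array actually holds an entry for it.
replace_models() {
    awk -F',|, ' -v OFS=',' '
        NR == FNR { a[$2] = $3; next }   # first file: remember value -> model name
        {
            if ($8 in a) $8 = a[$8]      # second file: replace known values only
            $1 = $1                      # rebuild the record so every line uses OFS
            print
        }
    ' "$1" "$2"
}

replace_models "$tmpdir/model-list.csv" "$tmpdir/List.csv"
```

The `next` in the first rule makes the NR>FNR test unnecessary, because the second rule is only reached for lines of the second file.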
Solution 2
Some notes:
- Bash is a terrible language for database emulation. Are you sure you cannot use a relational database for this?
- Avoid useless uses of cat. You can do grep ",$ModelNo," "$file1".
- You can do while IFS=, read -r _ ModelNo ModelName _ to avoid the awk lines.
- In Bash you can do my_command <<< "$variable" instead of echo "$variable" | my_command.
- You should use $(my_command) instead of `my_command` for readability.
- grep -F will search for literal strings.
- You can check the exit code of grep to see if it found anything. That should be faster than checking the file size.
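Put together, the notes above could be applied to the loop from the question roughly as follows. This is only a sketch: it keeps the original script's behaviour (only lines with a known model are written, grouped by model), and the temporary directory and shortened sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the real files are the ones from the question.
workdir=$(mktemp -d)
file1="$workdir/List.csv"
file2="$workdir/model-list.csv"
outfile="$workdir/List_out.csv"
printf '%s\n' \
    'Manufacturer,Value,Model Name' \
    'Lenovo, 4089AZ8, ThinkCentre' \
    'HP, Z800, HP Z800 Workstation' > "$file2"
printf '%s\n' \
    'H1,x,PC1,d,D,User1,LENOVO,4089AZ8,X86-based PC' \
    'H2,x,PC4,d,D,User4,Hewlett-Packard,Z800,x64-based PC' > "$file1"

: > "$outfile"    # start with an empty output file

# read splits each line on commas; "_" receives the fields we do not need.
while IFS=, read -r _ ModelNo ModelName _; do
    # model-list.csv has a space after each comma, so strip one leading space.
    ModelNo=${ModelNo# }
    ModelName=${ModelName# }
    # grep -F treats the model number as a literal string, and its exit
    # status tells us whether anything matched (no staging file needed).
    if matches=$(grep -F ",$ModelNo," "$file1"); then
        # Here-string instead of echo | sed. Note that sed still treats
        # ModelNo as a regex, so this assumes plain alphanumeric values.
        sed "s/,${ModelNo},/,${ModelName},/g" <<< "$matches" >> "$outfile"
    fi
done < "$file2"

cat "$outfile"
```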
Solution 3
In bash, assuming a bash version >= 4, you can do this very easily using associative arrays:
#!/usr/bin/env bash
## declare models as an associative array
declare -A models
## read the 1st file, load the Value => Model Name
## pairs into the models array. Note that I'm setting bash's
## Input Field Separator ($IFS) to comma (,) and that I first pass
## the file through sed to remove the spaces after the commas.
## For more on why I'm using <() instead of a pipe, see
## http://stackoverflow.com/q/9985076/1081936
while IFS=, read -r man val mod;
do
    models["$val"]="$mod"
done < <(sed 's/, /,/g' "$1")
## Read the second file. I am defining 9 variables, 8 for
## the first 8 fields, up to the model and $rest for the rest of
## the fields, up to the end of the line.
while IFS=',' read -r loc ip host dom dnam user manu model rest;
do
    printf "%s,%s,%s,%s,%s,%s,%s,%s,%s\n" "$loc" "$ip" "$host" "$dom" \
           "$dnam" "$user" "$manu" "${models[$model]}" "$rest";
done < <(sed 's/, /,/g' "$2")
Caveats:
- This will fail on the 1st line of the specific List.csv you posted because the header row of model-list.csv has Value and Model Name where List.csv has Model. This means that there will be no match for ${models[$model]} on the first line, so the Model header is printed empty. You can fix this either by editing the header of model-list.csv so that its Value and Model Name fields both read Model, or by using this version instead:
#!/usr/bin/env bash
declare -A models
while IFS=, read -r man val mod; do
    models["$val"]="$mod"
done < <(sed 's/, /,/g' "$1")
## Set up a counter to hold the line numbers
c=0;
while IFS=',' read -r loc ip host dom dnam user manu model rest; do
    ## Increment the line number
    (( c++ ));
    ## If this is the 1st line, print it unchanged
    if [ "$c" -eq "1" ]; then
        printf "%s,%s,%s,%s,%s,%s,%s,%s,%s\n" "$loc" "$ip" "$host" "$dom" \
               "$dnam" "$user" "$manu" "$model" "$rest";
    else
        printf "%s,%s,%s,%s,%s,%s,%s,%s,%s\n" "$loc" "$ip" "$host" "$dom" \
               "$dnam" "$user" "$manu" "${models[$model]}" "$rest";
    fi
done < <(sed 's/, /,/g' "$2")
- This assumes that your file is as simple as you show, that all fields are defined by commas and that no fields can contain commas.
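A different way to handle the header line, and any other model value that is missing from model-list.csv, is to fall back to the original value when the lookup comes back empty. The sketch below is not part of the script above: the lookup_model helper and the hard-coded array entries are hypothetical and only serve to show the ${array[key]:-default} expansion.

```shell
#!/usr/bin/env bash
# Requires bash >= 4 for associative arrays.
# Hard-coded entries stand in for the ones read from model-list.csv.
declare -A models=( ["4089AZ8"]="ThinkCentre" ["Z800"]="HP Z800 Workstation" )

# lookup_model KEY
# Hypothetical helper: print the model name for KEY, or KEY itself when
# the array has no entry for it.
lookup_model() {
    local key=$1
    # An empty string is not a valid associative-array subscript in bash,
    # so handle it before touching the array.
    if [ -z "$key" ]; then
        printf '\n'
        return 0
    fi
    printf '%s\n' "${models[$key]:-$key}"
}

lookup_model "Z800"     # a known value
lookup_model "Model"    # the header field, which has no entry
```

In the second while loop, "${models[$model]:-$model}" could then take the place of "${models[$model]}", which would make the line counter unnecessary as long as the model field is never empty.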
In Perl this could of course be done much more simply:
perl -F',\s*' -lane '$k{$F[1]}=$F[2]; next if $#F < 4; s/$F[7]/$k{$F[7]}/; print' model-list.csv List.csv
Explanation
- -F sets the field delimiter (here a , followed by 0 or more whitespace characters), which is used with -a, which automatically splits each input line into the @F array.
- -l turns on automatic removal of the \n at the end of each line and also adds an implicit \n to each print statement.
- -n means read the input file line by line and apply whatever script was passed with -e to it.
- $k{$F[1]}=$F[2]: this populates the hash %k, where the 2nd field of each line is the key and the 3rd field is the value. This is only relevant for model-list.csv but will also be run for List.csv. This can safely be ignored as long as no 2nd field of List.csv (the IP address) is identical to a value in the 2nd field of model-list.csv, since such an entry would overwrite the corresponding key in %k.
- next if $#F < 4: skip to the next line if this one has fewer than 5 fields ($#F is the index of the last element of @F). This is so that the final print does not print the lines of model-list.csv.
- s/$F[7]/$k{$F[7]}/; print: substitute the first occurrence of the 8th field's text in the current line with whatever is stored in the %k hash for that field, and print the line.
Mandar Shinde
Updated on September 18, 2022
Comments
- Mandar Shinde over 1 year
I have a CSV file named List.csv in the following format:
Location,IP Address,Host Name,Domain,Domain Name, User Name,Manufacturer,Model,System Type, Serial Number, Operating System,RAM (GB),Processor Type,Processor Frequency
H1,xx.xx.xx.xx,PC1,domain.com,DOMAIN,User1,LENOVO,4089AZ8,X86-based PC,L90RA96,Microsoft Windows 7 Professional ,2,Pentium(R) Dual-Core CPU E5800,3.20GHz
H3,xx.xx.xx.xx,PC2,domain.com,DOMAIN,User2,LENOVO,4089AZ8,X86-based PC,L906W3P,Microsoft Windows 7 Professional ,2,Pentium(R) Dual-Core CPU E5800,3.20GHz
H2,xx.xx.xx.xx,PC3,domain.com,DOMAIN,User3,LENOVO,4089A76,X86-based PC,L929410,Microsoft Windows 7 Professional ,2,Pentium(R) Dual-Core CPU E5400,2.70GHz
H2,xx.xx.xx.xx,PC4,domain.com,DOMAIN,User4,Hewlett-Packard,Z800,x64-based PC,SGH007QT16,Microsoft Windows 7 Professional ,12,Intel(R) Xeon(R) CPU W5590,3.33GHz
If you look at the Model column, it contains values that do not convey the name of the model. I have created another file, model-list.csv, which contains these values and their corresponding model names. It looks something like this:
Manufacturer,Value,Model Name
Lenovo, 4089AZ8, ThinkCentre
Lenovo, 4089A76, ThinkCentre
HP, Z800, HP Z800 Workstation
I want the values in the List.csv file to be replaced by the corresponding model names from model-list.csv. As there are 2900+ items in List.csv and about 150 items in model-list.csv, I was planning to achieve this using a bash script, which is as follows:
#!/bin/bash
file1="List.csv"
file2="model-list.csv"
outfile="List_out.csv"
stagingfile="List-staging.csv"
rm -f "$outfile" "$stagingfile"
while read line
do
    ModelNo=`echo "$line"|awk -F',' '{print $2}'`
    ModelName=`echo "$line"|awk -F',' '{print $3}'`
    cat "$file1"|grep ",$ModelNo," > "$stagingfile"
    if [ -s "$stagingfile" ]
    then
        while read line1
        do
            NewLine=`echo "$line1"|sed "s/,${ModelNo},/,${ModelName},/g"`
            echo "$NewLine" >> "$outfile"
        done < "$stagingfile"
        rm -f "$stagingfile"
    fi
done < "$file2"
When the above script is executed, "$outfile" contains about 40-50 more entries than List.csv. Is anything wrong with the script?
- Admin about 10 years
  What is the actual output? That should make it easier to debug.
- Admin about 10 years
  As I said, the file to be handled is too large to debug. I could only see the number of entries added to the resulting file. List.csv has 2914 entries and $outfile has 2957 of them.
- Admin about 10 years
  You could put some of the entries that you weren't expecting in your question. If you don't know which ones shouldn't be there I suggest you trim down both files until the number is manageable.
- Admin about 10 years
  Yes. That can be done. I will have to check for this solution. Thanks.
- Admin about 10 years
  Would a Perl solution be acceptable? This is possible in bash, I'd probably do it with associative arrays but it is trivial in Perl.
- Admin about 10 years
  Yes. Perl will also be acceptable. But Bash is the preferred one.
- Mandar Shinde about 10 years
  @l0b0 - I have two options to deal with this issue: use Perl or use Bash. I preferred to select the latter. Thanks for the notes.
- Mandar Shinde about 10 years
  Are you sure about the delimiter? You mentioned a | (pipe) here. Or should it be a comma?
- lgeorget about 10 years
  @MandarShinde -F',|, ' means that the delimiter is either a comma or a comma followed by a space. The pipe is part of the -F option.
- Ramesh about 10 years
  This is a more elegant solution than the bash script solution. :)
- Josh Jolly about 10 years
  @MandarShinde - you can use regex patterns as the field separator - see gnu.org/software/gawk/manual/html_node/… (in this case, as lgeorget correctly commented, the field separator is either a single comma or a comma followed by a space, due to the differing nature of your input files).
- Mandar Shinde about 10 years
  @JoshJolly - So, the entire piece of code I have mentioned in the question can be replaced by a single command, is that right?
- Josh Jolly about 10 years
  Potentially, yes. You should test my answer thoroughly though. :)
- Mandar Shinde about 10 years
  Worked absolutely fine! Can you please elaborate on what 'NR==FNR{a[$2]=$3} NR>FNR{$8=a[$8];print}' OFS=',' "$file2" "$file1" in this awk command does exactly? Is it a kind of if-elseif statement?
- Josh Jolly about 10 years
  I have edited to explain, but generally in awk a condition followed by an action in braces means that if the condition is met, the action takes place.