Linux Combine two files by column
18,122
Solution 1
$ awk -v OFS='\t' '
NR==1 { print $0, "Remark1", "Remark2"; next }
NR==FNR { a[$1]=$0; next }
$1 in a { print a[$1], $2, $3 }
' Test1.txt Test2.txt
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8
Solution 2
It is far easier to use the join
command:
$ cat a.txt
ID Name Telephone
1 John 011
2 Sam 013
3 Jena 014
4 Peter 015
$ cat b.txt
ID Remark1 Remark2
1 Test1 Test2
2 Test3 Test4
3 Test5 Test6
4 Test7 Test8
5 Test7 Test8
6 Test7 Test8
7 Test7 Test8
8 Test7 Test8
9 Test7 Test8
$ join a.txt b.txt
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8
Use the column
command to pretty print it:
$ join a.txt b.txt | column -t
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8
Solution 3
awk -F"\t" '
{key = $1 FS $2 FS $3 FS $4}
NR == 1 {header = key}
!(key in result) {result[key] = $0; next}
{ for (i=5; i <= NF; i++) result[key] = result[key] FS $i }
END {
print result[header]
delete result[header]
PROCINFO["sorted_in"] = "@ind_str_asc" # if using GNU awk
for (key in result) print result[key]
}
' Test1.txt Test2.txt ... > result.txt
Author by
clear.choi
Updated on June 04, 2022Comments
-
clear.choi almost 2 years
I am trying to combine two files as below (Intersection)
ID Name Telephone 1 John 011 2 Sam 013 3 Jena 014 4 Peter 015
Second file Test2.txt
1 Test1 Test2 2 Test3 Test4 3 Test5 Test6 4 Test7 Test8 5 Test7 Test8 6 Test7 Test8 7 Test7 Test8 8 Test7 Test8 9 Test7 Test8
Then Final result
ID Name Telephone Remark1 Remark2 1 John 011 Test1 Test2 2 Sam 013 Test3 Test4 3 Jena 014 Test5 Test6 4 Peter 015 Test7 Test8
I did like this as below,
awk -F"\t" ' {key = $1 } NR == 1 {header = key} !(key in result) {result[key] = $0; next} { for (i=2; i <= NF; i++) result[key] = result[key] FS $i } END { print result[header] delete result[header] PROCINFO["sorted_in"] = "@ind_str_asc" for (key in result) print result[key] } ' Test1.txt Test2.txt > result.txt
And I just notice that this is Union set. Including all data Test1 and Test2.
I would like to show only for Intersection case as what I expected result. (1, 2 ,3 ,4) only
Do you guys have any idea? Thanks!
-
Aaron McDaid over 9 years
join
is the right tool, but beware that the sort order used bysort
and byjoin
are not exactly the same. Best to useenv LANG=en_EN sort ...
andenv LANG=en_UN join ...
. A frustrating problem. For more, see unix.stackexchange.com/questions/12942/… -
clear.choi over 9 yearswhen I try to do with my case correctly joined but tab delimited is disappeared. And if I use column -t looks like broken format.
-
clear.choi over 9 years@glennjackman I already did and no resuly file size 0 so I tried join -a1 -t $'\t' test1.txt test2.txt > result.txt then no combine. result will be same as test1.txt and result.txt.
-
glenn jackman over 9 yearsThe answer assumes the 2nd file has a header line. does it?
-
damienfrancois over 9 years@AaronMcDaid thanks for the feedback I never had the issue but it is good to know. The data in this case was sorted already though, and I skipped the
sort
part. -
clear.choi over 9 years@glennjackman oh 2nd file does not have header line do they need? then I will insert header first for second file
-
damienfrancois over 9 years@glennjackman you are right, I am now confused as I copy/pasted the examples on my system and in my files the second file has a header and I am sure I did not insert it myself. But the question indeed does not present a header in the second file.
-
clear.choi over 9 years@damienfrancois Thank you for your answer and sorry I edited question and during that time u already copy and paste I am sorry for that ! I will try to make header and try again
-
clear.choi over 9 yearsStill it's not join only header part is join but main part is not join problem is if I using just join then correctly join but It need to be tab delimited and if I use -t $'\t' it looks like does not correct
-
clear.choi over 9 yearsAnd also if I use | column -t then show error message "column: line too long"
-
Casimir et Hippolyte over 9 years@glennjackman: the header problem can be solved like this:
join ./phones.txt <(echo -e "ID Remark1 Remark2\n`cat ./remarks.txt`") | column -t
-
glenn jackman over 9 years@CasimiretHippolyte, better would be
<(echo -e "ID\tRemark1\tRemark2"; cat ./remarks.txt)
-
glenn jackman over 9 years@clear.choi, try
sed '1{s/$/\tRemark1\tRemark2/;q}' file1; join -t$'\t' <(sed 1d file1|sort) <(sort file2)