Linux Combine two files by column

18,122

Solution 1

$ awk -v OFS='\t' '
NR==1   { print $0, "Remark1", "Remark2"; next }
NR==FNR { a[$1]=$0; next }
$1 in a { print a[$1], $2, $3 }
' Test1.txt Test2.txt
ID     Name  Telephone  Remark1 Remark2
1      John     011     Test1   Test2
2      Sam      013     Test3   Test4
3      Jena     014     Test5   Test6
4      Peter    015     Test7   Test8

Solution 2

It is far easier to use the join command:

$ cat a.txt 
ID     Name  Telephone       
1      John     011
2      Sam      013
3      Jena     014
4      Peter    015
$ cat b.txt 
ID     Remark1  Remark2       
1       Test1    Test2
2       Test3    Test4
3       Test5    Test6
4       Test7    Test8
5       Test7    Test8
6       Test7    Test8
7       Test7    Test8
8       Test7    Test8
9       Test7    Test8
$ join a.txt b.txt 
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8

Use the column command to pretty print it:

$ join a.txt b.txt | column -t
ID  Name   Telephone  Remark1  Remark2
1   John   011        Test1    Test2
2   Sam    013        Test3    Test4
3   Jena   014        Test5    Test6
4   Peter  015        Test7    Test8

Solution 3

 awk -F"\t" '
     {key = $1 FS $2 FS $3 FS $4}
     NR == 1 {header = key}
     !(key in result) {result[key] = $0; next}
     { for (i=5; i <= NF; i++) result[key] = result[key] FS $i }
     END {
         print result[header]
         delete result[header]
         PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
         for (key in result) print result[key]
     }
 ' Test1.txt Test2.txt ... > result.txt
Share:
18,122
clear.choi
Author by

clear.choi

Updated on June 04, 2022

Comments

  • clear.choi
    clear.choi almost 2 years

    I am trying to combine two files as below (Intersection)

    ID     Name  Telephone       
    1      John     011
    2      Sam      013
    3      Jena     014
    4      Peter    015
    

    Second file Test2.txt

    1       Test1    Test2
    2       Test3    Test4
    3       Test5    Test6
    4       Test7    Test8
    5       Test7    Test8
    6       Test7    Test8
    7       Test7    Test8
    8       Test7    Test8
    9       Test7    Test8
    

    Then Final result

    ID     Name  Telephone    Remark1  Remark2
    1      John    011        Test1    Test2
    2      Sam     013        Test3    Test4
    3      Jena    014        Test5    Test6
    4      Peter   015        Test7    Test8
    

    I did like this as below,

    awk -F"\t" '
        {key = $1 }
        NR == 1 {header = key}
        !(key in result) {result[key] = $0; next}
        { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
        END {
            print result[header]
            delete result[header]
            PROCINFO["sorted_in"] = "@ind_str_asc"    
            for (key in result) print result[key]
        }
    ' Test1.txt Test2.txt > result.txt
    

    And I just notice that this is Union set. Including all data Test1 and Test2.

    I would like to show only for Intersection case as what I expected result. (1, 2 ,3 ,4) only

    Do you guys have any idea? Thanks!

  • Aaron McDaid
    Aaron McDaid over 9 years
    join is the right tool, but beware that the sort order used by sort and by join are not exactly the same. Best to use env LANG=en_EN sort ... and env LANG=en_UN join .... A frustrating problem. For more, see unix.stackexchange.com/questions/12942/…
  • clear.choi
    clear.choi over 9 years
    when I try to do with my case correctly joined but tab delimited is disappeared. And if I use column -t looks like broken format.
  • clear.choi
    clear.choi over 9 years
    @glennjackman I already did and no resuly file size 0 so I tried join -a1 -t $'\t' test1.txt test2.txt > result.txt then no combine. result will be same as test1.txt and result.txt.
  • glenn jackman
    glenn jackman over 9 years
    The answer assumes the 2nd file has a header line. does it?
  • damienfrancois
    damienfrancois over 9 years
    @AaronMcDaid thanks for the feedback I never had the issue but it is good to know. The data in this case was sorted already though, and I skipped the sort part.
  • clear.choi
    clear.choi over 9 years
    @glennjackman oh 2nd file does not have header line do they need? then I will insert header first for second file
  • damienfrancois
    damienfrancois over 9 years
    @glennjackman you are right, I am now confused as I copy/pasted the examples on my system and in my files the second file has a header and I am sure I did not insert it myself. But the question indeed does not present a header in the second file.
  • clear.choi
    clear.choi over 9 years
    @damienfrancois Thank you for your answer and sorry I edited question and during that time u already copy and paste I am sorry for that ! I will try to make header and try again
  • clear.choi
    clear.choi over 9 years
    Still it's not join only header part is join but main part is not join problem is if I using just join then correctly join but It need to be tab delimited and if I use -t $'\t' it looks like does not correct
  • clear.choi
    clear.choi over 9 years
    And also if I use | column -t then show error message "column: line too long"
  • Casimir et Hippolyte
    Casimir et Hippolyte over 9 years
    @glennjackman: the header problem can be solved like this: join ./phones.txt <(echo -e "ID Remark1 Remark2\n`cat ./remarks.txt`") | column -t
  • glenn jackman
    glenn jackman over 9 years
    @CasimiretHippolyte, better would be <(echo -e "ID\tRemark1\tRemark2"; cat ./remarks.txt)
  • glenn jackman
    glenn jackman over 9 years
    @clear.choi, try sed '1{s/$/\tRemark1\tRemark2/;q}' file1; join -t$'\t' <(sed 1d file1|sort) <(sort file2)