How can I compare data in 2 files to identify common and unique data?

Solution 1

Whether or not file1 and file2 are sorted, you can use awk as follows:

unique data in file1:

awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
4
5
6

unique data in file2:

awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2
a
b
c

common data:

awk 'NR==FNR{a[$0];next} ($0 in a)' file1 file2
1
2
3

Explanation:

NR==FNR    - true only while reading the 1st file (the total record number
             equals the per-file record number), so the block that follows
             runs for lines of the 1st file only
a[$0]      - store the whole line ($0) as a key in the associative array a;
             the value does not matter, only key membership is tested later
next       - skip the rest of the script and read the next line, so the
             final test runs only on lines of the 2nd file
($0 in a)  - for each line of the 2nd file, check whether it is a key in a:
             print the common lines with "($0 in a)' file1 file2",
             the lines unique to the 1st file with "!($0 in a)' file2 file1",
             or the lines unique to the 2nd file with "!($0 in a)' file1 file2"
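
To try this end to end, you can recreate the sample files from the question (the contents below are just the example data shown there) and run the commands:

$ printf '%s\n' 1 2 3 4 5 6 > file1
$ printf '%s\n' 1 a 2 b 3 c > file2
$ awk 'NR==FNR{a[$0];next} !($0 in a)' file2 file1    # lines unique to file1
4
5
6
$ awk 'NR==FNR{a[$0];next} ($0 in a)' file1 file2     # lines common to both
1
2
3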

Solution 2

This is what comm is for:

$ comm <(sort file1) <(sort file2)
        1
        2
        3
4
5
6
    a
    b
    c

The first column contains lines that appear only in file1,
the second column lines that appear only in file2,
and the third column lines common to both files.

comm requires its input files to be sorted, which is why the process substitutions <(sort file1) and <(sort file2) are used above.

To suppress a column from the output, pass its number as an option. For example, to see only the lines in common, use comm -12 ...; to see only the lines that are unique to file2, use comm -13 ...
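
As a quick illustration, again assuming the same sample files as above:

$ comm -12 <(sort file1) <(sort file2)    # only lines common to both files
1
2
3
$ comm -13 <(sort file1) <(sort file2)    # only lines unique to file2
a
b
c
$ comm -23 <(sort file1) <(sort file2)    # only lines unique to file1
4
5
6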



Updated on September 18, 2022

Comments

  • Vigneswara Prabhu, over 1 year ago

    How can I compare data in 2 files to identify common and unique data? I can't do it line by line, because file 1 contains, say, 100 IDs/codes/numbers and I want to compare file 2 against file 1.

    The thing is that file 2 contains a subset of the data in file 1 as well as data unique to file 2, for example:

    file 1      file 2
    1            1
    2            a
    3            2
    4            b
    5            3 
    6            c
    

    How can I compare both files to identify the data that is common and the data that is unique to each file? diff doesn't seem to do the job.