List the differences and overlap between two plain data sets

Solution 1

Use the comm command.

If your lists are in files listA and listB, and both files are sorted:

comm listA listB

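By default, comm prints three columns: lines only in listA, lines only in listB, and lines common to both files.

You can suppress individual columns with the -1, -2, or -3 options. For example, comm -23 listA listB prints only the lines unique to listA, and comm -12 listA listB prints only the lines common to both.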
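As a rough illustration of the column options, here is a minimal sketch. It uses the sample numbers from the question for listA; the contents of listB and the temporary directory are made up for the example. Because comm expects sorted input, both files are sorted first.

```shell
# Use the C locale so sort and comm agree on ordering.
export LC_ALL=C

# listA holds the sample data from the question; listB is a hypothetical second list.
tmp=$(mktemp -d)
printf '%s\n' 12345 23456 67891 2345900 12345 > "$tmp/listA"
printf '%s\n' 23456 99999 12345 > "$tmp/listB"

# comm expects sorted input; sort -u also drops duplicate lines.
sort -u "$tmp/listA" > "$tmp/A.sorted"
sort -u "$tmp/listB" > "$tmp/B.sorted"

# -23 suppresses columns 2 and 3, leaving lines that appear only in listA.
only_in_a=$(comm -23 "$tmp/A.sorted" "$tmp/B.sorted")

# -12 suppresses columns 1 and 2, leaving lines common to both lists.
in_both=$(comm -12 "$tmp/A.sorted" "$tmp/B.sorted")

printf 'only in A:\n%s\n' "$only_in_a"
printf 'in both:\n%s\n' "$in_both"

rm -r "$tmp"
```

Note that the ordering is lexical, not numeric, which is what comm requires.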

Solution 2

This will give you the unique items that exist in A but not in B:

perl -ne 'chomp; print "$_\n" if `grep -Fx -- "$_" B` eq ""' A | sort -u

This will give you the list of common items in both A and B:

xargs -I {} grep -Fx -- {} B < A | sort -u
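The same two results can probably be had from a single grep call per question, rather than one grep per line of A. This is a sketch of an alternative technique, not part of the original answer: -F treats each line of B as a fixed string, -x requires a whole-line match (so 12345 does not match 123456), -f reads the patterns from a file, and -v inverts the match. The file contents for B below are invented for the example.

```shell
# Use the C locale so the sort order is predictable.
export LC_ALL=C

# A holds the sample data from the question; B is a hypothetical second data set.
tmp=$(mktemp -d)
printf '%s\n' 12345 23456 67891 2345900 12345 > "$tmp/A"
printf '%s\n' 23456 99999 12345 > "$tmp/B"

# Lines of A that match no whole line of B (items only in A).
only_in_a=$(grep -Fxv -f "$tmp/B" "$tmp/A" | sort -u)

# Lines of A that match some whole line of B (items in both).
in_both=$(grep -Fx -f "$tmp/B" "$tmp/A" | sort -u)

printf 'only in A:\n%s\n' "$only_in_a"
printf 'in both:\n%s\n' "$in_both"

rm -r "$tmp"
```

Unlike comm, this does not need sorted input; sort -u is only there to remove duplicates from the output.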
Author by

user1420706

Updated on September 18, 2022

Comments

  • user1420706
    user1420706 over 1 year

    Possible Duplicate:
    Linux tools to treat files as sets and perform set operations on them

    I have two data sets, A and B. The format for each data set is one number per line. For instance,

    12345
    23456
    67891
    2345900
    12345
    

    Some of the data in A is not included in data set B. How can I list all the data that is only in A, and all the data shared by A and B, using Linux/UNIX commands?

  • HongboZhu
    HongboZhu over 9 years
    The answer assumes listA and listB are already sorted. A more general solution: comm <(sort listA) <(sort listB)
  • Mike
    Mike almost 9 years
    Very simple solution. Is the comm command available on all Linux distros?