diff stop after first difference
Solution 1
cmp stops at the first difference:
% cat foo
foo
bar
baz
---
foo
bar
baz
% cat bar
foo
bar
baz
---
foo+
bar+
baz+
% cmp foo bar
foo bar differ: byte 20, line 5
%
You could wrap a script around it to print the differing line from each file:
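#!/bin/bash
# cmp reports "... differ: byte N, line M"; the last field is the line number.
line=$(cmp "$1" "$2" | awk '{print $NF}')
# Empty when the files are identical or cmp only wrote to stderr.
if [ -n "$line" ]; then
    awk -v file="$1" -v line="$line" 'NR==line{print "In file "file": "$0; exit}' "$1"
    awk -v file="$2" -v line="$line" 'NR==line{print "In file "file": "$0; exit}' "$2"
fi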
% ./script.sh foo bar
In file foo: foo
In file bar: foo+
Part of the cost is now shifted to the AWK commands, but since each one exits as soon as it reaches the reported line, it should be significantly faster than checking both files entirely.
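One case the wrapper does not cover is when one file is a prefix of the other: cmp then writes an "EOF on ..." message to stderr instead of a "differ" line, so nothing is printed. A minimal sketch of a variant that checks for this, assuming bash and a cmp whose "differ" message ends with the line number (the function name first_diff and the message texts are only illustrative):

```shell
#!/bin/bash
# first_diff FILE1 FILE2 (hypothetical helper name)
# Prints the first differing line of each file, or says why there is none.
first_diff() {
    local out status line
    # Capture stdout and stderr: the EOF message goes to stderr.
    out=$(cmp -- "$1" "$2" 2>&1)
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "Files are identical"
    elif [ "$status" -eq 1 ] && [[ $out == *"EOF on "* ]]; then
        # One file ended first; there is no differing line to show.
        echo "One file is a prefix of the other: $out"
    elif [ "$status" -eq 1 ]; then
        # "... differ: byte N, line M" -> keep the last word (M).
        line=${out##* }
        printf 'In file %s: %s\n' "$1" "$(awk -v n="$line" 'NR == n { print; exit }' "$1")"
        printf 'In file %s: %s\n' "$2" "$(awk -v n="$line" 'NR == n { print; exit }' "$2")"
    else
        # Exit status above 1 means cmp itself failed (e.g. unreadable file).
        echo "cmp failed: $out" >&2
    fi
    return "$status"
}
```

Called as first_diff foo bar, it returns cmp's own exit status, so it can still be used in a conditional.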
Solution 2
I tested this with the trivial cases but leave the field test to you:
$ cat f1
l1
l21 l22 l23 l24
l3
l4x
l5
$ cat f2
l1
l21 l22 l23
l3
l4y
l5
$ cat awkdiff.awk
BEGIN {
    maxdiff = 5            # stop after this many differences
    ignoreemptylines = 1   # 1 = skip empty lines in both files
    whitespaceaware = 1    # 1 = compare word by word, 0 = line by line
    if (whitespaceaware) {
        emptypattern = "^[[:space:]]*$"
    } else {
        emptypattern = "^$"
        FS=""
    }
    f1 = ARGV[1]
    f2 = ARGV[2]
    rc1=rc2=1
    # Read both files in lockstep; getline returns 1 on success, 0 on EOF, -1 on error.
    while( (rc1>0 && rc2>0 && diff<maxdiff) ) {
        rc1 = getline l1 < f1 ; ++nr1
        rc2 = getline l2 < f2 ; ++nr2
        if (ignoreemptylines) {
            while ( l1 ~ emptypattern && rc1>0) {
                rc1 = getline l1 < f1 ; ++nr1
            }
            while ( l2 ~ emptypattern && rc2>0) {
                rc2 = getline l2 < f2 ; ++nr2
            }
        }
        if ( rc1>0 && rc2>0) {
            nf1 = split( l1, a1)
            nf2 = split( l2, a2)
            if ( nf1 <= nf2) {
                nfmin = nf1
            } else {
                nfmin = nf2
            }
            founddiff = 0
            # Compare the fields both lines have in common (as strings).
            for (i=1; i<=nfmin; ++i) {
                if ( a2[i]"" != a1[i]"") {
                    # Output format: line:field:{value}
                    printf "%d:%d:{%s} != %d:%d:{%s}\n", \
                        nr1, i, a1[i], nr2, i, a2[i]
                    founddiff=1
                    ++diff
                    break
                }
            }
            # Common fields match, but one line has extra fields.
            if ( !founddiff && nf1 != nf2) {
                if ( nf1 > nf2)
                    printf "%d:%d:{%s} != %d:EOL\n", nr1, nfmin+1, a1[nfmin+1], nr2
                else
                    printf "%d:EOL != %d:%d:{%s}\n", nr1, nr2, nfmin+1, a2[nfmin+1]
                ++diff
            }
        } else {
            if ( rc1 == -1 || rc2 == -1) {
                print "IO error"
            } else if ( rc1 == 1 && rc2 == 0) {
                # f2 ended first
                printf "%d:%s != EOL\n", nr1, l1
            } else if ( rc1 == 0 && rc2 == 1) {
                # f1 ended first
                printf "EOL != %d:%s\n", nr2, l2
            }
        }
    }
}
$ awk -f awkdiff.awk /tmp/f1 /tmp/f2
2:4:{l24} != 2:EOL
6:1:{l4x} != 5:1:{l4y}
maxdiff = N: the number of differences after which the comparison stops
ignoreemptylines = 1|0: whether empty lines are skipped when comparing
whitespaceaware = 1|0: whether the comparison is done word by word (treating any run of whitespace as a single separator) or line by line
TTT
Updated on September 18, 2022
Comments
-
TTT over 1 year
I'd like to perform a diff on 2 files and have it cease at the first difference. I don't require that the command be done via diff, of course, but I do require that the actual command cease once the first difference is found and reported. I'm running on some very large files, and expect a perfect match, but still want to know what the difference was, should one be found, so diff -q, diff ... | head -1, and cmp are no good. And, since the files are very large, something that doesn't exhaust memory would be nice. Although not necessary for my current problem, bonus points for solutions that work for the first (user-specified) n differences, and for ones that can ignore whitespace differences.
-
kos over 8 years
@TTT Not sure what you mean, the script I proposed shows the first different line in both files (in the example that is line 5).
-
TTT over 8 years
Whoops, meant to delete the comment, accidental post. Deleting now.
-
Vladimir Panteleev almost 8 years
Warning! If one file is a prefix of the other, cmp will simply print cmp: EOF on shorter-file. If this can happen with your input, make sure to handle this edge case.