Difference in whitespace between two files on Linux
Solution 1
For vim users, there is a handy utility to show exact differences between files:
vimdiff file1 file2
This opens the two files in side-by-side windows, with the differences highlighted in color.
Some useful commands when in vimdiff
While in vimdiff, some useful commands are:
]c: jump to next change
[c: jump to previous change
ctrl-W ctrl-W: switch to other window
zo: open folds
zc: close folds
Example
Here is an example of vimdiff in an xterm comparing two versions of a cups configuration file:
You can see that long sections of identical lines have been collapsed into folds. They can be opened again with zo.
The color scheme will vary depending on your option settings. In the above example, when a line appears in one file but not the other, that line is given a dark blue background. In the other file, the missing lines are indicated by dashed lines. When a line appears in both files but has some differences, the unchanged parts of the lines have a pink background and the changed parts have a red background.
Solution 2
On FreeBSD or most Linux systems, you can pipe the output of diff through cat -v -e -t to show whitespace differences:
diff file1 file2 | cat -vet
Tabs will be shown as ^I, a $ will be shown at the end of each line so that you can see trailing whitespace, and nonprinting characters will be displayed as ^X or M-X.
If you have GNU coreutils (available on most non-busybox Linux distributions), this can be simplified to
diff file1 file2 | cat -A
On busybox systems, use catv -vet.
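As a small illustration of the above, here is a sketch that builds two throwaway files and pipes their diff through cat -vet. The file names tabs.txt and spaces.txt and their contents are made up for this example; they look nearly identical on screen but differ in a tab versus a space and in trailing whitespace.

```shell
# Hypothetical sample files: same visible text, different whitespace.
workdir=$(mktemp -d)
printf 'key\tvalue\nend\n' > "$workdir/tabs.txt"     # tab between the words
printf 'key value \nend\n' > "$workdir/spaces.txt"   # space between the words, plus a trailing space

# The pipeline's exit status is that of cat, so diff returning 1
# (meaning "files differ") does not count as a failure here.
marked=$(diff "$workdir/tabs.txt" "$workdir/spaces.txt" | cat -vet)
printf '%s\n' "$marked"

rm -rf "$workdir"
```

With these sample files, the changed line appears as `< key^Ivalue$` on one side and `> key value $` on the other, so both the tab and the trailing space become visible.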
Solution 3
od may help. The Octal Dump command can also show contents in hexadecimal. This can help you see which bytes, including null bytes or unexpected whitespace, are in a file. Common causes include LF vs. CRLF line endings, tabs vs. spaces, or ASCII vs. Unicode (UTF-16 text, for instance, usually has a null byte next to each normally visible byte). od -x filename ought to reveal any of those patterns. If you want a more elaborate way to view the file, any "hex editor" may do nicely. The nice thing about od is that, like the cut command, it is built into many Unix systems, so often no separate installation is necessary.
If you need the files to be more similar, tr can make some changes, and sed can make more. I would probably start with ls -l to see which file is larger, then view the bytes to see what needs to be changed, and then change one of the files so that the two match more closely.
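A rough sketch of that workflow follows, assuming the culprit turns out to be carriage returns. The file names dos.txt and unix.txt are hypothetical, and the sketch uses od -c rather than od -x because character output tends to be easier to read when hunting for whitespace.

```shell
# Hypothetical sample file with Windows (CRLF) line endings.
workdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$workdir/dos.txt"

# od -c prints one character per byte, so carriage returns show up as \r.
dump=$(od -c "$workdir/dos.txt")
printf '%s\n' "$dump"

# tr -d deletes every carriage return, leaving plain LF line endings.
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix.txt"

# $(( )) strips the padding that some wc implementations add to their output.
dos_size=$(( $(wc -c < "$workdir/dos.txt") ))
unix_size=$(( $(wc -c < "$workdir/unix.txt") ))
echo "before: $dos_size bytes, after: $unix_size bytes"

rm -rf "$workdir"
```

Note that tr -d '\r' removes every carriage return in the file, not only those at line ends, which is usually what you want for text files but worth keeping in mind.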
Solution 4
Was one of the files edited on a Windows machine?
Standard line termination on Windows is CRLF, whereas on Linux it's simply LF (classic Mac OS used CR, but Macs have used LF since OS X).
Try wc -l on the files to see how many lines each has, then check whether the size difference in bytes equals the number of lines (the last line may not be terminated in one file).
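That check can be sketched roughly as follows. The two sample files (unix.txt and dos.txt) are invented for the example and hold the same text with LF and CRLF endings respectively; with real files that also contain edits, the numbers will only match approximately.

```shell
# Hypothetical sample files: identical text, different line terminators.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'       > "$workdir/unix.txt"
printf 'alpha\r\nbeta\r\ngamma\r\n' > "$workdir/dos.txt"

# $(( )) strips the padding that some wc implementations add to their output.
line_count=$(( $(wc -l < "$workdir/unix.txt") ))
size_diff=$(( $(wc -c < "$workdir/dos.txt") - $(wc -c < "$workdir/unix.txt") ))

# One extra byte per line is the signature of CRLF vs LF.
if [ "$size_diff" -eq "$line_count" ]; then
    verdict="size difference equals line count: likely CRLF vs LF"
else
    verdict="size difference does not match line count"
fi
echo "lines=$line_count size_diff=$size_diff -> $verdict"

rm -rf "$workdir"
```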
Solution 5
To see exactly where spaces and tabs are, you could replace them with visible markers using sed, for example:
$ cat file
  line 1
  line 2
	line 6
		line 7
$ sed 's/ /-/g; s/\t/<tab>/g' file
--line-1
--line-2
<tab>line-6
<tab><tab>line-7
Run the same substitution on both files and then compare the results.
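Putting that together, a minimal sketch might look like this. The file names a.txt and b.txt and the helper mark_ws are made up for illustration. Since \t inside a sed pattern is a GNU extension, the sketch passes a literal tab instead so that it should also work with other sed implementations.

```shell
# Hypothetical sample files: second line indented with a tab vs. four spaces.
workdir=$(mktemp -d)
printf '  line 1\n\tline 2\n'   > "$workdir/a.txt"
printf '  line 1\n    line 2\n' > "$workdir/b.txt"

# Literal tab character, used in place of the GNU-only \t escape.
tab=$(printf '\t')

# mark_ws (illustrative helper): make spaces and tabs visible in a file.
mark_ws() { sed "s/ /-/g; s/$tab/<tab>/g" "$1"; }

mark_ws "$workdir/a.txt" > "$workdir/a.marked"
mark_ws "$workdir/b.txt" > "$workdir/b.marked"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs under "set -e".
result=$(diff "$workdir/a.marked" "$workdir/b.marked" || true)
printf '%s\n' "$result"

rm -rf "$workdir"
```

Here the diff reports only the second line, showing `<tab>line-2` against `----line-2`.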
Romski
Updated on September 18, 2022
Comments
-
Romski over 1 year
I have two files that, when I compare them with diff, show that every line has changed. When I compare them with diff -w (ignoring whitespace), it shows the few minimal changes that I expect.
Obviously there is some difference between the whitespace in each file, but I don't know what it is or how to find it. I have tried editing the files to ensure that the whitespace is actually space characters (as opposed to tabs) but am unsure what else to do.
I have used vim with :set list to confirm there was no trailing space at the end of the lines.
I also believe that each file has Linux line terminators, as vim didn't show ^M at the end of the lines.
-
Romski over 9 years
Good suggestion. I used vim with ":set list"; this showed the "$" at the end of each line, and there was no trailing space. I'll update my question.
-
Romski over 9 years
@John1024 I was unaware of vimdiff, but it looks promising. Add it as an answer and I'll accept.
-
Lie Ryan over 9 years
Vim shows ^M only when it detects Unix line endings but the file actually contains DOS line endings. This usually happens when a single file has mixed line endings, e.g. after applying a patch whose line endings differ from those of the original file. When vim detects DOS line endings correctly, it does not show the ^M.
-
Romski over 9 years
Thanks for the quick reply. Doing a line count shows that one file has 5 more lines (I expect this as I've made edits). I got one file from a Linux machine and the other was checked out from a code repository onto Linux. I believe that viewing a file with Windows terminators in vim will show the last character as ^M and that's not the case.
-
fencepost over 9 years
vim is actually smart enough to autodetect the line termination, see stackoverflow.com/questions/3852868 for details.
-
Romski over 9 years
I was not aware of that! I'll re-check.
-
Daniel Labonté about 9 years
Even better, you could run that filter on the diff output. Or you could use the ready-made filter in cat, as in superuser.com/a/913368/37154