Difference in whitespace between two files on Linux

14,116

Solution 1

For vim users, there is a handy utility to show exact differences between files:

vimdiff file1 file2

This opens each file in its own window, side by side, with the differences highlighted in color.

Some useful commands when in vimdiff

While in vimdiff, some useful commands are:

  • ]c: jump to next change

  • [c: jump to previous change

  • ctrl-W ctrl-W: switch to other window

  • zo: open folds

  • zc: close folds

Example

Here is an example of vimdiff in an xterm comparing two versions of a cups configuration file:

[Image: screenshot of vimdiff comparing two versions of a cups configuration file]

You can see that long sections of identical lines have been collapsed. They can be opened again with zo.

The color scheme will vary depending on your option settings. In the above example, when a line appears in one file but not the other, that line is given a dark blue background. In the other file, the missing lines are indicated by dashed lines. When a line appears in both files but has some differences, the unchanged parts of the lines have a pink background and the changed parts have a red background.

Solution 2

On FreeBSD or most Linux systems, you can pipe the output of diff through cat -v -e -t to show whitespace differences.

diff file1 file2 | cat -vet

Tabs will be shown as ^I, a $ will be shown at the end of each line so that you can see trailing whitespace, and nonprinting characters will be displayed as ^X or M-X.

If you have GNU coreutils (available on most non-busybox Linux distributions), this can be simplified to

diff file1 file2 | cat -A

On busybox systems, use catv -vet instead.
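To see what that looks like in practice, here is a small sketch. The two sample files (a.txt and b.txt in a scratch directory) are made up for illustration: one is tab-indented with a trailing space, the other is indented with four spaces.

```shell
# Scratch directory with two made-up sample files.
dir=$(mktemp -d)

# a.txt: tab-indented value with a trailing space.
# b.txt: the same value indented with four spaces, no trailing space.
printf 'key:\n\tvalue \n' > "$dir/a.txt"
printf 'key:\n    value\n' > "$dir/b.txt"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from being treated as a failure in scripts that stop on errors.
marked=$(diff "$dir/a.txt" "$dir/b.txt" | cat -vet || true)
printf '%s\n' "$marked"
```

The changed lines should come out as `< ^Ivalue $` and `>     value$`, so both the tab and the trailing space are visible at a glance.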

Solution 3

od may help. Despite its name (Octal Dump), it can also show contents in hexadecimal, which lets you see exactly what bytes, including null bytes or unexpected whitespace, are in a file. Common causes are LF vs CRLF line endings, tabs vs spaces, or ASCII vs Unicode (in UTF-16, for instance, there is often a null byte next to each normally visible byte). od -x filename ought to reveal any of those patterns. If you want a more elaborate way to view the file, any "hex editor" will do nicely. The nice thing about od is that, like the cut command, it is built into many Unix systems, so often no separate installation is necessary.
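As a rough illustration, the sketch below dumps a made-up one-line sample file with a tab and a CRLF ending. It uses od -c and od -t x1 rather than od -x, because -x groups bytes into two-byte words in the machine's byte order, which can make neighbouring bytes look swapped on little-endian hardware.

```shell
# Made-up sample: one line containing a tab and a Windows (CRLF) ending.
sample=$(mktemp)
printf 'a\tb\r\n' > "$sample"

# -c prints each byte as a character or a backslash escape (\t, \r, \n, \0);
# -An drops the offset column.
chars=$(od -An -c "$sample")
printf '%s\n' "$chars"

# -t x1 prints one byte per column in hex.
hex=$(od -An -t x1 "$sample")
printf '%s\n' "$hex"
```

In the first dump, look for \t and \r; in the second, a CRLF ending shows up as the byte pair 0d 0a, while a plain LF ending is just 0a.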

If you need files to be more similar, tr can make some changes, and sed can make more. I would probably start with ls -l to see which file is larger, then view bytes to see what needs to be changed, and then change one of the files so that they seem more similar.
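For the common CRLF case, a minimal sketch with tr might look like the following. The sample file is made up, and note that tr -d '\r' removes every carriage return, not just those at line ends, so check the file first.

```shell
# Made-up sample with Windows (CRLF) line endings.
dos=$(mktemp)
unix=$(mktemp)
printf 'one\r\ntwo\r\n' > "$dos"

# Delete every carriage return, writing the result to a separate file
# so the original stays untouched.
tr -d '\r' < "$dos" > "$unix"

# Compare sizes: each converted line should be one byte shorter.
before=$(( $(wc -c < "$dos") ))
after=$(( $(wc -c < "$unix") ))
echo "bytes before: $before, after: $after"
```

Here the file shrinks from 10 bytes to 8, one byte per line, which matches the two carriage returns that were removed.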

Solution 4

Was one of the files edited on a Windows machine?

Standard line termination on Windows is CRLF, whereas on Linux it's simply LF (and on Macs it used to be CR, but that changed to LF with OS X).

Try wc -l on the files to count the lines, then check whether the difference in file size equals the number of lines (the last line may not be terminated in one of the files).
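That check can be sketched like this, using two made-up files with the same text, one with LF endings and one with CRLF endings:

```shell
# Two made-up samples with identical text but different line endings.
unix=$(mktemp)
dos=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$unix"
printf 'alpha\r\nbeta\r\ngamma\r\n' > "$dos"

# Line count of the larger file and the size difference in bytes.
lines=$(( $(wc -l < "$dos") ))
size_diff=$(( $(wc -c < "$dos") - $(wc -c < "$unix") ))
echo "lines: $lines, size difference: $size_diff bytes"

# One extra byte per line is a strong hint of CRLF vs LF.
if [ "$size_diff" -eq "$lines" ]; then
    verdict="likely CRLF vs LF"
else
    verdict="something else differs"
fi
echo "$verdict"
```

With real files that also contain genuine edits the numbers will not match exactly, so treat a near match as a hint rather than proof.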

Solution 5

To find out where the actual spaces and tabs are, you could replace them with visible markers using sed, for example:

$ cat file
  line 1
  line 2
    line 6
        line 7
$ sed 's/ /-/g; s/\t/<tab>/g' file
--line-1
--line-2
<tab>line-6
<tab><tab>line-7

Run this on both files and then compare the results.
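Putting it together, a sketch of the whole comparison could look like this. The sample files and the mark helper are made up for illustration. It passes a literal tab to sed instead of \t, since \t in a sed pattern is, as far as I know, a GNU extension and may not work elsewhere.

```shell
# Two made-up samples: the second line is tab-indented in one file
# and indented with four spaces in the other.
f1=$(mktemp)
f2=$(mktemp)
printf '  line 1\n\tline 6\n' > "$f1"
printf '  line 1\n    line 6\n' > "$f2"

# A literal tab character, for sed implementations without \t support.
tab=$(printf '\t')

# Hypothetical helper: make spaces and tabs visible in one file.
mark() {
    sed "s/ /-/g; s/$tab/<tab>/g" "$1"
}

mark "$f1" > "$f1.marked"
mark "$f2" > "$f2.marked"

# diff exits with status 1 when the files differ, hence "|| true".
result=$(diff "$f1.marked" "$f2.marked" || true)
printf '%s\n' "$result"
```

Only the second line is reported, as `< <tab>line-6` against `> ----line-6`, which shows directly that one file uses a tab where the other uses four spaces.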


Romski
Author by

Romski

Former Java developer in recovery, doing Javascript everywhere! Node, Angular, React, Redux, rxjs, Docker, Vagrant, AWS (Lambda, SQS, SNS, DynamoDb, ...). OSX > Linux > Windows. Spaces not tabs.

Updated on September 18, 2022

Comments

  • Romski
    Romski over 1 year

    I have two files that when I compare with diff show that every line has changed. When I compare them with diff -w (ignoring whitespace) it shows the few minimal changes that I expect.

    Obviously there is some difference between the whitespace in each file, but I don't know what they are or how to find them. I have tried editing the files to ensure that the whitespace is actually space characters (as opposed to tabs) but am unsure what else to do.

    I have used vim with :set list on to confirm there was no trailing space at the end of the lines.

    I also believe that each file has Linux line terminators as vim didn't show the ^M at the end of the lines.

    • Romski
      Romski over 9 years
      Good suggestion. I used vim with ":set list on" this showed the "$" at the end of the line and there was no trailing space. I'll update my question
    • Romski
      Romski over 9 years
      @John1024 I was unaware of vimdiff, but it looks promising. Add it as answer and I'll accept
    • Lie Ryan
      Lie Ryan over 9 years
      Vim shows ^M only when it misdetects a Unix line ending but the file actually has DOS line ending. Usually this happens if you've got mixed line ending in a single file, e.g. applying a patch with different line ending than the original file. When vim detects DOS line ending correctly, it wouldn't have shown the ^M.
  • Romski
    Romski over 9 years
    Thanks for the quick reply. Doing a line count shows that one file has 5 more lines (I expect this as I've made edits). I got one file from a Linux machine and the other was checked out from a code repository onto Linux. I believe that viewing a file with Windows terminators in vim will show the last character as ^M and that's not the case.
  • fencepost
    fencepost over 9 years
    vim is actually smart enough to autodetect the line termination, see stackoverflow.com/questions/3852868 for details.
  • Romski
    Romski over 9 years
    I was not aware of that! I'll re-check
  • Daniel Labonté
    Daniel Labonté about 9 years
    Even better, you could run that filter on the diff output. Or you could use the ready-made filter in cat, as in superuser.com/a/913368/37154