C# - Compare Two Text Files


You would then have to compare the string content of the files. The StreamReader that ReadLines uses detects the encoding from a byte order mark; if the file has none, it falls back to UTF-8.

// Requires: using System.Linq;
var areEqual = System.IO.File.ReadLines("c:\\file1.txt").SequenceEqual(
               System.IO.File.ReadLines("c:\\file2.txt"));

Note that ReadLines will not read the complete file into memory.
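As a minimal sketch of the same idea with the encoding stated explicitly, the comparison could be wrapped in a helper. The names `TextFileComparer` and `LinesEqual` are hypothetical and not part of the original answer. Because ReadLines strips line terminators, this sketch treats files that differ only in their line endings (CRLF versus LF) as equal.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class TextFileComparer
{
    // Hypothetical helper: compares two text files line by line.
    // File.ReadLines enumerates lazily and SequenceEqual stops at the
    // first mismatch, so neither file is loaded into memory in full.
    public static bool LinesEqual(string path1, string path2, Encoding encoding)
    {
        return File.ReadLines(path1, encoding)
                   .SequenceEqual(File.ReadLines(path2, encoding), StringComparer.Ordinal);
    }
}
```

A call might look like `TextFileComparer.LinesEqual("c:\\file1.txt", "c:\\file2.txt", Encoding.UTF8)`.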

Author by

Danny Lager

Updated on June 27, 2022

Comments

  • Danny Lager
    Danny Lager almost 2 years

    Background

    I'm developing a simple Windows service which monitors certain directories for file creation events and logs them. Long story short, it needs to ascertain whether a file was copied from directory A to directory B. If a file is not in directory B after X time, an alert will be raised.

    The issue is that I only have the file itself to go on when working out whether it has made its way to directory B. I'd like to assume that two files with the same name are the same, but there are over 60 directory A's and a single directory B, and a file in any directory A may accidentally have the same name as a file in another (by date or sequence), so this is not a safe assumption.

    Example

    Let's say, for example, I store a log that file "E17999_XXX_2111.txt" was created in directory C:\Test. I would store the filename, file path, file creation date, file length and the BOM for this file.

    30 seconds later, I detect that the file "E17999_XXX_2111.txt" was created in directory C:\FinalDestination. Now I have the task of determining whether:

    a) the file is the same one created in C:\Test, therefore I can update the first log as complete and stop worrying about it.

    b) the file is not the same and I somehow missed the previous steps - therefore I can ignore this file because it has found its way to the destination dir.

    Research

    So, in order to determine if the file created in the destination is exactly the same as the one created in the first instance, I've done a bit of research and found the following options:

    a) filename compare

    b) length compare

    c) a creation-date compare

    d) byte-for-byte compare

    e) hash compare

    Problems

    a) As I said above, going by Filename alone is too presumptuous.

    b) Again, just because the length of the contents of a file is the same, it doesn't necessarily mean the files are actually the same.

    c) The problem with this is that a copied file is technically a new file, therefore the creation date changes. I would want to set the first log as complete regardless of the time elapsed between the file appearing in directory A and directory B.

    d) Aside from the fact that this method is extremely slow, it appears there's an issue if the second file has somehow changed encoding, for example between ANSI and ASCII, which would cause a byte mismatch for things like quote characters.

    I would prefer not to assume that a file is different just because an ASCII ' has changed to an ANSI ', as it is near enough the same.

    e) This seems to have the same drawbacks as a byte-for-byte compare.
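Options (b) and (e) can be combined, as a rough sketch: reject on length first, then compare hashes of the raw bytes. The names `FileFingerprint`, `Sha256Of` and `SameBytes` are hypothetical, and SHA-256 is an assumption; the post does not name a hash algorithm. As noted in (e), this still treats a re-encoded copy as a different file.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class FileFingerprint
{
    // Hypothetical helper: SHA-256 of the raw bytes, streamed from disk.
    public static byte[] Sha256Of(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }

    // Cheap length check first; hash both files only when the lengths match.
    public static bool SameBytes(string path1, string path2)
    {
        if (new FileInfo(path1).Length != new FileInfo(path2).Length)
            return false;
        return Sha256Of(path1).SequenceEqual(Sha256Of(path2));
    }
}
```

One possible design choice is to store the hash in the log entry when the file first appears, so the source file does not have to be read again later.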

    EDIT

    It appears the actual issue comes down to why the encoding differs between directories. I'm not currently able to access the code which deals with this part, so I can't tell why this happens, but I am looking to implement a solution which can compare files regardless of encoding to determine "real" differences (i.e. not those where a byte has changed only because of the encoding).
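One way to approach this, assuming the encoding used in each directory is known (the post does not say which encodings are involved), is to decode each file with its own encoding and compare the resulting text rather than the bytes. `EncodingAwareComparer` and `TextEqual` are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingAwareComparer
{
    // Hypothetical helper: decodes each file with the encoding the caller
    // supplies for it, then compares the decoded text.
    // Note: ReadAllText loads each file into memory in full.
    public static bool TextEqual(string path1, Encoding encoding1,
                                 string path2, Encoding encoding2)
    {
        string text1 = File.ReadAllText(path1, encoding1);
        string text2 = File.ReadAllText(path2, encoding2);
        return string.Equals(text1, text2, StringComparison.Ordinal);
    }
}
```

A call might look like `EncodingAwareComparer.TextEqual(pathA, Encoding.GetEncoding(1252), pathB, Encoding.UTF8)`, where `pathA` and `pathB` are placeholders. On .NET Core and .NET 5+, code page 1252 is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.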

    SOLUTION

    I've managed to resolve this now. If the initial SequenceEqual comparison suggested by @Magnus fails to find a match, I convert both files to ASCII to remove any bad data and then run the comparison again. Code below:

    // Requires: using System.IO; using System.Linq; using System.Text;
    // On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
    // Characters with no ASCII equivalent become '?', so this comparison is lossy.
    Encoding cp1252 = Encoding.GetEncoding(1252);
    byte[] bytes1 = Encoding.Convert(cp1252, Encoding.ASCII, cp1252.GetBytes(File.ReadAllText(FilePath1)));
    byte[] bytes2 = Encoding.Convert(cp1252, Encoding.ASCII, cp1252.GetBytes(File.ReadAllText(FilePath2)));

    if (Encoding.ASCII.GetChars(bytes1).SequenceEqual(Encoding.ASCII.GetChars(bytes2)))
    {
        // matched!
    }
    

    Thanks for the help!
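Converting to ASCII replaces every character that has no ASCII equivalent with '?', so two different non-ASCII characters would also compare as equal. A narrower alternative, sketched here under the assumption that only typographic quotes differ between the files, maps just those characters to their ASCII counterparts before comparing. `QuoteNormalizer` and its methods are hypothetical names, and the set of characters handled is an assumption.

```csharp
using System;

static class QuoteNormalizer
{
    // Hypothetical helper: maps curly single and double quotes to ASCII quotes.
    public static string NormalizeQuotes(string text)
    {
        return text.Replace('\u2018', '\'')   // left single quotation mark
                   .Replace('\u2019', '\'')   // right single quotation mark
                   .Replace('\u201C', '"')    // left double quotation mark
                   .Replace('\u201D', '"');   // right double quotation mark
    }

    // Compares two already-decoded texts after quote normalization.
    public static bool EqualIgnoringQuoteStyle(string text1, string text2)
    {
        return string.Equals(NormalizeQuotes(text1), NormalizeQuotes(text2),
                             StringComparison.Ordinal);
    }
}
```

With this sketch, all other characters still have to match exactly.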

  • Danny Lager
    Danny Lager over 8 years
    Thanks, I'll give this a go when possible. Would this return true regardless of encoding, since we are comparing two strings, or would it behave the same as a byte-by-byte comparison?
  • Danny Lager
    Danny Lager over 8 years
    I've just given this a try with the issue I'm experiencing, using UTF-8 encoding for both: File.ReadLines(FilePath1, Encoding.UTF8).SequenceEqual(File.ReadLines(FilePath2, Encoding.UTF8)). This returns false, yet the only difference between the files is the quote, so I assume this is still throwing it off. Any suggestions on how to get around this? It ran extremely quickly, which is a positive.
  • Magnus
    Magnus over 8 years
    Perhaps the quote character is actually different and it is not an encoding issue.
  • Danny Lager
    Danny Lager over 8 years
    It turns out that, when the initial comparison failed, I had to re-read both files and convert them to ASCII encoding before doing the above comparison again.
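To check whether the quote character really is different, as Magnus suggests, one option is to locate the first position where the two decoded texts differ and print the code point on each side. This is only a diagnostic sketch; `TextDiff`, `FirstDifference` and `Describe` are hypothetical names.

```csharp
using System;

static class TextDiff
{
    // Hypothetical helper: index of the first differing UTF-16 code unit,
    // or -1 when the two strings are identical.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : shared;
    }

    // Formats the code unit at the given index as U+XXXX.
    public static string Describe(string s, int index)
    {
        return index < s.Length ? $"U+{(int)s[index]:X4}" : "<end of text>";
    }
}
```

For the texts "it’s" (with a right single quotation mark) and "it's" (with an ASCII apostrophe), `FirstDifference` returns 2 and `Describe` reports U+2019 and U+0027 respectively, which would indicate two genuinely different characters rather than one character stored in two encodings.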