PowerShell binary file comparison

Solution 1

Another method is to compare the MD5 hashes of the files:

$Filepath1 = 'c:\testfiles\testfile.txt'
$Filepath2 = 'c:\testfiles\testfile1.txt'

$hashes = 
foreach ($Filepath in $Filepath1,$Filepath2)
{
 $MD5 = [Security.Cryptography.HashAlgorithm]::Create( "MD5" )
 $stream = ([IO.StreamReader]"$Filepath").BaseStream
 -join ($MD5.ComputeHash($stream) | 
 ForEach { "{0:x2}" -f $_ })
 $stream.Close()
 }

if ($hashes[0] -eq $hashes[1])
  {'Files Match'}
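
One caveat with the stream-based approach, offered as a hedged sketch rather than as part of the original answer: .NET methods resolve relative paths against the process working directory, which is not necessarily PowerShell's current location. The helper below (Get-Md5Hex is a hypothetical name, not an existing cmdlet) resolves the path with Convert-Path first and releases the stream and the hash object in a finally block.

```powershell
# Hypothetical helper: returns the MD5 hash of a file as a lowercase hex string.
function Get-Md5Hex {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Convert-Path resolves relative paths against PowerShell's current location,
    # so the .NET call below receives an absolute file system path.
    $fullPath = Convert-Path -LiteralPath $Path -ErrorAction Stop

    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = $null
    try {
        $stream = [System.IO.File]::OpenRead($fullPath)
        $bytes = $md5.ComputeHash($stream)
        # BitConverter yields "AB-CD-..."; strip the dashes and lowercase the result.
        ([System.BitConverter]::ToString($bytes) -replace '-', '').ToLowerInvariant()
    }
    finally {
        if ($stream) { $stream.Dispose() }
        $md5.Dispose()
    }
}
```

With this in place, the comparison becomes (Get-Md5Hex $Filepath1) -eq (Get-Md5Hex $Filepath2), and it behaves the same for relative and absolute paths.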

Solution 2

With PowerShell 4 or later you can use the built-in Get-FileHash cmdlet to do this:

function CompareFiles {
    param(
    [string]$Filepath1,
    [string]$Filepath2
    )
    if ((Get-FileHash $Filepath1).Hash -eq (Get-FileHash $Filepath2).Hash) {
        Write-Host 'Files Match' -ForegroundColor Green
    } else {
        Write-Host 'Files do not match' -ForegroundColor Red
    }
}

PS C:\> CompareFiles .\20131104.csv .\20131104-copy.csv

Files Match

PS C:\> CompareFiles .\20131104.csv .\20131107.csv

Files do not match

You could easily modify the above function to return $true or $false if you want to use it programmatically on a large scale.


EDIT

After seeing this answer, I wanted to supply a larger-scale version that simply returns true or false:

function CompareFiles 
{
    param
    (
        [parameter(
            Mandatory = $true,
            HelpMessage = "Specifies the 1st file to compare. Make sure it's an absolute path with the file name and its extension."
        )]
        [string]
        $file1,

        [parameter(
            Mandatory = $true,
            HelpMessage = "Specifies the 2nd file to compare. Make sure it's an absolute path with the file name and its extension."
        )]
        [string]
        $file2
    )

    ( Get-FileHash $file1 ).Hash -eq ( Get-FileHash $file2 ).Hash
}

Solution 3

You could use fc.exe. It comes with Windows. Here's how you would use it:

fc.exe /b d:\local\prodexport2 d:\local\prodexport1 > $null
if (!$?) {
    "The files are different"
}

Solution 4

A while back I wrote an article on a buffered comparison routine to compare two files with PowerShell:

function FilesAreEqual {
    param(
        [System.IO.FileInfo] $first,
        [System.IO.FileInfo] $second, 
        [uint32] $bufferSize = 524288) 

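    # Files of different length cannot be equal; PowerShell requires braces around the if body.
    if ($first.Length -ne $second.Length) { return $false }

    if ( $bufferSize -eq 0 ) { $bufferSize = 524288 }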

    $fs1 = $first.OpenRead()
    $fs2 = $second.OpenRead()

    $one = New-Object byte[] $bufferSize
    $two = New-Object byte[] $bufferSize
    $equal = $true

    do {
        $bytesRead = $fs1.Read($one, 0, $bufferSize)
        $fs2.Read($two, 0, $bufferSize) | out-null

        if ( -Not [System.Linq.Enumerable]::SequenceEqual($one, $two)) {
            $equal = $false
        }

    } while ($equal -and $bytesRead -eq $bufferSize)

    $fs1.Close()
    $fs2.Close()

    return $equal
}

You can use it like this:

FilesAreEqual c:\temp\test.html c:\temp\test.html

A hash (like MD5) needs to traverse the entire file to do the hash calculation. This script returns as soon as it sees a difference in the buffer. It compares the buffers using LINQ, which is faster than comparing the bytes in a native PowerShell loop.
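
The length check from the routine above can also be combined with hashing. The following is only a sketch under stated assumptions, not the author's routine: Test-FileContentEqual is a hypothetical name, and Get-FileHash requires PowerShell 4 or later. It compares the file lengths first, which needs no file reads, and hashes both files only when the lengths match.

```powershell
# Hypothetical helper: $true if both files have identical content, else $false.
function Test-FileContentEqual {
    param(
        [Parameter(Mandatory = $true)][string]$Path1,
        [Parameter(Mandatory = $true)][string]$Path2
    )

    $file1 = Get-Item -LiteralPath $Path1 -ErrorAction Stop
    $file2 = Get-Item -LiteralPath $Path2 -ErrorAction Stop

    # Files of different length cannot be identical, so skip hashing entirely.
    if ($file1.Length -ne $file2.Length) {
        return $false
    }

    # Same length: fall back to a full hash of each file (SHA256 by default).
    $hash1 = (Get-FileHash -LiteralPath $file1.FullName).Hash
    $hash2 = (Get-FileHash -LiteralPath $file2.FullName).Hash
    return ($hash1 -eq $hash2)
}
```

Unlike the buffered routine, this still reads both files in full whenever their lengths match, so it only saves time when the sizes differ.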

Solution 5

if ( (Get-FileHash c:\testfiles\testfile1.txt).Hash -eq (Get-FileHash c:\testfiles\testfile2.txt).Hash ) {
   Write-Output "Files match"
} else {
   Write-Output "Files do not match"
}

Author: user2967267

Updated on June 18, 2022

Comments

  • user2967267
    user2967267 almost 2 years

    All, there is an application which generates its export dumps. I need to write a script that compares the previous day's dump against the latest one, and if they differ, I have to do some basic manipulation such as moving and deleting files. I tried to find a suitable way of doing this, and the method I tried was: $var_com=diff (get-content D:\local\prodexport2 -encoding Byte) (get-content D:\local\prodexport2 -encoding Byte). I tried the Compare-Object cmdlet as well. I noticed very high memory usage, and after a few minutes I get a System.OutOfMemoryException. Has anyone done something similar? Some thoughts, please. There was a thread which mentioned a hash comparison, but I have no idea how to go about it. Thanks in advance, folks. Osp

  • user2967267
    user2967267 over 10 years
    Thanks for this. It took away the long time it used to take for the comparison.
  • Duncan
    Duncan about 10 years
    I tried using this code with relative paths (so in Powershell cd somewhere and then $FilePath1 = 'testfile.txt') but the StreamReader doesn't pick up Powershell's change of folder and thinks it is relative to my home folder instead. The fix is to use $Filepath1 = Get-Item 'testfile.txt' instead and then Powershell passes the correct absolute path to StreamReader.
  • Code Maverick
    Code Maverick over 9 years
    I might be inclined to not use the if (!$?) and replace it with if ($LastExitCode -eq 0). Check out stackoverflow.com/q/10666101 and all the answers.
  • Code Maverick
    Code Maverick over 9 years
    How would your routine compare with the @ericnils answer with respect to performance? When using it inside a function that could get called from a foreach that contains however many files of varying sizes, is yours more optimized than the 4.0 Get-FileHash?
  • arberg
    arberg over 8 years
    This is extremely slow for different files, because it prints all differences (to null). It seems fc does not support not printing output. One can use 'fc /a /b ' which might try to output less but didn't make big difference for me.
  • Keith Hill
    Keith Hill over 8 years
    Just out of curiosity does it help to assign to $null e.g. $null = fc.exe ...?
  • Nacht
    Nacht about 8 years
    @CodeMaverick, it should be, for exactly the reason he stated: it doesn't have to read both entire files unless they are the same. It's the ideal solution.
  • herzbube
    herzbube over 6 years
    I suggest setting $BYTES_TO_READ to some higher value than 8. On my system reading 8 Bytes per iteration was extremely slow. I don't know what the best value is, but increasing the buffer size to 32768 (32 KB) certainly made the file compare a lot snappier.
  • herzbube
    herzbube over 6 years
    I realized that changing $BYTES_TO_READ is not enough, because inside the loop the BitConverter calls only compare the first 8 Bytes (= one Int64) of the buffer. After some deliberation I settled for a second, inner loop that iterates over the byte arrays and individually compares every byte. This is reasonably fast, and it's especially much faster than the ultra-slow compare-object cmdlet.
  • John Rees
    John Rees over 5 years
    Unfortunately as herzbube notes, the current implementation gives completely wrong answers because only 8 bytes out of every 32768 are actually compared.
  • NoBrassRing
    NoBrassRing almost 5 years
    Powershell's Get-FileHash function is (now) available, and does the same thing more simply.
  • Mattia Lancieri
    Mattia Lancieri over 4 years
    Very interesting, is the version with the int64 problem solved?
  • Plutian
    Plutian about 4 years
    Hi and welcome to stackoverflow, and thank you for answering. While this code might answer the question, can you consider adding some explanation for what the problem was you solved, and how you solved it? This will help future readers to understand your answer better and learn from it.
  • Kees C. Bakker
    Kees C. Bakker almost 4 years
    Added a buffer, as reading by buffer is way faster. Updated the original blog article as well.