How to find md5sum of files on remote machines by doing ssh?


Solution 1

I have modified your script and this one works now. I have added some comments inside the script to make it more understandable. Let me know if you need more help.

#!/bin/bash

#The export path which we set here.
export PRIMARY=/home/ramesh

#The main for loop execution starts here. 
for entry in "$PRIMARY"/*
do
    #Get the base name of the file which we check in the remote servers.
    #Get just the filenames without the path.
    #I am going to use the filename in the remote server to check. 

    filename=$(basename "$entry")
    echo "File Name: $filename"
    #Calculate the MD5Sum locally.
    local_md5sum=$(md5sum "$entry")
    echo "Local MD5Sum: $local_md5sum"

    #Check if the file exists in server1. 
    #Otherwise I can check in the other server.

    if ssh ramesh@server1 "stat '/home/ramesh/$filename' > /dev/null 2>&1"; then

        #I have the file in server1 and so I get the md5sum from server1.
        #I store the md5sum inside remote_md5sum variable.
        remote_md5sum=$(ssh ramesh@server1 "cd /home/ramesh/; find . -name '$filename' -exec md5sum {} \;")
    else
        #Now, I know the file is in server2 as it is not present in server1.
        #I store it in the same remote_md5sum variable so the echo below works in both cases.
        remote_md5sum=$(ssh ramesh@server2 "cd /home/ramesh/; find . -name '$filename' -exec md5sum {} \;")
    fi
    echo "Remote MD5Sum: $remote_md5sum"
done

Testing

I wanted to test the above script for file names with spaces as well. It works well and this is the output that I get when I execute the script.

File Name: file1
Local MD5Sum:  39eb72b3e8e174ed20fe66bffdc9944e  /home/ramesh/file1
Remote MD5Sum: b5fc751f836c5430b617bf90a8c4725d  ./file1

File Name: file with spaces
Local MD5Sum:  36707e275264f4ac25254e2bbe5ef041  /home/ramesh/file with spaces
Remote MD5Sum: 36707e275264f4ac25254e2bbe5ef041  ./file with spaces
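
The script prints both checksums but leaves the comparison itself to the reader. A minimal sketch of that last step, assuming both values are raw md5sum output lines (hash, two spaces, path); checksums_match is a hypothetical helper name and not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: succeeds when two md5sum output lines carry the same hash.
checksums_match() {
    local local_hash remote_hash
    # md5sum prints "<hash>  <path>"; keep only the first field of each line.
    local_hash=$(awk '{print $1}' <<< "$1")
    remote_hash=$(awk '{print $1}' <<< "$2")
    # An empty hash (for example after a failed ssh) never counts as a match.
    [ -n "$local_hash" ] && [ "$local_hash" = "$remote_hash" ]
}

# Sample lines copied from the test output above.
if checksums_match "36707e275264f4ac25254e2bbe5ef041  /home/ramesh/file with spaces" \
                   "36707e275264f4ac25254e2bbe5ef041  ./file with spaces"; then
    echo "MATCH"
else
    echo "MISMATCH"
fi
```

Inside the loop, the call would presumably be checksums_match "$local_md5sum" "$remote_md5sum". Comparing only the first field matters because the local line carries an absolute path while the remote line carries a relative one.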

Solution 2

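First, you never use your variable FILES_LOCATION, so the remote md5sum is run against the local path instead of the remote directory. Second, || does not work the way you use it: the exit status of a pipeline is that of its last command (awk here), which succeeds even when ssh fails, so the second ssh never runs.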

Try something like:

entry="$FILES_LOCATION/$(basename "$entry")"

# Quote the path for the remote shell as well, so file names with spaces survive.
remote_md5sum=$(ssh "user@$SERVERS_1" "/usr/bin/md5sum '$entry'" | awk '{print $1}')
if [ -z "$remote_md5sum" ]; then
    remote_md5sum=$(ssh "user@$SERVERS_2" "/usr/bin/md5sum '$entry'" | awk '{print $1}')
fi
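
To see why the || in the question never triggers the fallback, here is a small sketch that needs no network. failing_remote is a hypothetical stand-in for an ssh call that fails; the pipefail variant is shown only as an alternative bash option, not as what this answer uses:

```shell
#!/bin/bash
# Hypothetical stand-in for an ssh command that fails and prints nothing.
failing_remote() { return 1; }

# The pipeline's exit status is awk's (0), so the || branch is skipped
# and the result stays empty.
result=$(failing_remote | awk '{print $1}' || echo "fallback")
echo "without pipefail: '${result}'"

# With pipefail, a failure anywhere in the pipeline makes the whole
# pipeline fail, so the || branch runs.
set -o pipefail
result_pf=$(failing_remote | awk '{print $1}' || echo "fallback")
echo "with pipefail: '${result_pf}'"
set +o pipefail
```

Testing for an empty result, as in the snippet above, has the advantage of not depending on a bash-specific option.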

Solution 3

Here is a quick and easy way to do it (all commands running on "machine2" as "user1"):

[user1@machine2]$ cd /home/user1/src

[user1@machine2]$ ssh user1@machine1 "cd src;find . -type f -exec md5sum {} \;" | md5sum --check | grep -v "OK"

Change directories to match your scenario.

I picked this up somewhere on the net in 2007. It helps.
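
The pipeline can be tried out without a second machine. The sketch below assumes GNU md5sum and uses two temporary directories as stand-ins for machine1 and machine2; the directory and file names are made up for the demonstration:

```shell
#!/bin/bash
# Two throwaway directories play the roles of machine1 and machine2.
work=$(mktemp -d)
mkdir "$work/machine1" "$work/machine2"
printf 'same\n' > "$work/machine1/a.txt"
printf 'same\n' > "$work/machine2/a.txt"
printf 'one\n'  > "$work/machine1/b.txt"
printf 'two\n'  > "$work/machine2/b.txt"

# Left side: what the ssh command would produce (checksums with relative paths).
# Right side: verify that list against the local copies and hide the matches.
report=$(
    (cd "$work/machine1" && find . -type f -exec md5sum {} \;) |
    (cd "$work/machine2" && md5sum --check 2>/dev/null | grep -v "OK")
)
echo "$report"

rm -rf "$work"
```

Only b.txt, whose content differs between the two directories, should appear in the report. The relative paths are what make this work: md5sum --check resolves each listed path against the current directory, which is why the original command starts with a cd on both sides.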

Author: david

Updated on September 18, 2022

Comments

  • david
    david almost 2 years

    I am running the shell script below on machineC; it computes the md5sum of the files in my PRIMARY directory on machineC itself.

    #!/bin/bash
    
    export PRIMARY=/data01/primary
    
    for entry in "$PRIMARY"/*
    do
        local_md5sum=$(/usr/bin/md5sum "$entry" | awk '{print $1}')
        echo $local_md5sum
    done
    

    When I run the above script, it prints the md5sum of my files on machineC, and that works fine.

    Now, the same file for which I am calculating the md5sum can be on either machineA or machineB, so I need to ssh to machineA and machineB, run md5sum on the same file there, and store the result in the remote_md5sum variable.

    If the file is not on machineA, then it is certainly on machineB. On both machines the files are in this directory:

    /bat/test/data/snapshot/20140918
    

    So I wrote the shell script below, which I run on machineC and which also tries to find the md5sum of the files on machineA or machineB:

    #!/bin/bash
    
    # folder in machineC
    export PRIMARY=/data01/primary
    
    readonly SERVERS=(machineA machineB)
    export SERVERS_1=${SERVERS[0]}
    export SERVERS_2=${SERVERS[1]}
    
    export FILES_LOCATION=/bat/test/data/snapshot/20140918  
    
    for entry in "$PRIMARY"/*
    do
        # find local md5sum on machineC
        local_md5sum=$(/usr/bin/md5sum "$entry" | awk '{print $1}')
        echo $local_md5sum
    
        # find remote md5sum of the file which will be on machineA or machineB
        remote_md5sum=$(ssh user@$SERVERS_1 /usr/bin/md5sum "$entry" | awk '{print $1}' || ssh user@$SERVERS_2 /usr/bin/md5sum "$entry" | awk '{print $1}')
        echo "Remote Checksum: $remote_md5sum"
    
        # now compare local_md5sum and remote_md5sum
    done
    

    But whenever I run the above shell script, my ssh command fails and the md5sum of the file is not stored in remote_md5sum. Is there anything wrong with this syntax?

    remote_md5sum=$(ssh user@$SERVERS_1 /usr/bin/md5sum "$entry" | awk '{print $1}' || ssh user@$SERVERS_2 /usr/bin/md5sum "$entry" | awk '{print $1}')
    
  • david
    david almost 10 years
    I think you misunderstood the question. I am not comparing the md5sum of the files in machineA and machineB. FileA will be there in machineA or machineB, and I need to compare this md5sum with local md5sum.
  • david
    david almost 10 years
    Thanks a lot, Totor. Can you explain what basename does here?
  • Ramesh
    Ramesh almost 10 years
    @user2809564, please see the updated answer and let me know if it is fine.
  • Ramesh
    Ramesh almost 10 years
    @Halosghost, thanks for the edit. I appreciate it. :)
  • Totor
    Totor almost 10 years
    @user2809564 You're welcome. basename takes a full /path/to/filename and gives you the filename only. The "opposite" tool is dirname which keeps the path only.
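
A short illustration of the two tools described in the last comment. The directory is the PRIMARY path from the question; the file name is a made-up example:

```shell
#!/bin/bash
# Hypothetical file under the PRIMARY directory from the question.
path="/data01/primary/file with spaces"

# basename keeps only the last path component.
name=$(basename "$path")
# dirname keeps everything before the last path component.
dir=$(dirname "$path")

echo "$name"   # file with spaces
echo "$dir"    # /data01/primary
```

Quoting "$path" keeps a file name with spaces in one piece; this is the same reason the scripts above quote "$entry".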