
Can md5sum check files based on just the filename, not the file path?

Writer Matthew Barrera

Is it possible to have an md5 checksum file (as generated by something like md5sum -r * > checklist.chk) where each line contains just the hash and the filename of the file, and not the path from the current directory to the file?

I have two large directory trees that I want to compare. The second tree has a different structure, because I use it more often and have slowly been rearranging things in it over time. Is it possible to have md5sum check every file in the first tree to see whether some file, somewhere in the second tree, has the same filename and hash?

Most of the posts I have found so far don't seem to cover this use case, where file paths don't matter.

Essentially, I want to be able to do this:

  1. Open the second directory tree and generate a checksum list (using something like md5sum -r * > checklist.chk), except that each line contains only the file's hash and name, without the path.
  2. Open the first directory tree, go through every file, and verify its hash against the checksum list from step 1 to determine whether or not it exists in the second directory tree.
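As far as I know, md5sum -c cannot do this on its own: it opens each file by exactly the name written in the checklist, so a list of bare filenames only verifies files sitting in the current directory. The two steps above can instead be approximated by stripping the directory part from md5sum's output and comparing whole lines. Below is a minimal sketch, assuming GNU find, md5sum, sed and grep, and filenames without newlines or backslashes; flat_checksums and check_against are hypothetical helper names, not existing commands.

```shell
#!/bin/bash
# Hypothetical helpers (not part of md5sum). Assumes GNU coreutils and
# filenames without newlines or backslashes, which md5sum would escape.

# Step 1: print "hash  filename" for every regular file under a directory,
# with the directory part of each path removed.
flat_checksums () {
    # md5sum prints "hash  path/to/file"; the sed expression replaces the
    # two spaces plus everything up to the last "/" with just two spaces.
    find "$1" -type f -exec md5sum {} + | sed 's#  .*/#  #'
}

# Step 2: print every file under a directory whose hash and filename do
# not appear together on any line of the checklist.
check_against () {
    local dir="$1" checklist="$2" path entry
    find "$dir" -type f | while IFS= read -r path; do
        entry=$(md5sum "$path" | sed 's#  .*/#  #')
        # -F: fixed string, -x: whole-line match, -q: no output
        if ! grep -Fxq -- "$entry" "$checklist"; then
            echo "MISSING: $path"
        fi
    done
}
```

For example, flat_checksums /path/to/second_tree > checklist.chk followed by check_against /path/to/first_tree checklist.chk prints one MISSING line for each file in the first tree that has no same-named, same-hash counterpart anywhere in the second tree. Grepping the list once per file gets slow for very large trees; sorting both flat lists and comparing them with comm scales better.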

2 Answers

The most automatic way that springs to mind is to read the man pages for find, xargs, md5sum, sort and uniq, and do something like this (untested code):

find sourcedir oldsourcedir -type f -print0 | \
  xargs -r -0 md5sum | \
  sort | uniq -c -w 32 | \
  sort -nr | tee md5sums.txt | \
  less
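Note that this pipeline groups files by hash only, so it ignores filenames entirely and counts how many copies of each content exist across both trees. If the goal is just to find content that has no copy anywhere else, a small variant swaps uniq -c for uniq -u. This is a sketch assuming GNU xargs and uniq (for -r, -0 and -w, as in the pipeline above); unique_by_hash is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the md5sum line of every file whose content
# (MD5 hash) occurs exactly once across all the given directory trees.
# Assumes GNU findutils/coreutils: -print0, xargs -r -0, uniq -w.
unique_by_hash () {
    find "$@" -type f -print0 |
        xargs -r -0 md5sum |
        sort |
        uniq -u -w 32   # compare only the 32-character hash; keep unrepeated lines
}
```

Running unique_by_hash sourcedir oldsourcedir would then list content present in only one place. A file duplicated inside one tree but absent from the other is not reported, because its hash occurs more than once.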

After some more searching, I found a script that seems to do pretty much exactly this:

#!/bin/bash
#for merging dir1 into dir2
maindir=$(pwd)
dir1="$1"
d1checkfilename="dir1.flatchecksum"
dir2="$2"
d2checkfilename="dir2.flatchecksum"
resultfile="result.txt"
create_flat_checksum () {
    local currentdir="$1"
    local checksumfile="$2"
    # start from an empty file so entries from an earlier run are not kept
    : > "$checksumfile"
    echo "now checksumming $currentdir"
    local subdirs
    subdirs=$(find "$currentdir" -type d)
    while read -r line; do
        echo "$line"
        cd "$line" || continue
        # checksum the files in this directory only; md5sum prints
        # "hash  filename" with no path. Errors (e.g. for subdirectories
        # matched by *) are discarded to /dev/null; hidden files are not matched
        local result
        result=$(md5sum * 2> /dev/null)
        # append this directory's checksums to the main file
        if [ -n "$result" ]; then
            echo "$result" >> "$maindir/$checksumfile"
        fi
        # cd back to the starting directory for the next iteration
        cd "$maindir"
    done <<< "$subdirs"
}
# create flat checksum files for both directories
create_flat_checksum "$dir1" "$d1checkfilename"
create_flat_checksum "$dir2" "$d2checkfilename"
# keep only the "hash  filename" lines that appear in dir1's list but not in dir2's
comm -23 <(sort "$d1checkfilename") <(sort "$d2checkfilename") > "$resultfile"
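comm only works correctly on inputs sorted in the same order, which is why both lists go through sort first. A minimal sketch of reporting both directions (entries only in the dir1 list and entries only in the dir2 list), assuming two flat checksum files like the ones the script creates; compare_flat_lists is a hypothetical name, LC_ALL=C is set so sort and comm agree on collation, and temporary files stand in for process substitution so it also runs under a plain sh. sort -u collapses repeated lines, so a file with the same name and content in several subdirectories is reported once.

```shell
#!/bin/bash
# Hypothetical helper: compare two flat checksum lists
# ("hash  filename" per line) in both directions.
compare_flat_lists () {
    local s1 s2
    # comm needs sorted input; LC_ALL=C keeps sort and comm in the same order
    s1=$(mktemp) || return 1
    s2=$(mktemp) || { rm -f "$s1"; return 1; }
    LC_ALL=C sort -u "$1" > "$s1"
    LC_ALL=C sort -u "$2" > "$s2"
    echo "Only in first list:"
    LC_ALL=C comm -23 "$s1" "$s2"   # suppress lines unique to list 2 and common lines
    echo "Only in second list:"
    LC_ALL=C comm -13 "$s1" "$s2"   # suppress lines unique to list 1 and common lines
    rm -f "$s1" "$s2"
}
```

With the script's output files, this would be called as compare_flat_lists dir1.flatchecksum dir2.flatchecksum.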
