Can md5sum check based on just the filename, not the file path?
Matthew Barrera
Is it possible to have an md5 checksum file (as generated from something like md5sum -r * > checklist.chk) where each line contains just the hash and the filename of the file (and not the path from the current directory to the file)?
I have two large directory trees that I am looking to compare, except the second directory tree has a different structure, as I use it more often and have been slowly rearranging things over time. I am curious whether it is possible to have md5sum check all the files in the first directory tree to see if each has the same filename and hash as some file, somewhere in the second directory tree.
Most of the posts I have found so far don't seem to touch on this use case, where file paths don't matter.
Essentially, I want to be able to do this:
- open the second directory tree and generate a checksum list (using something like md5sum -r * > checklist.chk), except each line only contains the file's hash and name (without the path)
- open the first directory tree and go through every file, verifying its hash against the checksum list from step 1 to determine whether or not it exists in the second directory tree.
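The two steps above can be sketched roughly as follows. This is a minimal sketch, assuming GNU find, md5sum, sed and grep; the helper names flat_checksums and missing_from are made up for illustration. File names containing newlines or backslashes are not handled, because md5sum escapes those lines and the path would be left in place.

```shell
#!/bin/bash
# Sketch only: flat_checksums and missing_from are hypothetical helper names.

# Step 1: print "hash  basename" for every regular file under directory $1.
# md5sum prints "hash  path"; sed drops everything up to the last slash.
flat_checksums() {
    find "$1" -type f -exec md5sum {} + |
        sed 's|^\([0-9a-f]\{32\}\) [ *].*/|\1  |' |
        sort -u
}

# Step 2: print the entries of tree $1 that have no file with the same
# name and the same hash anywhere in tree $2.
missing_from() {
    local list2
    list2=$(mktemp) || return 1
    flat_checksums "$2" > "$list2"
    # -F fixed strings, -x whole-line match, -v keep only non-matching lines
    flat_checksums "$1" | grep -Fxv -f "$list2"
    rm -f "$list2"
}
```

With these definitions, something like missing_from first_tree second_tree (both directory names are placeholders) would print one "hash  name" line for each file in the first tree that has no counterpart in the second.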
2 Answers
The most automatic way that springs to mind is to read the man pages for find, xargs, md5sum, sort and uniq, and then do something like this untested code:
find sourcedir oldsourcedir -type f -print0 | \
  xargs -r -0 md5sum | \
  sort | uniq -c -w 32 | \
  sort -nr | tee md5sums.txt | \
  less

After some more searching, I seem to have found a script that seems to do pretty much exactly this:
#!/bin/bash
#for merging dir1 into dir2
maindir=$(pwd)
dir1="$1"
d1checkfilename="dir1.flatchecksum"
dir2="$2"
d2checkfilename="dir2.flatchecksum"
resultfile="result.txt"
create_flat_checksum () {
    local currentdir="$1"
    local checksumfile="$2"
    # input needs a leading ./
    touch "$checksumfile"
    echo "now checksumming $currentdir"
    local subdirs=$(find "$currentdir" -type d)
    # echo "$subdirs"
    while read -r line; do
        echo "$line"
        cd "$line"
        # run the checksum, discarding error messages (e.g. those caused by subdirectories)
        local result=$(md5sum * 2> /dev/null)
        # append the results of this directory's checksum to the main file
        echo "$result" >> "$maindir/$checksumfile"
        # cd back up to current dir for next iteration
        # echo "$maindir/${currentdir:2}"
        # pwd
        # echo "$maindir/${currentdir:2}"
        cd "$maindir"
    done <<< "$subdirs"
}
# create flat checksum files for both directories
create_flat_checksum "$dir1" "$d1checkfilename"
create_flat_checksum "$dir2" "$d2checkfilename"
# keep only the lines (hash + name) that appear in dir1's list but not in dir2's
comm -23 <(sort "$d1checkfilename") <(sort "$d2checkfilename") > "$resultfile"
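
To make the final comparison step concrete, here is a small illustration of what comm -23 keeps. The hash/name pairs are made up (they are not real MD5 sums) and the temporary file names are placeholders; the point is only that a file is reported when its exact "hash  name" line is absent from the second list, which covers both a missing name and a changed hash.

```shell
#!/bin/bash
# Illustration with made-up hash/name pairs, not real MD5 sums.
workdir=$(mktemp -d)
printf '%s\n' 'cccc  report.pdf' 'aaaa  notes.txt' 'bbbb  photo.jpg' \
    > "$workdir/dir1.flatchecksum"
printf '%s\n' 'dddd  report.pdf' 'bbbb  photo.jpg' \
    > "$workdir/dir2.flatchecksum"

# comm expects both inputs sorted with the same collation, so fix the locale.
LC_ALL=C sort "$workdir/dir1.flatchecksum" > "$workdir/dir1.sorted"
LC_ALL=C sort "$workdir/dir2.flatchecksum" > "$workdir/dir2.sorted"

# -2 suppresses lines unique to the second file, -3 suppresses common lines,
# leaving only the lines unique to the first file.
result=$(LC_ALL=C comm -23 "$workdir/dir1.sorted" "$workdir/dir2.sorted")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Here notes.txt is printed because no file of that name is in the second list, and report.pdf is printed because its hash differs; photo.jpg matches on both and is dropped.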