How to find / list all unique files across two directories?
Matthew Martinez
There's a great tool, fdupes, for finding duplicate files across two (or more) directories.
I'm looking for a simple tool/command that can output the complementary set - the paths of those files that do not have a duplicate.
4 Answers
find DIR1 DIR2 -type f -exec sha1sum '{}' \+ | sort | \
    uniq -c --check-chars 40 | egrep '^ *1 ' | cut -c 51-

jdupes gained the option you're looking for in June 2020, available in v1.17.0 or higher.
Try this (-r = recurse, -u = only print files that didn't match any other files, a.k.a. "uniques"):
jdupes -ru dir1/ dir2/

As an alternative that will keep the partial matching of jdupes (much faster than doing a full hash):
jdupes -r -T -T "$DIR1" "$DIR2" | awk -v p1="$DIR1" -v p2="$DIR2" '$0 ~ p1 && $0 ~ p2' RS="\n\n" ORS="\n\n" | sort > dupes
find "$DIR1" "$DIR2" -type f | sort > all
comm -23 all dupes

I create two lists: one with the duplicated files, one with all files. The awk statement ensures that the dupes file only contains files that appear in both $DIR1 AND $DIR2 (--isolate does something else, apparently).
Then I use the comm utility to extract the files that appear only in the all list (and not in the dupes list). The only caveat is that jdupes only checks the first 4K of each file, which may not be enough to ensure a true match, so for personal use I bumped that up to 1MB and it still runs very fast.
Note: If you only want files unique to $DIR1 then remove the $DIR2 search in the find command.
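The comm step above can be seen in isolation on toy data (the file names here are hypothetical; comm requires both inputs to be sorted):

```shell
# Toy illustration of the comm -23 step (hypothetical paths).
# 'all' holds every file, 'dupes' the duplicated ones;
# comm -23 prints lines unique to the first (sorted) input.
printf '%s\n' dir1/a dir1/b dir2/b dir2/c | sort > all
printf '%s\n' dir1/b dir2/b | sort > dupes
comm -23 all dupes    # prints dir1/a and dir2/c, the unique files
```

Swapping -23 for -13 would instead print lines unique to the second input.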
I once had the same problem with finding these unique files and I did NOT want to checksum them (because they were too large and too many), so I wrote a script based on the filename and filesize:
isolated-files.py --source folder1 --target folder2

This will show any files (recursively) within folder2 which are not in folder1 (also recursively). It can also be used over SSH connections and with multiple folders.
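The same name-plus-size idea can be sketched in plain shell; this is my own approximation of the approach, not the author's script, and it assumes GNU find for -printf:

```shell
# Sketch of the name+size comparison (assumes GNU find's -printf;
# an approximation of the idea, not the author's isolated-files.py).
mkdir -p folder1 folder2
printf 'shared\n' > folder1/a.txt
printf 'shared\n' > folder2/a.txt      # same name and size -> treated as matched
printf 'extra\n'  > folder2/new.txt    # only in folder2 -> reported
find folder1 -type f -printf '%f %s\n' | sort -u > keys1
find folder2 -type f -printf '%f %s\n' | sort -u > keys2
comm -13 keys1 keys2                   # (name, size) pairs unique to folder2
```

Note this only compares base name and size, so files with identical names and sizes but different contents would be wrongly treated as duplicates, which is exactly the trade-off the author accepted to avoid checksumming.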