How to find / list all unique files across two directories?

Writer Matthew Martinez

There's a great tool, fdupes, for finding duplicate files across two (or more) directories.

I'm looking for a simple tool/command that can output the complementary set - the paths of those files that do not have a duplicate.

4 Answers

find DIR1 DIR2 -type f -exec sha1sum '{}' + | sort | \
  uniq -c --check-chars 40 | grep -E '^ *1 ' | cut -c 51-
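A variant of the same idea, as a rough sketch rather than a drop-in replacement: count the hashes in awk instead of relying on the column offsets of uniq's count field. The function name list_unique_by_hash is made up for illustration, and the sketch assumes GNU sha1sum and file names without backslashes or newlines (sha1sum escapes those, which shifts the columns).

```shell
#!/bin/sh
# list_unique_by_hash DIR...
# Hypothetical helper: prints the path of every regular file under the
# given directories whose SHA-1 content hash occurs exactly once.
list_unique_by_hash() {
    find "$@" -type f -exec sha1sum {} + |
        awk '
            {
                # sha1sum prints a 40-character hash, two separator
                # characters, then the path (starting at column 43).
                hash = substr($0, 1, 40)
                count[hash]++
                path[hash] = substr($0, 43)
            }
            END {
                # A hash seen once belongs to a file with no duplicate.
                for (h in count)
                    if (count[h] == 1)
                        print path[h]
            }
        ' |
        LC_ALL=C sort
}
```

Calling list_unique_by_hash DIR1 DIR2 should print one path per line, sorted bytewise. Like the one-liner, it hashes every file in full, so it can be slow on large trees.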

jdupes gained the option you're looking for in June 2020, available in v1.17.0 or higher.

Try this (-r = recurse, -u = only print files that didn't match any other files, a.k.a. "uniques"):

jdupes -ru dir1/ dir2/

As an alternative that keeps the partial matching of jdupes (much faster than doing a full hash):

jdupes -r -T -T "$DIR1" "$DIR2" | awk -v p1="$DIR1" -v p2="$DIR2" '$0 ~ p1 && $0 ~ p2' RS="\n\n" ORS="\n\n" | sort > dupes
find "$DIR1" "$DIR2" -type f | sort > all
comm -23 all dupes

I create two lists: one with the duplicated files and one with all files. The awk statement ensures that the dupes file only contains groups with files that appear in both $DIR1 AND $DIR2 (--isolate does something else apparently).

Then I use the comm utility to print the files that appear only in the all list (not in the dupes list). The only caveat is that with -T -T, jdupes only checks the first 4K of each file, which may not be enough to ensure a unique match, so for personal use I bumped that up to 1MB and it still runs very fast.

Note: If you only want files unique to $DIR1 then remove the $DIR2 search in the find command.
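To make the comm step concrete, here is a small sketch of "lines in the first list that are not in the second". The helper name only_in_first is hypothetical. It sorts both inputs under LC_ALL=C first, since comm expects both files to be sorted with the same collation.

```shell
#!/bin/sh
# only_in_first LIST_A LIST_B
# Hypothetical helper: prints the lines of LIST_A that do not occur in LIST_B.
only_in_first() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # Sort both lists the same way so comm can merge them correctly.
    LC_ALL=C sort "$1" > "$tmp_a"
    LC_ALL=C sort "$2" > "$tmp_b"
    # -2 drops lines unique to LIST_B, -3 drops lines common to both,
    # leaving only the lines unique to LIST_A.
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

With the files from the commands above, only_in_first all dupes would do the same job as comm -23 all dupes. The blank separator lines in dupes are harmless, because they exist only in the second list and are therefore suppressed.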


I once had the same problem with finding these unique files and I did NOT want to checksum them (because they were too large and too many), so I wrote a script based on the filename and filesize:

isolated-files.py --source folder1 --target folder2

This will show any files (recursively) within folder2 which are not in folder1 (also recursively). It can also be used on SSH connections and with multiple folders.
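The name-and-size idea can be sketched in shell as well. This is not the script mentioned above, only an illustration of the approach. The function name missing_by_name_size is made up, the key is assumed to be the file's base name plus its size in bytes, and it relies on GNU find's -printf. File names containing tabs or newlines are not handled.

```shell
#!/bin/sh
# missing_by_name_size SOURCE TARGET
# Hypothetical helper: prints files under TARGET whose (size, base name)
# pair does not occur anywhere under SOURCE. No file contents are read.
missing_by_name_size() {
    src_keys=$(mktemp) || return 1
    # One "size/basename" key per line for every file under SOURCE.
    find "$1" -type f -printf '%s/%f\n' > "$src_keys"
    # For TARGET, emit the same key, a tab, then the full path.
    find "$2" -type f -printf '%s/%f\t%p\n' |
        awk -F '\t' -v keys="$src_keys" '
            BEGIN {
                # Load every SOURCE key into a lookup table.
                while ((getline k < keys) > 0)
                    seen[k] = 1
            }
            # Print the path when the key was never seen under SOURCE.
            !($1 in seen) { print $2 }
        ' |
        LC_ALL=C sort
    rm -f "$src_keys"
}
```

For example, missing_by_name_size folder1 folder2 would list files in folder2 with no same-named, same-sized counterpart in folder1. Two different files that happen to share a name and size are treated as the same file, which is the trade-off for skipping checksums.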

