Duplicate file checker

Description

dup - Duplicate file checker, using MD5 message digests to identify duplicate files.

The name of the INDEX must be specified. A new index is created if it does not already exist. Note that the index is read into memory for each operation, so combine operations on multiple files into a single call where possible. No performance tests have been run, so there is no advice on how large the index can realistically be. It is currently used to index a collection of 10,000 files.

Commands are:

add     Add an item to the index.
remove  Remove an item from the index.
find    Find matching items in the index.

Options are:

-l Each of the FILEs listed contains a list of files to process.
-q Screen display should be limited to essential information.

Remarks: Implemented by dup.cpp.

Format of the Index File

A very simple format is used for the index file:

digest in hex-string format on a line
the name of each matching file on a line
a blank line terminating the list of names for that digest
... this is repeated for each digest, until
a blank digest line marks the end of the file: no more digests follow.

13fe625700d47a6f9ab20a47de5a22ea
dup.cpp

1ebe001b770e8b4d06439e0b4564a667
test_md5.cpp

Example of Use

$ ./dup index add *.cpp
Unable to read the digest index, will create one
Added dup.cpp
Added test_md5.cpp

$ cp dup.cpp dup.dup

$ ./dup index find dup.dup
Found duplicate:
   *dup.dup
    dup.cpp

$ ./dup index remove *.cpp
Removed dup.cpp
Removed test_md5.cpp
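
The session above can be modeled as three operations on an in-memory index. This is a minimal sketch, not the implementation in dup.cpp: the names Index, add_item, remove_item and find_matches are hypothetical, and the MD5 digest of each file is assumed to have been computed elsewhere and passed in as a hex string.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of the index: digest -> matching file names.
using Index = std::map<std::string, std::vector<std::string>>;

// Record `name` under `digest`. Returns false if it was already listed.
bool add_item(Index& index, const std::string& digest, const std::string& name) {
    std::vector<std::string>& names = index[digest];
    if (std::find(names.begin(), names.end(), name) != names.end()) {
        return false;
    }
    names.push_back(name);
    return true;
}

// Drop `name` from `digest`. Returns false if it was not listed.
bool remove_item(Index& index, const std::string& digest, const std::string& name) {
    auto entry = index.find(digest);
    if (entry == index.end()) {
        return false;
    }
    std::vector<std::string>& names = entry->second;
    auto pos = std::find(names.begin(), names.end(), name);
    if (pos == names.end()) {
        return false;
    }
    names.erase(pos);
    if (names.empty()) {
        index.erase(entry);  // no files left for this digest
    }
    return true;
}

// Return every indexed file whose digest equals `digest`.
std::vector<std::string> find_matches(const Index& index, const std::string& digest) {
    auto entry = index.find(digest);
    if (entry == index.end()) {
        return {};
    }
    return entry->second;
}
```

In this model, a file such as dup.dup counts as a duplicate when find_matches returns a non-empty list for its digest.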
