fdedup is a command-line file deduplication program for Unix filesystems. It is heavily inspired by rdfind but aims to better support deduplication of backups that rely on hardlinks.
When run, fdedup first scans the user-supplied paths and collects all files it finds. It then runs a duplicate search algorithm similar to rdfind's to determine which files are identical (based on file hash). Finally, fdedup can either print a summary of the duplicates it found or perform the actual deduplication, replacing duplicates with hardlinks or symlinks.
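The scan-then-hash flow described above can be sketched roughly as follows. This is an illustration only, not fdedup's actual code: the function names `collect_files`, `hash_file` and `find_duplicates` are hypothetical, files are pre-grouped by size as a cheap filter (an assumption, not something this README states), and the standard library's `DefaultHasher` stands in for whatever hash fdedup really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `root` (symlinks are not followed).
fn collect_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    let meta = fs::symlink_metadata(root)?;
    if meta.is_file() {
        out.push(root.to_path_buf());
    } else if meta.is_dir() {
        for entry in fs::read_dir(root)? {
            collect_files(&entry?.path(), out)?;
        }
    }
    Ok(())
}

/// Hash a whole file. Assumption: a real tool would stream the file and
/// use a stronger hash; DefaultHasher is only a stand-in for this sketch.
fn hash_file(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Return groups of paths whose contents hash to the same value.
fn find_duplicates(roots: &[PathBuf]) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut files = Vec::new();
    for root in roots {
        collect_files(root, &mut files)?;
    }

    // Files of different sizes can never be identical, so group by size first.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for path in files {
        let size = fs::symlink_metadata(&path)?.len();
        by_size.entry(size).or_default().push(path);
    }

    let mut groups = Vec::new();
    for paths in by_size.into_values() {
        if paths.len() < 2 {
            continue; // a unique size means no possible duplicate
        }
        // Only same-size candidates are hashed.
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for path in paths {
            by_hash.entry(hash_file(&path)?).or_default().push(path);
        }
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }
    Ok(groups)
}
```

Calling `find_duplicates(&[PathBuf::from("/some/dir")])` returns one `Vec<PathBuf>` per set of identical files; the order of groups and of paths within a group is unspecified.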
fdedup is written in Rust and relies on the cargo package manager. You can either use cargo commands to build and install fdedup or use the provided Makefile:
$ make install
You can control installation paths via the PREFIX and DESTDIR variables.
Search for duplicates in the home directory and save the results to ~/dups.txt:
$ fdedup --output ~/dups.txt ~/
Find duplicates in the home directory and replace them with symlinks, ignoring noncritical I/O errors:
$ fdedup --action symlink --sloppy ~/
Deduplicate the backup drive /mnt/backup using hardlinks:
$ fdedup --action hardlink /mnt/backup
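As a rough idea of what replacing a duplicate with a hardlink involves, here is a minimal sketch. It is not taken from fdedup's source: the function name `replace_with_hardlink` and the `.fdedup-tmp` suffix are hypothetical. The link is created under a temporary name and then renamed over the duplicate, so the duplicate's path never disappears if something fails halfway.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Replace `duplicate` with a hardlink to `original`.
/// Both paths must live on the same filesystem.
fn replace_with_hardlink(original: &Path, duplicate: &Path) -> io::Result<()> {
    // Build a temporary sibling name next to the duplicate (hypothetical suffix).
    let mut tmp_name = duplicate
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "duplicate has no file name"))?
        .to_os_string();
    tmp_name.push(".fdedup-tmp");
    let tmp = duplicate.with_file_name(tmp_name);

    // Create the new link first; the duplicate is still untouched here.
    fs::hard_link(original, &tmp)?;

    // Rename over the duplicate; on Unix this replaces it in one step.
    let result = fs::rename(&tmp, duplicate);

    // Clean up the temporary name. It is already gone after a normal rename,
    // but it remains if the rename failed or if both names already pointed
    // at the same inode (rename is then a no-op).
    let _ = fs::remove_file(&tmp);
    result
}
```

After a successful call, both paths refer to the same inode, so the data is stored only once.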
I have directly compared the performance of fdedup to that of rdfind in -removeidentinode true mode on my backup HDD with dropped caches. fdedup and rdfind show the same runtime (~20 min) within statistical error, and fdedup uses slightly less RAM overall. However, in -removeidentinode true mode rdfind fails to handle existing hardlinks correctly. Running rdfind in -removeidentinode false mode makes it handle hardlinks properly but increases its runtime by a factor of 6, making fdedup a clear winner in this case.
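To illustrate why existing hardlinks matter, one possible approach (an assumption for illustration, not a description of fdedup's or rdfind's internals) is to group paths by their (device, inode) pair before hashing. Paths that already share an inode then need to be read only once and can be kept together as a unit. The function name `group_by_inode` is hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Group paths by (device id, inode number). Every path in one group is
/// a hardlink to the same underlying file.
fn group_by_inode(paths: &[PathBuf]) -> io::Result<HashMap<(u64, u64), Vec<PathBuf>>> {
    let mut inodes: HashMap<(u64, u64), Vec<PathBuf>> = HashMap::new();
    for path in paths {
        // symlink_metadata inspects the path itself rather than a symlink target.
        let meta = fs::symlink_metadata(path)?;
        inodes
            .entry((meta.dev(), meta.ino()))
            .or_default()
            .push(path.clone());
    }
    Ok(inodes)
}
```

On a backup made with hardlinks, many paths collapse into a few inode groups, so only one representative per group has to be hashed.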
This is alpha software. I have tested it only on the ext4 filesystem. Use at your own risk.