When I need to find and delete duplicate files on my Linux server, I go through the following procedure.
md5sum
This command calculates and prints the md5 checksum of a file. If two files have identical content, they will have the same hash.
To get the md5sum of a file, simply do the following:
# md5sum example.php
312e9f7d1d6600989f0d1ac8c72f1de7  example.php
In the above, 312e9f7d1d6600989f0d1ac8c72f1de7 is the md5 hash of the example.php file.
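A quick way to see this property in action is to hash a few throwaway files: two with identical content and one that differs. The file names below are made up for the demonstration.

```shell
# Identical content yields an identical hash; different content does not.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"
printf 'world\n' > "$dir/c.txt"
md5sum "$dir"/*.txt
```

The first two lines of output share a hash; the third does not.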
Now, find all files that have that same md5 hash, and store their filenames in a file.
# find /home/ -type f -exec md5sum {} + | grep 312e9f7d1d6600989f0d1ac8c72f1de7 | awk '{ print $2 }' > duplicates.txt
With the above command, we find every file whose md5 hash is 312e9f7d1d6600989f0d1ac8c72f1de7 and write the second column of md5sum's output (the filename) to a file called duplicates.txt.
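One caveat: `awk '{ print $2 }'` truncates any filename containing spaces, because awk splits fields on whitespace. A more robust variant strips the fixed-width prefix md5sum prints (32 hex characters plus two spaces) instead. The sketch below runs the same pipeline against a sandbox directory standing in for /home/, with a hash computed on the spot in place of the example hash.

```shell
# Robust variant: strip the "hash + two spaces" prefix rather than
# taking awk's second field, so filenames with spaces survive intact.
# The sandbox files here are hypothetical stand-ins for real data.
dir=$(mktemp -d)
printf 'same\n'  > "$dir/copy one.txt"
printf 'same\n'  > "$dir/copy two.txt"
printf 'other\n' > "$dir/unique.txt"
hash=$(md5sum "$dir/copy one.txt" | cut -c1-32)
find "$dir" -type f -exec md5sum {} + \
  | grep "^$hash" \
  | sed 's/^[0-9a-f]\{32\}  //' > "$dir/duplicates.txt"
cat "$dir/duplicates.txt"
```

Both "copy" files are listed with their full paths; the unique file is not.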
Now we loop through the duplicates.txt file and delete each file one by one. Review the file first and delete the line for the copy you want to keep, since every listed file, including the original, will be removed. A while-read loop is used rather than for f in $(cat ...), which would split filenames on spaces:
# while IFS= read -r f; do rm -f -- "$f"; done < duplicates.txt
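The whole procedure above checks one known hash at a time. If GNU coreutils is available, it can be automated into a single pipeline that hashes every file once and prints each group of files sharing a hash, using uniq's -w (compare only the first N characters, here the 32-character hash) and --all-repeated options. A sketch, with a sandbox directory standing in for /home/:

```shell
# Hash all files, sort so equal hashes are adjacent, then print every
# group of lines whose first 32 characters (the hash) repeat.
# GNU uniq's -w and --all-repeated options are assumed.
dir=$(mktemp -d)
printf 'dup\n'    > "$dir/first.txt"
printf 'dup\n'    > "$dir/second.txt"
printf 'unique\n' > "$dir/only.txt"
find "$dir" -type f -exec md5sum {} + \
  | sort \
  | uniq -w32 --all-repeated=separate > "$dir/dupes.txt"
cat "$dir/dupes.txt"
```

Each group of duplicates is printed together, separated by blank lines, so you can decide per group which copies to delete.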