Checksum vs. byte-by-byte comparison


When checking whether files are identical, AcuteFinder uses CRC32 or MD5 checksums. This means that each file that is possibly a duplicate of another is read from beginning to end, and a number (the checksum) is calculated from its contents. This number is stored and compared with the checksums of other files to determine whether they are truly identical. CRC32 calculates a 32-bit integer from the file and is generally considered to be very accurate, since the files also have to be of the same size. CRC32 is the standard checksum used in ZIP archives. If you choose the MD5 checksum (128 bits), the result is even more reliable, and for practical purposes you can rest assured that the files are really identical. The probability of two different files of the same size being considered identical when using CRC32 is about 1 in 4,294,967,296 (2^32). Using the MD5 checksum, this figure is about 1 in 3.40282 x 10^38 (2^128), i.e. astronomically small.
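The checksum approach can be sketched in a few lines of Python. This is only an illustration of the general technique, not AcuteFinder's actual code: the function names file_checksums and group_identical and the 64 KB chunk size are assumptions made for the example. Each file is read once, and files are grouped by size plus checksums.

```python
import hashlib
import zlib
from collections import defaultdict


def file_checksums(path, chunk_size=65536):
    """Read the file once and return (size, crc32, md5_hex)."""
    crc = 0
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC32 over all chunks
            md5.update(chunk)
            size += len(chunk)
    return size, crc & 0xFFFFFFFF, md5.hexdigest()


def group_identical(paths):
    """Group paths whose size and checksums match (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_checksums(path)].append(path)
    # Only groups with more than one member are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]
```

Because each file is read exactly once regardless of how many candidates there are, the cost grows with the total amount of data rather than with the number of file pairs.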

 

Byte-by-byte comparison, on the other hand, means that the contents of the two files being compared are read byte by byte (in parallel) and checked for equality. AcuteFinder does not offer this option, as some of our competitors do, because it is much slower than using checksums: when many candidates exist, some files need to be read more than once. This makes the method poorly suited to huge files such as pictures, video and sound. On the plus side, this method is 100% accurate.
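For comparison, a byte-by-byte check of two files might look like the following Python sketch. The function name files_identical and the chunk size are assumptions made for the example; it reads both files in parallel and stops at the first difference. Python's standard library offers a similar check as filecmp.cmp(a, b, shallow=False).

```python
import os


def files_identical(path_a, path_b, chunk_size=65536):
    """Compare two files chunk by chunk; return True only if every byte matches."""
    # Files of different sizes can never be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first difference found, stop early
            if not chunk_a:
                return True  # both files ended without a difference
```

Comparing a file against several candidates this way means reading it once per candidate, which is the repeated reading described above.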