30 May 2014

Integrating Integrity, part 2

In the last post, I wrote a Perl script to generate MD5 checksums while displaying a progress bar. Now I want to extend it so that it can either generate a .md5sum file listing the MD5 of every file in a given directory, or check each file in such a list to see whether its actual MD5 matches the recorded 'correct' one. I will also arrange for any checksum mismatches or other errors to be reported together at the end of the run, so that they aren't pushed out of the terminal's scrollback buffer when working with a large list of files.
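The post doesn't include the script itself, so what follows is a hypothetical Python sketch of the two modes and the deferred error report. It is not the author's Perl code. It assumes the GNU md5sum text layout (32 hex digits, two spaces, then a path relative to the manifest's directory), so `md5sum -c` run from that directory should accept the generated file. It does not handle the ` *` binary-mode marker or the backslash escaping GNU md5sum applies to unusual filenames. The function names and the default manifest name `checksums.md5sum` are made up for illustration.

```python
import hashlib
import os
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks so memory use stays flat."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_manifest(directory, manifest_name="checksums.md5sum"):
    """Write '<md5>  <relative path>' for every file under directory.

    The manifest itself is skipped so it never lists its own checksum.
    """
    manifest_path = os.path.join(directory, manifest_name)
    lines = []
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # sort in place so os.walk descends in a stable order
        for name in sorted(files):
            full = os.path.join(root, name)
            if os.path.abspath(full) == os.path.abspath(manifest_path):
                continue
            rel = os.path.relpath(full, directory)
            lines.append(f"{md5_of_file(full)}  {rel}\n")
    with open(manifest_path, "w", encoding="utf-8") as out:
        out.writelines(lines)
    return manifest_path


def verify_manifest(manifest_path):
    """Check every listed file, collecting problems instead of printing them."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    errors = []
    with open(manifest_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            line = line.rstrip("\n")
            if not line:
                continue
            # The hash contains no spaces, so the first "  " separates it
            # from the path even if the path contains double spaces.
            expected, sep, rel = line.partition("  ")
            if not sep:
                errors.append(f"line {lineno}: malformed entry")
                continue
            try:
                actual = md5_of_file(os.path.join(base, rel))
            except OSError as exc:
                errors.append(f"{rel}: cannot read ({exc})")
                continue
            if actual != expected.lower():
                errors.append(f"{rel}: checksum mismatch")
    return errors


def report(errors):
    """Print all collected problems at the end of the run; return an exit status."""
    for problem in errors:
        print(problem, file=sys.stderr)
    print(f"{len(errors)} problem(s) found", file=sys.stderr)
    return 1 if errors else 0
```

A checking run would call `sys.exit(report(verify_manifest(path)))` once every file has been hashed. Because the whole error list is printed only at that point, a single mismatch near the start of a long run can't scroll away behind thousands of "OK" lines.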

02 May 2014

Integrating Integrity, part 1

One of the most important parts of any backup solution is being able to identify when files have become corrupt due to a failing disk. Ideally, we'd be able to identify impending failure before the excrement hits the rotational cooling device, but we don't always have that luxury. I intend to cover things like S.M.A.R.T. disk checks in a later post; for today, I want to address per-file integrity checking. Because the only thing worse than having no backup is having a backup of the already-corrupt data.

md5sum has been my go-to tool for this in the past. Its 128-bit checksums are much less likely to match by chance than those of the older 32-bit CRC32 method, and it's ubiquitous. While it's important to realise that MD5 is no longer cryptographically secure and cannot protect against malicious tampering, it is still a very effective way to check a file for accidental damage. However, while the venerable md5sum command works perfectly fine, I want to write my own version with a few improvements I find myself wanting.
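To make the CRC32 comparison concrete, here is a small Python sketch that streams a file once and computes both values with the standard library's `hashlib` and `zlib` modules. It is not the author's tool, and the function name `checksums` is hypothetical. The hex lengths show the size difference directly: 32 characters (128 bits) for MD5 versus 8 characters (32 bits) for CRC32.

```python
import hashlib
import zlib


def checksums(path, chunk_size=64 * 1024):
    """Stream a file once and return (md5_hex, crc32_hex).

    Fixed-size reads keep memory use constant however large the file is.
    """
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # carry the running CRC between chunks
    # Mask to an unsigned 32-bit value before formatting as 8 hex digits.
    return md5.hexdigest(), format(crc & 0xFFFFFFFF, "08x")
```

For random corruption, the chance that a damaged file still produces the original checksum is roughly 1 in 2^32 for CRC32 and 1 in 2^128 for MD5. Neither figure says anything about deliberate tampering.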