by Alexander Thomas (aka Dr. Lex)
This simple Bash script makes it easy to repair corrupted chunks in a huge data file without having to copy over the entire file again. It works by comparing checksums across smaller subparts of both the damaged and original files. Only the damaged chunks then need to be transferred again.
This is especially useful when copying the entire file is prohibitively expensive, but you do have access to the original and it is OK to copy smaller chunks across a slow link. By default the script works with chunks of 100 MiB, but a different chunk size can be set with a parameter.
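The per-chunk comparison described above can be sketched roughly as follows. This is only an illustration, not the script's actual code: the function names, the zero-based chunk numbering, the "index checksum" list format, and the use of sha256sum (GNU coreutils) are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from the real script.

# Print one "index checksum" line per chunk of a file.
#   $1 = file, $2 = chunk size in bytes
chunk_checksums() {
    local file=$1 chunk_bytes=$2
    local size chunks i sum
    size=$(wc -c < "$file")
    # Round up so that a final partial chunk is included.
    chunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))
    for (( i = 0; i < chunks; i++ )); do
        # dd skips i chunks and reads exactly one; only that chunk is hashed.
        sum=$(dd if="$file" bs="$chunk_bytes" skip="$i" count=1 2>/dev/null \
              | sha256sum | cut -d' ' -f1)
        printf '%d %s\n' "$i" "$sum"
    done
}

# Print the indices of chunks in the original's list ($2) whose checksum
# differs from, or is missing in, the damaged file's list ($1).
differing_chunks() {
    awk 'NR == FNR { sum[$1] = $2; next } sum[$1] != $2 { print $1 }' "$1" "$2"
}
```

With these helpers, something like `chunk_checksums damaged.bin $((100 * 1024 * 1024)) > damaged.sums` would produce a list for the default chunk size of 100 MiB, and only the chunks reported by `differing_chunks` would need to be transferred.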
The script is hosted on GitHub. You can download a zip or tarball of the latest release.
You can run the script any way you like, but it is probably most convenient to place it somewhere in your shell's PATH, such as /usr/local/bin/. Make sure it has executable permissions.
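As a minimal sketch of such an installation, the helper below copies a script into a directory and marks it executable. The function name is made up for this example, and the file name bigfilerepair is assumed from the help text; the downloaded file may be named differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a script into a directory and make it executable.
#   $1 = path to the script, $2 = destination directory
install_script() {
    local script=$1 dest_dir=$2
    mkdir -p "$dest_dir"
    cp "$script" "$dest_dir/"
    chmod 755 "$dest_dir/$(basename "$script")"
}

# Example for a system-wide install (may require root privileges):
#   install_script ./bigfilerepair /usr/local/bin
```

Installing into a directory under your home directory that is already in PATH avoids the need for root privileges.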
Usage is pretty straightforward:
1. Run the script on the damaged file. This produces a checksums.txt file. Copy this file over to where the undamaged original is.
2. Run the script with -c and the original file as argument. This compares blocks of the file with the values in checksums.txt. If a block differs, it is extracted from the original file. The extracted blocks all have file names like BLOCK_100. The script also prints the command to be executed in the next step. The command will also have a truncate argument in case the damaged file is somehow bigger than the original.
3. Copy the BLOCK_* files, as well as the command given in the previous step, back to where the damaged file is. Then run the command with the appropriate file name.

This is the built-in help, accessible by running with the “-h” switch:
    Semi-offline big file repair tool v1.1.
    For when a huge file has small errors and it is too expensive to pull
    the whole file across a limited bandwidth link again.

    Usage:
    1. Run on the damaged file:
         bigfilerepair <big_file>
       This will produce a 'checksums.txt'. Copy it to where the original file is.
    2. Then run at the original location:
         bigfilerepair -c <original_file>
       This will extract blocks to be repaired to files named BLOCK_x.
       It will also print a command to be executed at the other side.
    3. Copy over the BLOCK_x files to where the damaged file is. Then run the
       command given in the previous step to repair the file.

    Advanced:
      -s FILE: use checksums file name different from the default 'checksums.txt'.
      -b MIB: use a different chunk size (MiB) (default is 100 MiB).
      -i x1,[x2..]: inject chunk file(s) BLOCK_x1 etc. at their positions.
      -t BYTES: truncate the file to a size of BYTES.
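To give an idea of what extracting, injecting and truncating amount to, here is a rough sketch built on dd and truncate. It is not the script's actual implementation: the function names are invented, the zero-based block numbering is an assumption (the real meaning of x in BLOCK_x may differ), and truncate is assumed to be the GNU coreutils tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and block numbering are assumptions.

# Extract chunk number $3 (zero-based) of $2 bytes from file $1 into BLOCK_<n>.
extract_block() {
    local file=$1 chunk_bytes=$2 index=$3
    dd if="$file" of="BLOCK_$index" bs="$chunk_bytes" skip="$index" count=1 2>/dev/null
}

# Write BLOCK_<n> back into file $1 at its position.
# conv=notrunc keeps the data that follows the injected chunk.
inject_block() {
    local file=$1 chunk_bytes=$2 index=$3
    dd if="BLOCK_$index" of="$file" bs="$chunk_bytes" seek="$index" conv=notrunc 2>/dev/null
}

# Cut file $1 down to $2 bytes, for when the damaged copy is too long.
trim_to_size() {
    truncate -s "$2" "$1"
}
```

In this sketch, `extract_block` would run where the original is, while `inject_block` and `trim_to_size` would run where the damaged file is, after the BLOCK_ files have been copied over.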
Big File Repair Tool is released under the New BSD License. This software is provided “as is”, without any implied warranty or claim of fitness for a particular purpose. Use of this software is completely at your own risk.