Backup media scrub
From KCLUG Wiki
- For every file on the old backup media, if ANY file in the live hierarchy has the same contents (even if it's been renamed or moved), delete the copy on the old media.
- If there's no matching live file for a file on the backup media, print that filename to the screen so that the admin can investigate.
- Diffing every file on one drive against every file on another drive 'could' do this, but would take impossibly long. Instead, this one-liner uses sha1sum to make an index of the hashes of all live files.
- Each file on the old backup media is hashed and that hash is grep'd for in the index. If a match is found, the file is deleted from the old backup media.
- /mnt/live is the live, production environment in this example
- /mnt/old is the old archive media in this example
- /mnt/live.sha1 is the sha1 index we create of all files in /mnt/live
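The index is plain sha1sum output: one line per file, holding the 40-character hex hash, a separator, and the path. As a small sketch of the lookup step, the following hashes a file, keeps only the hash, and searches the index for it. It uses a throwaway temporary directory (the names demo and index.sha1 are made up for this illustration) instead of /mnt/live, and cut instead of sed to strip the filename.

```shell
#!/usr/bin/env bash
# Hypothetical demo directory; it stands in for /mnt/live.
demo=$(mktemp -d)
printf 'hello\n' > "${demo}/a.txt"

# Build a tiny index in the same format as /mnt/live.sha1.
sha1sum "${demo}/a.txt" > "${demo}/index.sha1"

# Each index line starts with the hash, followed by a space.
# Keep only the hash by cutting at the first space.
hash=$(sha1sum "${demo}/a.txt" | cut -d' ' -f1)

# -F treats the hash as a fixed string, -q suppresses output.
if grep -qF "${hash}" "${demo}/index.sha1"; then
    result="match"
else
    result="no match"
fi
echo "${hash} ${result}"

# Remove the demo directory again.
rm -r "${demo}"
```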
find /mnt/live/ -type f | while IFS= read -r livefile ; do sha1sum "${livefile}"; done > /mnt/live.sha1; find /mnt/old/ -type f | while IFS= read -r oldfile ; do oldhash=`sha1sum "${oldfile}" | sed 's/  .*//'`; if [[ -n "${oldhash}" ]] && grep -qF "${oldhash}" /mnt/live.sha1 ; then rm -f "${oldfile}" ; else echo "NO MATCH FOR ${oldfile}"; fi; done
Or written nicely:
#For each file in live:
find /mnt/live/ -type f | while IFS= read -r livefile
do
    #Get its sha1sum.
    sha1sum "${livefile}"
#Write all of the sums to an index file.
done > /mnt/live.sha1
#For each file in old:
find /mnt/old/ -type f | while IFS= read -r oldfile
do
    #Get its sha1sum (everything before the two-space separator).
    oldhash=`sha1sum "${oldfile}" | sed 's/  .*//'`
    #Search for it in live's index file. An empty hash means sha1sum failed,
    #so treat that as "no match" rather than letting grep match every line.
    if [[ -n "${oldhash}" ]] && grep -qF "${oldhash}" /mnt/live.sha1
    then
        #If a match was found, delete the file from 'old'.
        rm -f "${oldfile}"
    else
        #If no match was found, say something about it.
        echo "NO MATCH FOR ${oldfile}"
    fi
done
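Because this deletes files, a dry run is worth having. The following is only a sketch of one way to do that, assuming bash 4 or newer plus GNU find and sha1sum. The function name scrub_old and its optional delete argument are made up for illustration and are not part of the one-liner above. It keeps the index in an associative array instead of /mnt/live.sha1, copes with filenames containing spaces or newlines, and only reports what it would delete unless told otherwise.

```shell
#!/usr/bin/env bash
# scrub_old LIVE_DIR OLD_DIR [delete]
# Hypothetical helper: only prints what it would do unless "delete" is passed.
scrub_old() {
    local live_dir=$1 old_dir=$2 mode=${3:-dry-run}
    local -A live_hashes=()   # associative array, needs bash 4 or newer
    local file hash

    # Index every live file by its SHA-1 hash.
    # Reading the file on stdin keeps the filename out of sha1sum's output.
    while IFS= read -r -d '' file; do
        hash=$(sha1sum < "${file}" | cut -d' ' -f1)
        [[ -n ${hash} ]] && live_hashes["${hash}"]=1
    done < <(find "${live_dir}" -type f -print0)

    # Check each old file against the index.
    while IFS= read -r -d '' file; do
        hash=$(sha1sum < "${file}" | cut -d' ' -f1)
        if [[ -n ${hash} && -n ${live_hashes["${hash}"]:-} ]]; then
            if [[ ${mode} == delete ]]; then
                rm -f -- "${file}"
            else
                echo "WOULD DELETE ${file}"
            fi
        else
            echo "NO MATCH FOR ${file}"
        fi
    done < <(find "${old_dir}" -type f -print0)
}
```

Running scrub_old /mnt/live /mnt/old first and reading the output, then repeating the call with delete as a third argument, gives the admin a chance to review the list before anything is removed.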

