- small set of example files, with their extracted text, tokens, token hashes, and complete hashes (in this repo)
- package Moz's simhash-cpp for Debian
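
The stages named in the example-files item above (extracted text, tokens, token hashes, complete hash) could look roughly like the sketch below. This is not simhash-cpp's API. The word-level tokenizer, the 64-bit width, and the MD5-derived token hash are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; the real tooling may differ


def tokenize(text):
    """Split extracted text into lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def token_hash(token):
    """Map one token to a 64-bit integer (first 8 bytes of MD5, an assumption)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(hashes):
    """Combine token hashes into one fingerprint by per-bit majority vote."""
    counts = [0] * BITS
    for h in hashes:
        for bit in range(BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, total in enumerate(counts):
        if total > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# One example "file" carried through every stage.
text = "The quick brown fox jumps over the lazy dog"
tokens = tokenize(text)
hashes = [token_hash(t) for t in tokens]
complete = simhash(hashes)
print(tokens)
print([f"{h:016x}" for h in hashes])
print(f"{complete:016x}")
```

Storing each stage's output next to the example file would let a different implementation be checked stage by stage rather than only on the final hash.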