MurmurHash3 and the Bloom Filter

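There is often a need to speed things up by cheaply checking whether an item has been seen before, and a Bloom filter does this without storing the items themselves. A Bloom filter is an array of bits to which objects are mapped using a set of hash functions. Adding an object sets the bits at its hashed positions, and a lookup that finds any of those bits unset proves the object was never added. Because different objects can set the same bits, a Bloom filter can return false positives but never false negatives.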
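As a rough illustration of the bit-array idea, here is a minimal Bloom filter sketch written as a Scala script with top-level definitions. It uses the standard library's `scala.util.hashing.MurmurHash3.stringHash` with two arbitrary seeds and derives the remaining bit positions by double hashing (position i = h1 + i * h2 mod m). Double hashing is an assumption of this sketch, not the only way to build the set of hash functions. `BloomFilter` and `sizedBloomFilter` are hypothetical names. The sizing helper uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash positions for n expected keys and a target false-positive rate p.

```scala
import java.util.BitSet
import scala.util.hashing.MurmurHash3

// A minimal Bloom filter over strings: `numBits` bits, `numHashes` bit positions per key.
// Positions come from two seeded MurmurHash3 values via double hashing
// (position_i = h1 + i * h2 mod numBits) rather than from k separate hash functions.
final class BloomFilter(numBits: Int, numHashes: Int) {
  require(numBits > 0 && numHashes > 0, "numBits and numHashes must be positive")

  private val bits = new BitSet(numBits)

  // Arbitrary seeds picked for this sketch; any two distinct seeds work.
  private val seed1 = 0x1b873593
  private val seed2 = 0x5bd1e995

  private def positions(key: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(key, seed1)
    val h2 = MurmurHash3.stringHash(key, seed2)
    // Int arithmetic may overflow; floorMod maps the result back into [0, numBits).
    (0 until numHashes).map(i => Math.floorMod(h1 + i * h2, numBits))
  }

  def add(key: String): Unit = positions(key).foreach(p => bits.set(p))

  // false: the key was definitely never added; true: it probably was (false positives possible).
  def mightContain(key: String): Boolean = positions(key).forall(p => bits.get(p))
}

// Standard sizing for n expected keys and a target false-positive rate p:
// m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash positions.
def sizedBloomFilter(n: Int, p: Double): BloomFilter = {
  val ln2 = math.log(2)
  val m = math.ceil(-n * math.log(p) / (ln2 * ln2)).toInt
  val k = math.max(1, math.round(m.toDouble / n * ln2).toInt)
  new BloomFilter(m, k)
}

val filter = sizedBloomFilter(n = 1000000, p = 0.01)
filter.add("https://example.com/page/1")
println(filter.mightContain("https://example.com/page/1")) // true: no false negatives
println(filter.mightContain("https://example.com/page/2")) // false unless a false positive occurs
```

The asymmetry is what makes this useful for a scraper: when the filter answers false, the page is certainly new and can be collected; when it answers true, the page is probably a duplicate, and skipping it will occasionally skip a page that was not actually seen before.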

Drawing on what I learned from implementing a hashing-trick algorithm, I came across the idea of using MurmurHash3 in Scala to improve performance and generate a fairly unique key. My immediate use case is avoiding the collection of millions of duplicate pages from a large online scrape of a single source.
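To make the duplicate-page case concrete, here is a sketch that deduplicates a stream of scraped pages by a MurmurHash3 fingerprint of each page body, again written as a Scala script. `Page`, `fingerprint` and `dedupe` are hypothetical names. Combining two differently seeded 32-bit `MurmurHash3.stringHash` values into one 64-bit key is an assumption made here to keep collisions rare at this scale: with a single 32-bit hash, a million distinct pages would be expected to produce about n(n - 1) / 2^33, or roughly 116, colliding pairs. This version keeps an exact set of fingerprints rather than a Bloom filter.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical page type for this sketch; a real scraper will have its own.
final case class Page(url: String, body: String)

// 64-bit fingerprint from two 32-bit MurmurHash3 values with different (arbitrary) seeds.
// A single 32-bit value would give roughly a hundred collisions per million pages.
def fingerprint(text: String): Long = {
  val hi = MurmurHash3.stringHash(text, 0x1b873593)
  val lo = MurmurHash3.stringHash(text, 0x5bd1e995)
  (hi.toLong << 32) | (lo.toLong & 0xffffffffL)
}

// Lazily yields only the first page seen for each distinct body; later duplicates are dropped.
// mutable.HashSet.add returns false when the fingerprint was already present.
def dedupe(pages: Iterator[Page]): Iterator[Page] = {
  val seen = mutable.HashSet.empty[Long]
  pages.filter(page => seen.add(fingerprint(page.body)))
}

val pages = Iterator(
  Page("https://example.com/a", "<html>same</html>"),
  Page("https://example.com/b", "<html>same</html>"),
  Page("https://example.com/c", "<html>different</html>")
)
dedupe(pages).foreach(page => println(page.url))
```

Replacing the `HashSet` with a Bloom filter would cut memory further, at the cost of occasionally dropping a page that is not really a duplicate. Keying on `page.url` instead of `page.body` catches repeated URLs but not identical content served under different URLs.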

More information is available in this great blog article, as I am a bit too busy to go into details here.