MurmurHash3 and the Bloom Filter

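There is often a need to speed things up by quickly checking whether an item has already been seen, which is the job of a Bloom filter. A Bloom filter is an array of bits to which objects are mapped using a set of hash functions. A lookup can report a false positive, but never a false negative.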
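As a rough illustration of the bit-array idea, here is a minimal sketch in Scala. The class name `BloomFilter` and the parameters `numBits` and `numHashes` are my own placeholders, and deriving the "set of hash functions" from one MurmurHash3 call per seed is just one simple way to do it, not necessarily the best one.

```scala
import java.util.BitSet
import scala.util.hashing.MurmurHash3

// Hypothetical class for illustration; numBits and numHashes are assumed parameters.
final class BloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new BitSet(numBits)

  // One bit position per seed; floorMod keeps the index non-negative.
  private def positions(item: String): Seq[Int] =
    (0 until numHashes).map { seed =>
      Math.floorMod(MurmurHash3.stringHash(item, seed), numBits)
    }

  // Set every bit the item maps to.
  def add(item: String): Unit =
    positions(item).foreach(i => bits.set(i))

  // false: the item was definitely never added.
  // true: the item was probably added (false positives are possible).
  def mightContain(item: String): Boolean =
    positions(item).forall(i => bits.get(i))
}
```

Something like `new BloomFilter(1 << 24, 3)` would give roughly 16 million bits with three hash positions per item. The right sizes depend on how many items you expect and how many false positives you can tolerate.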

Using some knowledge from building a hashing-trick algorithm, I came across the idea of using MurmurHash3 in Scala to improve performance and generate a fairly unique key. My immediate use case is to avoid collecting millions of duplicate pages from a large online scrape of a single source.
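To make the "fairly unique key" part concrete, here is a small sketch of how a page body might be turned into a key and used to skip duplicates. The object `PageDedup`, its method names, and the two seeds are assumptions made for illustration. For simplicity it keeps the keys in an exact hash set rather than a Bloom filter, which costs more memory but makes the idea easy to see.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical helper for illustration only.
object PageDedup {
  // Combine two 32-bit MurmurHash3 values (arbitrary seeds 17 and 31)
  // into one 64-bit key to make accidental collisions less likely.
  def pageKey(body: String): Long = {
    val hi = MurmurHash3.stringHash(body, 17)
    val lo = MurmurHash3.stringHash(body, 31)
    (hi.toLong << 32) | (lo.toLong & 0xffffffffL)
  }

  // Keep only the first page seen for each key.
  // HashSet.add returns false when the key was already present.
  def dedupe(pages: Iterator[String]): Iterator[String] = {
    val seen = mutable.HashSet.empty[Long]
    pages.filter(body => seen.add(pageKey(body)))
  }
}
```

Two different pages can still hash to the same key, so this trades a very small chance of dropping a distinct page for not having to hold every page body in memory.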

More information is available in this great blog article, as I am too busy to get into the details here.
