Change hash algorithm for penciller

Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected false positive ratios.

Also split the hash into two 16-bit integers.  We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits.  The idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
SegmentID - in particular when folding over all the keys in a bucket.
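
As a rough illustration of the split described above, here is a minimal
sketch, not the code from this commit. The module name
segment_hash_sketch and the choice of the first 32 bits of the md5 digest
are assumptions made for the example; only the md5 hash and the two
16-bit integers come from the message itself.

```erlang
-module(segment_hash_sketch).
-export([segment_hash/1]).

%% Hypothetical sketch: hash the key with md5 and split the first 32 bits
%% of the digest into two 16-bit integers.  The first integer plays the
%% role of the SegmentID, the second is the remaining hash for
%% blooms/indexes.
segment_hash(Key) when is_binary(Key) ->
    <<SegmentID:16/integer, ExtraHash:16/integer, _Rest/binary>> =
        crypto:hash(md5, Key),
    {SegmentID, ExtraHash};
segment_hash(Key) ->
    %% Non-binary keys (e.g. tuples) are serialised before hashing.
    segment_hash(term_to_binary(Key)).
```

Under these assumptions, all keys that share a SegmentID can be matched
on the first element of the tuple alone, without needing the second.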

Performance testing has been difficult so far - I think due to “cloud”
mysteries.
Martin Sumner 2017-10-20 23:04:29 +01:00
parent ede0982b2d
commit a128dcdadf
7 changed files with 75 additions and 62 deletions

@@ -254,7 +254,7 @@ generate_randomkeys(Count, Acc, BucketLow, BRange) ->
     K = {o, "Bucket" ++ BNumber, "Key" ++ KNumber},
     RandKey = {K, {Count + 1,
                     {active, infinity},
-                    leveled_codec:magic_hash(K),
+                    leveled_codec:segment_hash(K),
                     null}},
     generate_randomkeys(Count - 1, [RandKey|Acc], BucketLow, BRange).