Provide a function for generating segmentfilter lists so that it can handle trees that are "too small".
Test those smaller trees - as well as false positives and cold caches.
Note that accelerating segment_list queries will not work for tree sizes smaller than small. How to flag this up?
Should smaller tree sizes just be removed from leveled_tictac?
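One way the "too small" case could be handled, sketched below with hypothetical names (this is not necessarily how leveled_tictac resolves it): if the queried tree has fewer segment bits than the size the accelerated query expects, each small-tree segment ID stands for a contiguous range of full-size segment IDs, so the filter list can be expanded rather than rejected outright.

    %% Hypothetical sketch - expand each segment ID from a tree with
    %% SmallBits of segment space into the IDs it covers at FullBits,
    %% so the expanded list can still be used as a segment filter
    expand_segments(SegmentList, SmallBits, FullBits)
            when FullBits >= SmallBits ->
        Shift = FullBits - SmallBits,
        lists:flatmap(
            fun(Seg) ->
                Base = Seg bsl Shift,
                [Base + Offset || Offset <- lists:seq(0, (1 bsl Shift) - 1)]
            end,
            SegmentList).

For example, expand_segments([3], 4, 6) returns [12, 13, 14, 15]; a caller could still decide the expanded list is too long to be worth accelerating and fall back to an unfiltered fold.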
Initially with basic tests. If the SlotIndex has been cached, we can now use the slot index, as it is based on the Segment hash algorithm.
This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID.
This also required a slight tweak to the penciller keyfolder. It now caches the next answer from the SSTiter, rather than restarting the iterator. When the IMMiter has many more entries than the SSTiter (as the SSTiter is being filtered but not the IMMiter), restarting could lead to lots of repeated folding.
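A rough sketch of the idea, with hypothetical names rather than the real leveled_penciller code: the merge between the in-memory iterator and the SST iterator holds on to the last answer popped from the SST side, and only advances the SST iterator once that cached answer has actually been consumed.

    %% Hypothetical sketch, not the real keyfolder.  The SST side is
    %% modelled as a fun returning {Key, Value, NextIter} or complete;
    %% tie-breaking on equal keys is ignored for brevity.
    keyfolder(IMMList, SSTIter, FoldFun, Acc) ->
        keyfolder(IMMList, SSTIter, none, FoldFun, Acc).

    keyfolder([], SSTIter, Cache, FoldFun, Acc) ->
        %% in-memory side exhausted - drain any cached answer and then
        %% the rest of the SST iterator
        case next_sst(SSTIter, Cache) of
            complete ->
                Acc;
            {SK, SV, NextIter} ->
                keyfolder([], NextIter, none, FoldFun, FoldFun({SK, SV}, Acc))
        end;
    keyfolder([{IK, IV}|IMMRest] = IMMList, SSTIter, Cache0, FoldFun, Acc) ->
        case next_sst(SSTIter, Cache0) of
            complete ->
                %% SST side exhausted - fold the remaining in-memory entries
                lists:foldl(FoldFun, Acc, IMMList);
            {SK, SV, NextIter} when SK < IK ->
                %% SST entry sorts first: consume it and clear the cache
                keyfolder(IMMList, NextIter, none, FoldFun,
                          FoldFun({SK, SV}, Acc));
            {_SK, _SV, _NextIter} = Cache ->
                %% in-memory entry sorts first: consume it, but keep the
                %% SST answer cached rather than querying the iterator again
                keyfolder(IMMRest, SSTIter, Cache, FoldFun,
                          FoldFun({IK, IV}, Acc))
        end.

    %% only call the SST iterator when there is no cached answer
    next_sst(SSTIter, none) -> SSTIter();
    next_sst(_SSTIter, Cache) -> Cache.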
When leveled is used with Riak, buckets and keys are always binaries, so we can treat them as such.
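As a hedged illustration of where that helps (the function name and approach here are mine, not the leveled API): a binary-only clause can hash the Bucket and Key directly, with a term_to_binary fallback kept only for the general case.

    %% Illustrative only - hash a bucket/key pair, taking a direct path
    %% when both are binaries (the length prefix avoids ambiguity between
    %% different bucket/key splits of the same bytes)
    hash_bucket_key(Bucket, Key) when is_binary(Bucket), is_binary(Key) ->
        crypto:hash(md5, <<(byte_size(Bucket)):32/integer,
                           Bucket/binary, Key/binary>>);
    hash_bucket_key(Bucket, Key) ->
        crypto:hash(md5, term_to_binary({Bucket, Key})).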
Want to move tictac tree testing away from the leveled internal tests to a set of tests for the Riak scenario, so riak_SUITE has been created for this and other Riak-specific backend tests.
Use 4 keys in the bloom (which is closer to the optimum). This should halve the false positive rate, as we can now use the large ExtraHash rather than being constrained by the SegmentHash here.
There is more entropy when using the position index with the segment hash, so this would be a better filter to apply.
The key count could also be increased now, as the extra hash can be larger.
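A minimal sketch of the 4-key idea, assuming a 32-bit ExtraHash and using hypothetical names rather than leveled's actual bloom code: the four bit positions come from slices of the ExtraHash, so the filter no longer leans on the SegmentHash at all.

    %% Illustrative 256-bit bloom held as an integer bitmap; the four
    %% positions are the four bytes of the (assumed 32-bit) ExtraHash
    bloom_positions(ExtraHash) ->
        <<P1:8, P2:8, P3:8, P4:8>> = <<ExtraHash:32/integer>>,
        [P1, P2, P3, P4].

    bloom_add(Bloom, ExtraHash) ->
        lists:foldl(fun(P, Acc) -> Acc bor (1 bsl P) end,
                    Bloom,
                    bloom_positions(ExtraHash)).

    bloom_check(Bloom, ExtraHash) ->
        lists:all(fun(P) -> (Bloom band (1 bsl P)) =/= 0 end,
                  bloom_positions(ExtraHash)).

Usage: Bloom1 = bloom_add(0, ExtraHash), after which bloom_check(Bloom1, ExtraHash) returns true, while a check with an unrelated ExtraHash will usually (though not always) return false.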
As an aside, a leveled_iclerk unit test failure appeared - the range was just wrong. It is not known why this started happening.
Discovered a bug with search ranges in leveled_tree - this was uncovered by an intermittently failing 19.3 test.
Test case added and bug fixed. It was due to a failure to use the end_key passed, causing issues with particular manifests and full bucket ranges.
Switch from magic hash to md5 - to hopefully remove the need for some of the artificial jumps required to get expected false positive ratios.
Also split the hash into two 16-bit integers. We assume that SegmentID (from the perspective of AAE merkle/tictac trees) will always be at least 16 bits. The idea is that hashes should be used in blooms and indexes such that some advantage can be gained from just knowing the SegmentID - in particular when folding over all the keys in a bucket.
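A minimal sketch of that split, assuming the key is a binary (the widths here follow the description above, not necessarily the final leveled_codec code):

    %% Take the md5 of the key and split the leading 32 bits into two
    %% 16-bit integers: the SegmentID used by merkle/tictac trees, and an
    %% extra hash for blooms and slot indexes
    segment_hash(Key) when is_binary(Key) ->
        <<SegmentID:16/integer, ExtraHash:16/integer, _Rest/binary>> =
            crypto:hash(md5, Key),
        {SegmentID, ExtraHash}.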
Performance testing has been difficult so far - I think due to “cloud”
mysteries.
Introduce a dedicated module for all the different fold types. Also simplify the list of folders by deprecating those folds that should be achievable by fold_heads/fold_objects type folds with smarter functions.
Make sure that the fold functions also have better spec coverage, and are dialyzer checked.
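As a hedged illustration of what "smarter functions" can mean here (the name and arity below are assumptions, not the real folder API): a specialised key-count fold can instead be expressed as an ordinary fold-heads style fold function, written with a -spec so dialyzer has something to check.

    %% Illustrative fold function: counts keys by ignoring the head,
    %% replacing the need for a dedicated key-count folder
    -spec count_fold_fun() ->
            fun((binary(), binary(), term(), non_neg_integer()) ->
                    non_neg_integer()).
    count_fold_fun() ->
        fun(_Bucket, _Key, _Head, Count) -> Count + 1 end.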
When new trees are initialised they are started with 1-byte binaries at Level2, and become full-size following a merge or add event.
The idea is that when trees are distributed before they are added to, or when over-sized trees are used, the output may be smaller on the network.
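A simplified sketch of the scheme, with assumed widths and names (not the actual leveled_tictac representation): Level2 segments start as single zero bytes, and a segment is only inflated to full width the first time it is altered.

    -define(SEG_WIDTH_BYTES, 32).   %% assumed full Level2 segment width

    %% every Level2 segment starts as a 1-byte placeholder
    new_level2(SegmentCount) ->
        list_to_tuple(lists:duplicate(SegmentCount, <<0:8>>)).

    %% xor a hash change into a segment, growing it to full width on the
    %% first alteration
    alter_segment(SegIndex, HashChange, Level2) ->
        SegBits = ?SEG_WIDTH_BYTES * 8,
        Seg0 =
            case element(SegIndex, Level2) of
                Full when byte_size(Full) == ?SEG_WIDTH_BYTES -> Full;
                _Placeholder -> <<0:SegBits>>
            end,
        <<SegInt:SegBits/integer>> = Seg0,
        setelement(SegIndex, Level2, <<(SegInt bxor HashChange):SegBits/integer>>).

Serialising a freshly initialised tree then costs only a byte per Level2 segment, which is the network saving described above.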