Using the position index together with the segment hash gives more entropy - so this should be a better filter to apply.
The key count could also be increased now, as the extra hash can be larger.
As an aside - a leveled_iclerk unit test failure appeared - the range was just wrong. Don't know why this started happening.
Discovered a bug with search ranges in leveled_tree - this was uncovered by an intermittently failing 19.3 test.
Test case added and bug fixed. It was due to a failure to use the end_key passed in, causing issues with particular manifests and full bucket ranges.
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected false positive ratios.
Also split the hash into two 16-bit integers. We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits. The idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
segmentID - in particular when folding over all the keys in a bucket.
Performance testing has been difficult so far - I think due to “cloud”
mysteries.
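
The hash split described above can be sketched as follows. This is an illustration in Python rather than leveled's own Erlang, and it assumes the two 16-bit integers are taken from the first 32 bits of the md5 digest; which bits leveled actually uses is not stated here, and the names `segment_hash`, `segment_id` and `extra_hash` are hypothetical.

```python
import hashlib


def segment_hash(key):
    """Split an md5 digest of the key into two 16-bit integers.

    Assumption: the first 32 bits of the digest are used, with the
    high 16 bits acting as the segment ID and the low 16 bits as the
    extra hash. The real bit layout may differ.
    """
    digest = hashlib.md5(key).digest()
    first32 = int.from_bytes(digest[:4], "big")
    segment_id = first32 >> 16      # high 16 bits
    extra_hash = first32 & 0xFFFF   # low 16 bits
    return segment_id, extra_hash
```

With a split like this, a bloom or index that stores the segment ID part can rule keys in or out when only the segment ID is known, which is the advantage referred to above for folds over all the keys in a bucket.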
Introduce a dedicated module for all the different fold types. Also simplify the list of folders by deprecating those folds that should be achievable by fold_heads/fold_objects type folds but with smarter functions.
Makes sure that the fold functions also have better spec coverage, and are dialyzer checked.
When new trees are initialised they are started with 1 byte binaries at Level2 - and become full-size following a merge or add event.
The idea is that when trees are distributed before they are added to, or when over-sized trees are used - the output may be smaller on the network.
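
A minimal sketch of that lazy initialisation, in Python for illustration only. It assumes a tree is a list of Level2 chunks, that a full chunk holds 256 four-byte hashes, and that adding a hash XORs it into its slot; `new_tree`, `add_hash`, `serialised_size` and the chunk size are hypothetical and not taken from leveled.

```python
EMPTY = b"\x00"        # 1-byte placeholder for an untouched Level2 chunk
L2_CHUNK_BYTES = 1024  # assumed full size: 256 slots of 4 bytes each


def new_tree(level1_width):
    """Start every Level2 chunk as a 1-byte binary."""
    return [EMPTY] * level1_width


def add_hash(tree, l1_pos, l2_pos, hash32):
    """XOR a 32-bit hash into a slot, expanding the chunk on first use."""
    chunk = tree[l1_pos]
    if len(chunk) == 1:
        # First add to this chunk: grow the placeholder to full size.
        chunk = bytes(L2_CHUNK_BYTES)
    start = l2_pos * 4
    current = int.from_bytes(chunk[start:start + 4], "big")
    updated = (current ^ hash32).to_bytes(4, "big")
    result = list(tree)
    result[l1_pos] = chunk[:start] + updated + chunk[start + 4:]
    return result


def serialised_size(tree):
    """Bytes needed to send the tree, ignoring any framing overhead."""
    return sum(len(chunk) for chunk in tree)
```

In this sketch an untouched tree of 4 chunks costs 4 bytes on the wire, and only the chunks that have seen an add grow to 1024 bytes.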
As described in https://github.com/martinsumner/leveled/issues/92
Only the first fix was made.
Just to be safe - archiving means renaming to another file with a different extension. Assumption is that renamed files can be manually reaped if necessary.
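
Archive-by-rename can be sketched as below, in Python for illustration. The function name and the `.bak` extension are hypothetical; the extension leveled actually uses is not given here.

```python
import os


def archive_file(path, archive_ext=".bak"):
    """Rename a file so it keeps its content but gets a new extension.

    Nothing is deleted: the renamed file stays on disk until someone
    reaps it manually. The ".bak" default is an assumption.
    """
    root, _old_ext = os.path.splitext(path)
    archived = root + archive_ext
    os.rename(path, archived)
    return archived
```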
Because there's no sensible way of using it if objects are mutable - you still end up with the same false positives in the tictactree.
Didn't fully roll back the change as spec and docs were added which could be useful going forward.
This is an interim stage towards enhancing the proxy object so that it contains more helper information (other than size).
The aim is to be able to run more efficient fold_heads queries that might filter on LMD range (so as not to have to co-ordinate the running of comparative queries). For example if producing a tictactree to compare between two different offsets, a max LMD could be passed in so that changes beyond the time the first query was requested can be ignored.
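
The max LMD idea can be sketched as a filter applied inside the fold. This is a Python illustration under stated assumptions: heads are modelled as `(bucket, key, head)` tuples where `head` is a dict with an `"lmd"` timestamp, and the `max_lmd` parameter is hypothetical rather than an existing fold_heads option.

```python
def fold_heads(heads, fold_fun, acc, max_lmd=None):
    """Fold over object heads, skipping those modified after max_lmd.

    heads:   iterable of (bucket, key, head) tuples; head["lmd"] is the
             last-modified time (an assumed representation).
    max_lmd: optional cut-off; changes after it are ignored so two
             queries started at different times compare like with like.
    """
    for bucket, key, head in heads:
        if max_lmd is not None and head["lmd"] > max_lmd:
            continue
        acc = fold_fun(bucket, key, head, acc)
    return acc
```

Passing the time of the first query as `max_lmd` to the second query means objects written in between do not show up as differences.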
Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary. So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
The "small" tree will serialise to 1.5MB - which seems large. Much smaller trees seem to be more suitable for things like recently modified aae indexes.
This required a switch to change the sync strategy based on a rebar parameter.
However, tests could be slow on a MacBook with OTP16 and sync - so timeouts were added in unit tests, and the ct tests' sync_strategy was changed to not sync for OTP16.
Comparing the inbuilt tictac_tree fold, to using "proper" abstraction and achieving the same thing through fold_heads.
The fold_heads method is slower (a lot more manipulation required in the fold) - expect it to require > 2 x CPU.
However, this does give the flexibility to change the hash algorithm. This would allow for a fold over a database of AAE trees (where the hash has been pre-computed using sha) to be compared with a fold over a database of leveled backends.
Also can vary whether the fold_heads checks for presence of the object in the Inker. So normally we can get the speed advantage of not checking the Journal for presence, but periodically we can.
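
The flexibility over the hash algorithm can be sketched as a fold that takes the hash function as a parameter. This is a Python illustration only: the flat list of segments, the XOR accumulation, the use of the first 32 bits of SHA-256 for "sha", and all function names are assumptions, not leveled's implementation.

```python
import hashlib


def sha_hash32(data):
    """Assumed stand-in for a pre-computed sha hash: first 32 bits of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def tictac_fold(objects, hash_fun, segment_count=1024):
    """Build a flat tree of XOR-ed hashes using a caller-supplied hash.

    objects: iterable of (key, value) byte pairs.
    """
    tree = [0] * segment_count
    for key, value in objects:
        segment = hash_fun(key) % segment_count
        tree[segment] ^= hash_fun(key + value)
    return tree


def dirty_segments(tree_a, tree_b):
    """Return the segment indexes where two trees disagree."""
    return [i for i, (a, b) in enumerate(zip(tree_a, tree_b)) if a != b]
```

As long as both sides use the same `hash_fun` and segment count, the two trees are comparable regardless of which store produced them or the order in which keys were folded.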
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae. So shoehorning this in - will also allow for proper use of
FilterFun when filtering by partition.