this was previously not na issue as leveled_codec:segment_hash/1 would handle anyhting that could be hashed. This now has to be a tuple, and one with a first element - so corrupted tuples are failing.
Add a guard chekcing for a corrupted tuple, but we only need this when doing journal compaction.
Change user_defined keys to be `retain` as a tag strategy
This allows for all fold functions to throw an exception to exit out of a fold with all dependencies still closed down as expected.
This was previously available for key folds, which was necessary for the folds to work in Riak (as max_results in index queries depends one xiting the fold with an exception). This change now adds a ct test, and adds support for head folds, object folds (key order) and object folds (sqn order)
Adds support with test for tuplebuckets in Riak keys.
This exposed that there was no filter using the seglist on the in-mmemory keys. This means that if there is no filter applied in the fold_function, many false positives may emerge.
This is probably not a big performance benefit (and indeed for performance it may be better to apply during the leveled_pmem:merge_trees).
Some thought still required as to what is more likely to contribute to future bugs: an extra location using the hash matching found in leveled_sst, or the extra results in the query.
Acc in response is now of form {Reason, Acc} not just Acc so that the application can understand the reason for the results ending - and take appropriate action (e.g. restart again from the LastKey to return more results).
To support max_keys and the last modified date range.
This applies the last modified date check on all ledger folds. This is hard to avoid, but ultimately a very low cost.
The limit on the number of heads to fold, is the limit based on passing to the accumulator - not on the limit being added to the accumulator. So if the FoldFun perfoms a filter (e.g. for the preflist), then those filtered results will still count towards the maximum.
There needs to be someway at the end of signalling from the fold if the outcome was or was not 'constrained' by max_keys - as the fold cannot simply tel by lenght checking the outcome.
Note this is used rather than length checking the buffer and throwing a 'stop_fold' message when the limit is reached. The choice is made for simplicity, and ease of testing. The throw mechanism is necessary if there is a need to stop parallel folds across the the cluster - but in this case the node_worker_pool will be used.
Externally to leveled_sst all folds are actually managed through exapnd_list_by_pointer.
Make the API a bit clearer in this regards, and add specs to help dialyzer.
This also adds LowLastMod to the API for expanding pointers (although the leveled_penciller just defaults this to 0 for everything.
Queries that in Riak will be based on fold_keys need to be able to catch throws, and re-throw them to be detected by the worker (whilst still clearing up the snapshot)
During EQC testing it was found that snapshots are still usable even
if the bookie process crashes. This change has snapshots monitor the
bookie and close when the bookie process dies.
head_only mode cna be run with_lookup - but there is no L0 index created in this case.
So the L0 index wasn't returning a potition list and the L0 cache wasn't being checked.
Code now checks every position in the L0 cache, when a lookup is attempted in head_only mode.
Interetsingly setting max_pencillercachesize to a non-integer merely had the impact of making the penciller cache size infinite.
So a guard added to make sure it is an integer going forward.
The IMM iterator should not be reused, as it has already been filtered for a query. so if reused for a different query incorrect and unexpected results may occur.
This reuse had been stopped by a previous commit, and this cleans up subsequently unused code.
Initial commit to add head_only mode to leveled. This allows leveled to receive batches of object changes, but where those objects exist only in the Penciller's Ledger (once they have been persisted within the Ledger).
The aim is to reduce significantly the cost of compaction. Also, the objects ar enot directly accessible (they can only be accessed through folds). Again this makes life easier during merging in the LSM trees (as no bloom filters have to be created).
Previouslythe tinybloom was used within the SST file as an extra check to remove false fetches.
However the SST already has a low FPR check in the slot_index. If the newebloom was used (which is no longer per slot, but per sst), this can be shared with the penciller and then the penciller could use it and avoid the message pass.
the message pass may be blocked by a 2i query or a slot fetch request for a merge. So this should make performance within the Penciller snappier.
This is as a result of taking sst_timings within a volume test - where there was an average of + 100microsecs for each level that was dropped down. Given the bloom/slot checks were < 20 microsecs - there seems to be some further delay.
The bloom is a binary of > 64 bytes - so passing it around should not require a copy.
Compression can be switched between LZ4 and zlib (native).
The setting to determine if compression should happen on receipt is now a macro definition in leveled_codec.
Initially with basic tests. If the SlotIndex has been cached, we can now use the slot index as it is based on the Segment hash algortihm.
This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID.
This also required a slight tweak to the penciller keyfolder. It now caches the next answer from the SSTiter, rather than restart the iterator. When the IMMiter has many more entries than the SSTiter (as the sSTiter is being filtered but not the IMMiter) this could lead to lots of repeated folding.
Discovered a bug with search ranges in leveled_tree - this was uncovered by an intermittently fialing 19.3 test.
Test case added and bug fixed. It was due to a fialure to use end_key passed causing issues with particular manifests and full bucket ranges.
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected fall positive ratios.
Also split the hash into two 16-bit integers. We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits. the idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
segmentID - in particular when folding over all the keys in a bucket.
Performance testing has been difficult so far - I think due to “cloud”
mysteries.