As described in https://github.com/martinsumner/leveled/issues/92
Only the first fix was made.
Just to be safe - archiving means renaming to another file with a different extension. The assumption is that renamed files can be manually reaped if necessary.
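A minimal sketch of the kind of rename intended (the ".bak" extension here is an illustrative assumption, not necessarily the one used):

    %% Sketch only: archive a file by renaming it with a different
    %% extension, so that it can be reaped manually later.
    archive_file(FileName) ->
        ArchiveName = filename:rootname(FileName) ++ ".bak",
        ok = file:rename(FileName, ArchiveName),
        ArchiveName.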
Because there's no sensible way of using it if objects are mutable - you still end up with the same false positives in the tictactree.
Didn't fully roll back the change, as the spec and docs that were added could be useful going forward.
This is an interim stage towards enhancing the proxy object so that it contains more helper information (other than size).
The aim is to be able to run more efficient fold_heads queries that might filter on an LMD range (so as not to have to co-ordinate the running of comparative queries). For example, if producing a tictactree to compare between two different offsets, a max LMD could be passed in so that changes beyond the time the first query was requested can be ignored.
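As a sketch of the intent (the function and field names below are assumptions for illustration, not the actual leveled API):

    %% Sketch: fold over heads, ignoring any object whose
    %% last-modified date falls outside the requested range.
    fold_heads_in_lmd_range(Heads, LowLMD, HighLMD, FoldFun, Acc0) ->
        lists:foldl(
            fun({Bucket, Key, LMD, ProxyObj}, Acc) ->
                    case LMD >= LowLMD andalso LMD =< HighLMD of
                        true -> FoldFun(Bucket, Key, ProxyObj, Acc);
                        false -> Acc
                    end
            end,
            Acc0,
            Heads).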
The idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary. So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
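For example, a hypothetical extract function (not anything currently in the codebase):

    %% Sketch: present keys in a form a non-Erlang store can
    %% reproduce - base64 of the raw key rather than a hash of the
    %% term_to_binary form.
    extract_key({_Bucket, Key}) when is_binary(Key) ->
        base64:encode(Key).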
Need to know {Bucket, Key}, not just Key, if all buckets are being covered
by nrt aae. So shoehorning this in - it will also allow for proper use of
FilterFun when filtering by partition.
With a basic ct test.
Doesn't currently prove expiry of the index. Doesn't prove the ability to
find segments.
Assumes that either "all" buckets or a special list of buckets require
indexing this way. Will lead to unexpected results if the same bucket
name is used across different Tags.
The format of the index has been chosen so that hopefully standard index
features can be used (e.g. return_terms).
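Purely to illustrate the shape intended - the field name and term layout below are assumptions, not the implemented format:

    %% Sketch: fold the bucket into the index term, so that a
    %% standard return_terms query over the index recovers
    %% {Bucket, Key} and not just Key.
    to_aae_indexspec(Bucket, Key, SegmentID) ->
        Term = iolist_to_binary(
                    io_lib:format("~w.~s.~s", [SegmentID, Bucket, Key])),
        {add, <<"$aae">>, Term}.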
Remove the ets index in pmem and use a binary index instead. This may
be slower, but it avoids the bulk upload to ets, and means that matches
know the position (so only skiplists with a match need be tried).
It also removes the discrepancy between snapshots and non-snapshots - as
previously the snapshots were always slowed by not having access to the
ETS table.
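The principle, as a rough sketch (the slice width and layout are illustrative assumptions):

    %% Sketch: a flat binary index of {hash-slice, position} pairs.
    %% A lookup returns the cache positions that may hold the key,
    %% so only the skiplists at those positions need be checked.
    add_to_index(Hash, Position, IndexBin) ->
        <<IndexBin/binary,
            (Hash band 16#FFFFFF):24/integer, Position:8/integer>>.

    find_positions(Hash, IndexBin) ->
        find_positions(Hash band 16#FFFFFF, IndexBin, []).

    find_positions(_Slice, <<>>, Acc) ->
        lists:reverse(Acc);
    find_positions(Slice, <<Slice:24/integer, P:8/integer, Rest/binary>>, Acc) ->
        find_positions(Slice, Rest, [P|Acc]);
    find_positions(Slice, <<_H:24/integer, _P:8/integer, Rest/binary>>, Acc) ->
        find_positions(Slice, Rest, Acc).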
The full Riak metadata had been stripped from the Ledger update for
performance reasons. However, the full metadata is required in order to
save a GET before a PUT. Therefore we want to do isolated testing on
this change to establish how its cost compares with that saving.
It is expensive on the CPU - but it leads to a 4x increase in cache
coverage.
Try to make some small micro-gains in list handling in create_block.
Move to using the DJ Bernstein Magic Hash consistently, and try to
make sure we only hash once for each operation (as the hash is more
expensive than phash2).
The improved lookup time for missing keys should allow for the L0 index
to be removed, and hence speed up the completion time for push_mem
operations.
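A sketch of the D J Bernstein style hash in question (the seed and exact variant are assumptions; the point is a cheap hash computed once per operation):

    %% Sketch: DJB-style hash (hash * 33 xor next byte), masked to
    %% 32 bits, applied to the serialised key.
    magic_hash(Key) ->
        hash1(5381, term_to_binary(Key)) band 16#FFFFFFFF.

    hash1(Hash, <<>>) ->
        Hash;
    hash1(Hash, <<B:8/integer, Rest/binary>>) ->
        hash1((Hash * 33) bxor B, Rest).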
It is expected there will be a second stage of creating a tinybloom as
part of the SFT creation process, and then adding that tinybloom to the
manifest. This will then reduce the message passing required for a GET
not in the cache or higher levels.
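A minimal sketch of the sort of tinybloom meant here (the size and double-hashing trick are illustrative assumptions, not the eventual design):

    %% Sketch: a 1024-bit bloom filter held as an integer bitmap,
    %% small enough to sit in the manifest entry for each file.
    -define(BLOOM_BITS, 1024).

    bloom_add(Hash, Bloom) ->
        {B1, B2} = bloom_bits(Hash),
        Bloom bor (1 bsl B1) bor (1 bsl B2).

    bloom_check(Hash, Bloom) ->
        {B1, B2} = bloom_bits(Hash),
        (Bloom band (1 bsl B1)) =/= 0
            andalso (Bloom band (1 bsl B2)) =/= 0.

    bloom_bits(Hash) ->
        {Hash band (?BLOOM_BITS - 1), (Hash bsr 10) band (?BLOOM_BITS - 1)}.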
Change the extract of Riak metadata.
In Riak-based volume tests the writing of SFT files is tanking. Could
this be the "extra" metadata? There are only current plans to look at
the vclock, and the sibling count is free to fetch. What if we just get
these two items - will it be less CPU to extract the metadata, and will
the reduced weight also reduce the downstream impact?
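A sketch of the cheaper extract, assuming a riak_object v1 style binary layout (magic byte, version, vclock length, vclock, sibling count) - the layout should be treated as an assumption here:

    %% Sketch: read only the vclock and sibling count from the head
    %% of the object binary, without decoding any sibling data.
    extract_vclock_and_sibcount(
        <<53:8/integer, 1:8/integer, VclockLen:32/integer,
            VclockBin:VclockLen/binary, SibCount:32/integer,
            _SibsBin/binary>>) ->
        {binary_to_term(VclockBin), SibCount}.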
First part of adding support for scanning for Keys and Hashes. As part
of this, discovered that TTL support did the opposite (it only fetched
things in the past!).
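The corrected liveness check is roughly of this shape (the status tuples are illustrative):

    %% Sketch: an object with a TTL is still active if its expiry
    %% timestamp is in the future, not the past.
    is_active({active, infinity}, _Now) ->
        true;
    is_active({active, TS}, Now) when is_integer(TS) ->
        TS >= Now;
    is_active(tomb, _Now) ->
        false.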
Test added for the "retain" recovery strategy. This strategy makes sure
a full history of index changes is retained, so that if the Ledger is
wiped out, the Ledger can be fully rebuilt from the Journal.
This exposed two journal compaction problems:
- The BestRun selected did not have its source files correctly sorted in
order before compaction (see the sketch below)
- The compaction process incorrectly dealt with the KeyDelta object
left after a compaction - i.e. compacting the same key twice caused that
key's history to be lost.
These issues have now been corrected.
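For the first of these, the fix amounts to sorting the run of candidate files into SQN order before compacting - a hedged sketch, with the record fields assumed:

    %% Sketch: order a best run of journal files by their starting
    %% SQN before compaction begins.
    -record(candidate, {low_sqn :: integer(),
                        filename :: string(),
                        journal :: pid()}).

    sort_run(RunOfCandidates) ->
        lists:keysort(#candidate.low_sqn, RunOfCandidates).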
The no_hash option in CDB files became too hard to manage, in particular
the need to scan the whole file to find the last_key rather than cheating
and using the index. It has been removed for now.
The writing to the journal during journal compaction has now been
enhanced by an mput option on the CDB file write - so each batch can be
written as one pwrite operation.
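The gain comes from file:pwrite/2 taking a list of {Location, Bytes} pairs, so a whole batch lands in a single call; a rough sketch (serialisation of the CDB record format is omitted):

    %% Sketch: write a batch of already-serialised key/value
    %% binaries to the CDB file with one pwrite, rather than one
    %% write call per object.
    mput(Handle, StartPos, KVBins) ->
        {LocBytes, _EndPos} =
            lists:mapfoldl(
                fun(Bin, Pos) -> {{Pos, Bin}, Pos + byte_size(Bin)} end,
                StartPos,
                KVBins),
        ok = file:pwrite(Handle, LocBytes).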