Performance has regressed following the hashtable change. The suspicion
is that the hashtable format may not be right, causing more cycling
around the hashtree. Logging has been added.
Improved performance by switching to an ordered_set (so that a sorted
list can be extracted cleanly) and building the binary from that
ordered list.
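A minimal sketch of why an ordered extraction helps, assuming each hashtree entry is a (table_index, hash, position) tuple of 32-bit unsigned integers. Here `sorted` stands in for pulling a list out of an ordered_set, and `build_hashtables` is a hypothetical name. Once the entries arrive already grouped by table index, each table's binary can be built in a single pass. This does not reproduce the exact open-addressed slot layout used by CDB or leveled_cdb.

```python
import struct
from itertools import groupby


def build_hashtables(entries):
    """Pack (table_index, hash, position) entries into one binary per table.

    Hypothetical sketch: sorting once plays the role of extracting from an
    ordered_set, so entries come out grouped by table index and each table
    is built in one pass with no further searching.
    """
    ordered = sorted(entries)  # stand-in for an ordered_set extraction
    tables = {}
    for index, group in groupby(ordered, key=lambda e: e[0]):
        # Concatenate little-endian (hash, position) pairs for this table.
        tables[index] = b"".join(
            struct.pack("<II", h, pos) for _, h, pos in group
        )
    return tables


tables = build_hashtables([(1, 20, 200), (0, 10, 100), (1, 5, 50)])
```

Within each table the pairs come out ordered by hash, because the sort key is the whole tuple.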
The attempt to refactor the writer meant that files never reached the
max slots, so writing only ever stopped when the lists were exhausted.
As a result the merge tree had just a C0 and a C1 file!
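The intended stopping condition can be sketched as follows, assuming a sorted stream of keys grouped into fixed-size slots. `split_into_files`, `max_slots` and `slot_size` are hypothetical names, not the writer's real API. A file is closed when it reaches `max_slots` *or* when the input runs out. Dropping the `max_slots` check reproduces the bug described above, where everything lands in one file.

```python
def split_into_files(keys, max_slots, slot_size):
    """Group a sorted key stream into files of at most max_slots slots.

    Hypothetical sketch: a file is closed as soon as it holds max_slots
    slots, and any remainder is flushed once the input is exhausted.
    """
    files, current_file, current_slot = [], [], []
    for key in keys:
        current_slot.append(key)
        if len(current_slot) == slot_size:
            current_file.append(current_slot)
            current_slot = []
            # Stop this file when it is full, not only at end of input.
            if len(current_file) == max_slots:
                files.append(current_file)
                current_file = []
    # Flush whatever remains once the input is exhausted.
    if current_slot:
        current_file.append(current_slot)
    if current_file:
        files.append(current_file)
    return files


files = split_into_files(range(10), max_slots=2, slot_size=2)
```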
Add an extra bloom check, but have the SFT process perform the check
rather than the Penciller. This avoids the complexity of negotiating the
transfer of the bloom to the Penciller, but doesn't avoid the
potentially unnecessary message pass between processes.
Previously the code involved very high-arity functions which were hard
to follow. This has been simplified somewhat by adding a writer record
to make state easier to track, along with a general refactoring to
separate the building steps more logically.
The bloom is desirable to add back in going forward, but it wasn't
implemented in a safe or clear way.
The handling of whether or not the bloom was on the LoopState was
clumsy, and the bloom was persisted in multiple places without a CRC
check.
The intention is to add it back in whereby it is requested on demand by
the Penciller, and the SFT worker then lifts it off disk and CRC-checks
it. That way it is never on the SFT LoopState. It will also be easier
to control, within the Penciller, the logic over which levels have the
bloom.
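A minimal sketch of the CRC protection described above, assuming the bloom is persisted as an opaque byte string. `encode_bloom` and `decode_bloom` are hypothetical names. On request, the SFT worker would read this blob from disk and only hand back the bloom if the CRC matches, so a corrupted bloom is never used and never has to live on the LoopState.

```python
import struct
import zlib


def encode_bloom(bloom: bytes) -> bytes:
    """Prefix the serialised bloom with its CRC32 before writing to disk."""
    return struct.pack(">I", zlib.crc32(bloom)) + bloom


def decode_bloom(blob: bytes):
    """Return the bloom if its CRC32 matches, otherwise None."""
    if len(blob) < 4:
        return None
    (crc,) = struct.unpack(">I", blob[:4])
    bloom = blob[4:]
    return bloom if zlib.crc32(bloom) == crc else None


stored = encode_bloom(b"bloom-bits")
```

Returning `None` on a mismatch lets the caller fall back to a normal lookup rather than trusting a damaged filter.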
It is expensive on the CPU, but it leads to a 4x increase in cache
coverage.
Try to make some small micro-gains in list handling in create_block.
Move to using the DJ Bernstein magic hash consistently, and try to make
sure we only hash once per operation (as this hash is more expensive
than phash2).
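The sketch below shows the DJ Bernstein hash as used by the CDB format, with a helper that hashes once and derives both lookup coordinates from that single value. CDB picks one of 256 tables from the hash modulo 256 and a starting slot from the hash divided by 256, modulo the table length. It operates on raw bytes. In leveled the key would first be serialised (for example with term_to_binary), which is not shown. `locate` is a hypothetical helper name.

```python
def magic_hash(data: bytes) -> int:
    """DJ Bernstein hash as used by CDB: h = (h * 33) xor byte, 32-bit."""
    h = 5381
    for byte in data:
        h = ((h * 33) ^ byte) & 0xFFFFFFFF
    return h


def locate(data: bytes, table_len: int):
    """Hash once, then derive the table index and starting slot from it."""
    h = magic_hash(data)
    index = h & 0xFF               # which of the 256 hash tables
    slot = (h >> 8) % table_len    # starting slot within that table
    return h, index, slot
```

Returning the hash alongside the coordinates lets callers reuse it (for example for a bloom check) instead of hashing the key a second time.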
The improved lookup time for missing keys should allow for the L0 index
to be removed, and hence speed up the completion time for push_mem
operations.
It is expected there will be a second stage of creating a tinybloom as
part of the SFT creation process, and then adding that tinybloom to the
manifest. This will then reduce the message passing required for a GET
of a key that is not in the cache or in higher levels.
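A tiny bloom filter along these lines might look like the sketch below. `TinyBloom`, the bit count and the probe count are hypothetical choices, not the planned implementation. It takes an already-computed 32-bit hash and derives its probe positions by double hashing from the two halves of that value, so the key is never re-hashed. With such a filter in the manifest, the Penciller could skip messaging an SFT process whenever `may_contain` returns False.

```python
class TinyBloom:
    """Hypothetical tiny bloom filter driven by one precomputed 32-bit hash."""

    def __init__(self, num_bits=1024, num_probes=3):
        self.num_bits = num_bits
        self.num_probes = num_probes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, h):
        # Double hashing from the two 16-bit halves of a single hash.
        # An odd step gives distinct probes when num_bits is a power of two.
        h1 = h & 0xFFFF
        h2 = (h >> 16) | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_probes)]

    def add(self, h):
        for pos in self._positions(h):
            self.bits |= 1 << pos

    def may_contain(self, h):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(h))


bloom = TinyBloom()
for h in (1, 12345, 0xDEADBEEF):
    bloom.add(h)
```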
The hope is that this will cause less garbage collection and also be
slightly faster.
Note that snapshots no longer get an index; they get the special index
'snap' instead. However, the SkipLists have bloom protection, and most
snapshots are iterators rather than fetchers.
Split out the hashtree implementation functions in leveled_cdb to make
the implementation easier to swap out. It currently uses an array of
skiplists; it may be better with an ets ordered_set.
Change the extract of Riak metadata.
In Riak-based volume tests the writing of SFT files is tanking. Could
this be due to the "extra" metadata? There are currently only plans to
look at the vclock, and the sibling count is free to fetch. If we
extract just these two items, will it take less CPU to extract the
metadata, and will the reduced weight also reduce the downstream impact?
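A minimal sketch of extracting only those two items, assuming (not verified against Riak's actual object format) a binary laid out as a 1-byte magic, a 1-byte version, a 4-byte big-endian vclock length, the vclock bytes, a 4-byte big-endian sibling count, and then the sibling data. `extract_vclock_and_sibcount` is a hypothetical name. The point is that the sibling data is never parsed, so the work done is independent of how large the siblings are.

```python
import struct


def extract_vclock_and_sibcount(obj: bytes):
    """Pull only the vclock and sibling count from an object binary.

    ASSUMED layout (illustrative only): magic (1 byte), version (1 byte),
    vclock length (4 bytes, big-endian), vclock, sibling count (4 bytes,
    big-endian), then sibling data, which is deliberately left unparsed.
    """
    (vclock_len,) = struct.unpack_from(">I", obj, 2)
    vclock = obj[6:6 + vclock_len]
    (sib_count,) = struct.unpack_from(">I", obj, 6 + vclock_len)
    return vclock, sib_count


sample = (bytes([53, 1]) + struct.pack(">I", 3) + b"abc"
          + struct.pack(">I", 2) + b"sibling-data")
```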