WIP - Recent Modifications

Just some initial WIP code for this.  Will revisit this after
exploring some ideas on how to reduce the cost of
get_keys_by_segment.

The overall idea is that there are trees of recent modifications, with
recent being some rolling time window made up of hourly blocks, and
recency being determined by the last-modified date on the object metadata
- which should be consistent across a cluster.

So if we were at 15:30 we would get the tree for 14:00 - 15:00 and the
tree for 15:00-16:00 from two different queries which cover the same
partitions and then compare.
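
As a rough sketch of the block selection described above (the module
and function names here are hypothetical and not part of leveled, and
it assumes hourly blocks are keyed by the datetime at which the hour
starts):

```erlang
-module(recent_blocks).
-export([hour_block/1, blocks_to_compare/1]).

%% Hypothetical helper: map a datetime (e.g. an object's last-modified
%% date) to the start of the hourly block that contains it.
hour_block({Date, {Hour, _Min, _Sec}}) ->
    {Date, {Hour, 0, 0}}.

%% Return the start of the previous and of the current hourly block for
%% the time Now.  Going via gregorian seconds keeps the previous block
%% correct when the current hour is the first hour of the day.
blocks_to_compare(Now) ->
    Current = hour_block(Now),
    Secs = calendar:datetime_to_gregorian_seconds(Current),
    Previous = calendar:gregorian_seconds_to_datetime(Secs - 3600),
    [Previous, Current].
```

At 15:30 this gives the blocks starting at 14:00 and 15:00, i.e. the
two trees that would be fetched and compared.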

Comparison may find differences, and we know what segment the difference
is in - but how to then find all keys in that segment which have been
modified in the period?  Three ways:

1. Do it inefficiently and infrequently using a fold_keys and a filter
   (perhaps with SST files having a highest LMD in the metadata so that
   they can be skipped).
2. Add a special index, where every entry has a TTL, and the Key is
   {$segment, Segment, Bucket, Key} so that a normal 2i query can be used.
3. Align hashing for segments with hashing for penciller lookup so that a
   query over the actual keys can be optimised, skipping chunks of the
   in-memory part and chunks of the SST file.
martinsumner 2017-06-26 13:26:08 +01:00
parent fde9af28dd
commit 9fca17d56a
2 changed files with 77 additions and 4 deletions


@@ -67,7 +67,7 @@ many_put_compare(_Config) ->
{ok, Bookie3} = leveled_bookie:book_start(StartOpts3),
lists:foreach(fun(ObjL) -> testutil:riakload(Bookie3, ObjL) end, CLs),
- % Now run a tictac query against both stores to see th extent to which
+ % Now run a tictac query against both stores to see the extent to which
% state between stores is consistent
TicTacQ = {tictactree_obj,