Commit graph

66 commits

Author SHA1 Message Date
Martin Sumner
6af1d3b003 Use more keys in bloom
Use 4 keys in the bloom (which is closer to optimal size).  This should halve the fpr - as we cna now use the large ExtraHash rather than being constrained by the SegmentHash here.
2017-10-24 15:42:53 +01:00
Martin Sumner
29a2d9fc35 Revert "Use lower fpr tinyblooms"
This reverts commit 3fd5260cd9.
2017-10-24 15:16:25 +01:00
Martin Sumner
3fd5260cd9 Use lower fpr tinyblooms
... but maybe they're slower?
2017-10-24 15:15:15 +01:00
Martin Sumner
a128dcdadf Change hash algorithm for penciller
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected fall positive ratios.

Also split the hash into two 16-bit integers.  We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits.  the idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
segmentID - in particular when folding over all the keys in a bucket.

Performance testing has been difficult so far - I think due to “cloud”
mysteries.
2017-10-20 23:04:29 +01:00
Martin Sumner
61724cfedb Merge branch 'master' into mas-riakaae-impl-2 2017-09-28 13:23:29 +01:00
Martin Sumner
3950942da3 Roll in fix for intermittently failing test
As descibed in https://github.com/martinsumner/leveled/issues/92

Only the first fix was made.

Just to eb safe - archiving means renaming to another file with a different extension.  Assumption is that renamed files cna be manually reaped if necessary.
2017-09-27 23:52:49 +01:00
Martin Sumner
433cc37eb6 Rolled back LMD in metadata
Because there's no sensible way of using it if objects are mutable - you still end up with the same false positives in the tictactree.

Didn't fully rollback the change as spec and docs were added which chould be useful going forward.
2017-09-27 12:26:12 +01:00
Martin Sumner
2e5b9c80f4 Add max LMD to Riak metadata
This is an interim stage towwards enhancing the proxy object so that it contains more helper information (other than size).

The aim is to be able to run more efficient fold_heads queries that might filter on LMD range (so as not to have to co-ordinate the running of comparative queries).  For example if producing a tictactree to compare between two different offsets, a max LMD could be passed in so that changes beyond the time the first query was requested can be ignored.
2017-09-27 12:15:18 +01:00
Martin Sumner
389694b11b Add exportable option to tictac
Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary.  So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
2017-09-26 22:49:40 +01:00
Martin Sumner
2f9afa1469 Add support for performing a magic hash on a binary
Ignore unnecessray term_to_binary if already binary.  This will be useful when we use magic_hash in tictac_trees we wish to be exportable.
2017-09-26 16:32:59 +01:00
Heinz N. Gies
25389893cf Add compatibility for old and new random / rand functions 2017-08-01 11:24:12 +02:00
martinsumner
80fd2615f6 Implement blacklist/whitelist
Change from the all/whitelist ebhavior to the blacklist/whitelist
behaviour documented in the write-up
2017-07-11 11:44:01 +01:00
martinsumner
97fdd36d53 Returning bucket when bucket is all
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae.  So shoehorning this in - will also allow for proper use of
FilterFun when filtering by partition.
2017-07-03 18:03:13 +01:00
martinsumner
52ca0e4b6c Test expansion
Detect a recent difference
2017-07-02 19:33:18 +01:00
martinsumner
954995e23f Support for recent AAE index
With basic ct test.

Doesn't currently prove expiry of index.  Doesn't prove ability to find
segments.

Assumes that either "all" buckets or a special list of buckets require
indexing this way.  Will lead to unexpected results if the same bucket
name is used across different Tags.

The format of the index has been chosen so that hopeully standard index
features can be used (e.g. return_terms).
2017-06-30 16:31:22 +01:00
martinsumner
8da8722b9e Add temporary aae index
Pending ct tests.  The aae index should expire after limit_minutes and
be on an index which is rounded to unit_minutes.
2017-06-30 10:03:36 +01:00
martinsumner
ebef27f021 Extract Last Modified Date from Riak Object
As part of process to supporting a recent changes index for
near-real-time anti-entropy
2017-06-27 16:25:18 +01:00
martinsumner
c586b78f45 Initial code with busted ct test
Initiat comparison made betwene trees externally - but ct test is bust.
2017-06-19 11:36:57 +01:00
Martin Sumner
7642aac2cc Change Riak object hash approach
Change the riak object hash being kept in the metadata, to being a hash
of the vector clock
2017-06-16 10:14:24 +01:00
martinsumner
400f65f557 Switch to binary metadata
Trya nd maintain binary format when stored in Ledger so less
swapping/changing as added and removed.
2017-04-04 10:02:35 +00:00
martinsumner
6143fcb664 Remove binary_to_term
when fetching don't need to binary_to_term key changes
2017-03-29 15:37:04 +01:00
martinsumner
f108871691 Vclock metadata change
Test performance ocntinues to be worse since the vlock metadata change.
Reversing out juts in case.
2017-03-21 18:15:56 +00:00
Martin Sumner
eec9d509f9 Add back hash performance tests
Need to consider if magic hash is an issue
2017-03-20 20:28:47 +00:00
martinsumner
7154815a2b Keep vclock as binary
No obvious, need at present for vlock to be a term within leveled
2017-03-20 20:28:02 +00:00
martinsumner
f3ffa920af Trying to standardise binary manipulation of value
Looking into theory that use of term_to_binary is imperfect.  Also may
be better to compress values only when they are compacted?
2017-03-20 15:43:54 +00:00
martinsumner
508da0be45 Additional unit tests 2017-03-14 22:47:48 +00:00
martinsumner
8a5ed1e198 Confirm skip on unknowns when compacting journal 2017-03-14 17:26:39 +00:00
martinsumner
5311a157d5 Merge remote-tracking branch 'refs/remotes/origin/mas-sstblock-i42' into mas-sstblockv2-i42 2017-03-13 19:22:41 +00:00
martinsumner
c787e0cd78 Handle corrupted Ledger Key when applying recovery strategy
Otherwise may blow up in journal_compaction_bustedjournal test
2017-03-13 14:32:46 +00:00
martinsumner
b2f3d882a9 Draft of branch to condense range_only keys 2017-03-10 20:43:37 +00:00
martinsumner
a9101e4781 SibCount must be non-zero 2017-02-26 22:45:20 +00:00
martinsumner
90c920fe86 Additional unit test work
Reverts a previous ct test fix
2017-01-23 15:15:40 +00:00
martinsumner
76bdd83346 Manifest refactor - STILL BROKEN
Some working tests now, but sitll broken
2017-01-14 16:36:05 +00:00
martinsumner
5a88565c08 Switch to binary index in pmem
Remove the ets index in pmem and use a binary index instead.  This may
be slower, but avoids the bulk upload to ets, and means that matches
know of position (so only skiplists with a match need be tried).

Also stops the discrepancy between snapshots and non-snapshots - as
previously the snapshots were always slowed by not having access to the
ETS table.
2017-01-05 21:58:33 +00:00
martinsumner
2f8ff640a9 Test coverage
Add some furthe runit tests to improve test coverage
2017-01-04 21:36:59 +00:00
martinsumner
060ce2e263 Add put timing points 2016-12-20 23:11:50 +00:00
martinsumner
299e8e6de3 Initial phash test
phash does not appear to be a potential causer of delay
2016-12-20 20:55:56 +00:00
martinsumner
9e28287231 Resolve failing recovery test
Now passing consistently with a number of different corruptions catered
for (including corruption of the Tag in the Inker Key)
2016-12-16 23:18:55 +00:00
martinsumner
f4e2e274e0 Reintroduce riak metadata extraction
The full riak metadata had been stripped from the Ledger update for
performance reasons.  However, the full metadata is required in order to
save a GET before a PUT.  Therefore we want to do isolated testing on
this change to establish the relative cost value in that cost saving.
2016-12-14 10:27:11 +00:00
martinsumner
4b48ed14c6 Correct Mistyped 2 ^ 32 2016-12-11 20:38:20 +00:00
martinsumner
1b63845050 Bring compression back to SFT
It is expensive on the CPU - but it leads to a 4 x increase in the cache
coverage.

Try and make some small micro gains in list handling in create_block
2016-12-11 15:02:33 +00:00
martinsumner
44cee5a6e8 Experiemnt with no compression
Does compression hurt CPU more than the benefit gaine din some cases?
2016-12-11 12:33:09 +00:00
martinsumner
32ac305c67 Compaction test error
Compaction tests now throwing up different corruption points
2016-12-11 06:53:25 +00:00
martinsumner
2d3a40e6f1 Magic Hash - and no L0 Index
Move to using the DJ Bernstein Magic Hash consistently, and trying to
make sure we only hash once for each operation (as the hash is more
expensive than phash2).

The improved lookup time for missing keys should allow for the L0 index
to be removed, and hence speed up the completion time for push_mem
operations.

It is expected there will be a second stage of creating a tinybloom as
part of the SFT creation process, and then adding that tinybloom to the
manifest.  This will then reduce the message passing required for a GET
not in the cache or higher levels
2016-12-11 01:02:56 +00:00
martinsumner
5bdb7fd7fa Alter Riak HEAD
Change the extract of Riak metadata.

In Riak-based volume tests hte writing of SFT files is tanking.  Could
this be the "extra" metadata.  i.e. There are only current plans to look
at the vclock.  Sibling count is free to fetch, what if we just get
these two items, will it be less CPU to extract the metadata, but also
will the reduced weight reduce the downstream impact?
2016-12-08 23:38:50 +00:00
martinsumner
e8c1d39df9 Switch to binary format Riak object
Initial change to try and test assuming that leveled received the binary
format of Riak objects (and parses that for metadata).
2016-11-28 22:26:09 +00:00
martinsumner
6684e8e1d3 Refine query to accept fold functions
Need to be able to pass external fold functions into different queries,
to work as a Riak backend
2016-11-18 15:53:22 +00:00
martinsumner
7147ec0470 Logging - Phase 1
Abstract out logging and introduce a logbase
2016-11-02 18:14:46 +00:00
martinsumner
84a92b5f95 Further testing of compaction
Check we avoid crashing in challenging compaction scenarios
2016-11-01 00:46:14 +00:00
martinsumner
7d3a04428b Refactor snapshot
Better reuse snapshotting fucntions in the Bookie, and use it to support
doing Inker clone checks
2016-10-31 17:26:28 +00:00