1118 commits

Author SHA1 Message Date
Martin Sumner
f80aae7d78 Type typo 2017-10-31 23:35:57 +00:00
Martin Sumner
b141dd199c Allow for segment-acceleration of folds
Initially with basic tests.  If the SlotIndex has been cached, we can now use the slot index as it is based on the Segment hash algorithm.

This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID.

This also required a slight tweak to the penciller keyfolder.  It now caches the next answer from the SSTiter, rather than restarting the iterator.  When the IMMiter has many more entries than the SSTiter (as the SSTiter is being filtered but not the IMMiter) this could lead to lots of repeated folding.
2017-10-31 23:28:35 +00:00
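The keyfolder change described in the commit above can be illustrated with a minimal sketch. It is written in Python rather than Erlang and is not the leveled implementation: the function name `merge_fold`, the `(key, value)` tuple shape, and the rule that the in-memory entry shadows an SST entry with the same key are all assumptions made for illustration. The point shown is only that the head of each iterator is held in a variable between steps, so the filtered SST iterator is never restarted while the in-memory iterator advances.

```python
# Hypothetical sketch: fold over two sorted iterators, caching each head.
_END = object()  # sentinel marking an exhausted iterator


def merge_fold(imm_items, sst_items, fold_fun, acc):
    """Fold over two key-sorted sequences of (key, value) pairs.

    The next SST answer is cached in `sst_next` and reused until it is
    consumed, instead of re-querying the SST iterator on every step.
    Assumption: on equal keys the in-memory entry wins.
    """
    imm_iter, sst_iter = iter(imm_items), iter(sst_items)
    imm_next = next(imm_iter, _END)
    sst_next = next(sst_iter, _END)  # cached head of the SST iterator
    while imm_next is not _END or sst_next is not _END:
        take_imm = sst_next is _END or (
            imm_next is not _END and imm_next[0] <= sst_next[0]
        )
        if take_imm:
            if sst_next is not _END and imm_next[0] == sst_next[0]:
                # Same key in both: drop the shadowed SST entry.
                sst_next = next(sst_iter, _END)
            acc = fold_fun(imm_next[0], imm_next[1], acc)
            imm_next = next(imm_iter, _END)
        else:
            acc = fold_fun(sst_next[0], sst_next[1], acc)
            sst_next = next(sst_iter, _END)
    return acc


def collect(key, value, acc):
    """Example fold function: accumulate the pairs in visit order."""
    return acc + [(key, value)]
```

With many in-memory entries and few SST entries, each SST entry is fetched once and compared against the cached value on every step.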
Martin Sumner
f5878548f9 Make binary Riak bucket/keys a special case
When leveled is used with Riak, buckets and keys are always binaries.  So we can treat them as such.

Want to move tictac tree testing away from the leveled internal tests, to a set of tests for the Riak scenario.  So riak_SUITE was created for this and other riak-specific backend tests.
2017-10-30 17:39:21 +00:00
Martin Sumner
6bb7ceef0c Attempt to standardise on segment hashes
To allow the segment hash that accelerates queries to be re-used in tictac tree related queries.
2017-10-30 13:57:41 +00:00
Martin Sumner
7763df3cef Merge pull request #98 from martinsumner/mas-segid-cryptohash
Mas segid cryptohash
2017-10-25 10:02:04 +01:00
Martin Sumner
e24eaf655b Revert to previous standard slot size
But maintain configurability of slot size to maximum
2017-10-25 08:59:34 +01:00
Martin Sumner
a22610cee7 Experiment with alternate slot size
Improves fpr.  Does this change anything in volume tests?
2017-10-24 17:58:33 +01:00
Martin Sumner
6af1d3b003 Use more keys in bloom
Use 4 keys in the bloom (which is closer to optimal size).  This should halve the fpr - as we can now use the large ExtraHash rather than being constrained by the SegmentHash here.
2017-10-24 15:42:53 +01:00
Martin Sumner
f08faf6432 Revert "Revert "Check fpr with 4 keys""
This reverts commit 74c28b52c9.
2017-10-24 15:22:12 +01:00
Martin Sumner
74c28b52c9 Revert "Check fpr with 4 keys"
This reverts commit d5bcccf0ec.
2017-10-24 15:21:07 +01:00
Martin Sumner
d5bcccf0ec Check fpr with 4 keys
Up key count in bloom
2017-10-24 15:20:59 +01:00
Martin Sumner
29a2d9fc35 Revert "Use lower fpr tinyblooms"
This reverts commit 3fd5260cd9.
2017-10-24 15:16:25 +01:00
Martin Sumner
3fd5260cd9 Use lower fpr tinyblooms
... but maybe they're slower?
2017-10-24 15:15:15 +01:00
Martin Sumner
26aa573ce1 Switch segment and extra hash
More entropy by using the position index with the segment hash - so this would be a better filter to apply.

Also could increase the key count now, as extra hash can be larger.

As an aside - a leveled_iclerk unit test failure appeared - the range was just wrong.  Don't know why this started happening.
2017-10-24 14:32:04 +01:00
Martin Sumner
36264eb416 Search range failure
Discovered a bug with search ranges in leveled_tree - this was uncovered by an intermittently failing 19.3 test.

Test case added and bug fixed.  It was due to a failure to use the end_key passed, causing issues with particular manifests and full bucket ranges.
2017-10-24 13:19:30 +01:00
Martin Sumner
a128dcdadf Change hash algorithm for penciller
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected false positive ratios.

Also split the hash into two 16-bit integers.  We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits.  The idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
segmentID - in particular when folding over all the keys in a bucket.

Performance testing has been difficult so far - I think due to “cloud”
mysteries.
2017-10-20 23:04:29 +01:00
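The hash split described in the commit above can be sketched as follows. This is an illustration in Python, not the leveled code: the function name `segment_hash` and the choice of the first four digest bytes (big-endian) are assumptions, since the commit message only says that an md5 hash is split into two 16-bit integers.

```python
import hashlib
from typing import Tuple


def segment_hash(key: bytes) -> Tuple[int, int]:
    """Split an md5 digest of `key` into two 16-bit integers.

    Assumption: bytes 0-1 form the segment ID and bytes 2-3 form the
    extra hash; the actual byte layout in leveled may differ.
    """
    digest = hashlib.md5(key).digest()
    segment_id = int.from_bytes(digest[0:2], "big")  # 0..65535
    extra_hash = int.from_bytes(digest[2:4], "big")  # 0..65535
    return segment_id, extra_hash
```

Because the segment ID depends only on the key, a fold over a bucket that knows which segment IDs it needs can skip entries whose segment ID does not match, without inspecting the rest of the hash.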
Martin Sumner
ede0982b2d Merge branch 'mas-bloomtest' into mas-segid-cryptohash 2017-10-20 20:47:21 +01:00
Martin Sumner
1964f1055b Add test timeout 2017-10-19 21:44:07 +01:00
Martin Sumner
f38d3fde4b Test frequency change 2017-10-19 13:56:07 +01:00
Martin Sumner
87731a85f5 Loop test 2017-10-19 13:51:32 +01:00
Martin Sumner
ef6df2387d Merge pull request #97 from martinsumner/mas-runner
Mas runner
2017-10-17 23:09:02 +01:00
Martin Sumner
84239955ed Clarify wording 2017-10-17 22:31:11 +01:00
Martin Sumner
f89e2cf1f1 Improve test coverage 2017-10-17 22:06:30 +01:00
Martin Sumner
bfaed921e6 Split code for folders - introduce runner actor
Introduce a dedicated module for all the different fold types.  Also simplify the list of folders by deprecating those folds that should be achievable by fold_heads/fold_objects type folds but with smarter functions.

Makes sure that the fold functions also have better spec coverage, and are dialyzer checked.
2017-10-17 20:39:11 +01:00
Martin Sumner
d0b8e47f77 Merge pull request #96 from martinsumner/mas-riakaae-impl-2
Mas riakaae impl 2
2017-10-17 09:39:12 +01:00
Martin Sumner
212b08a44d Merge branch 'master' into mas-riakaae-impl-2 2017-10-16 21:14:21 +01:00
Martin Sumner
5c8eea3f0e Extend foldheads_bybucket test
Now explicitly checking key ranges
2017-10-06 15:07:36 +01:00
Martin Sumner
0c5f5cdb65 Add key range to fold_heads queries 2017-10-06 15:02:14 +01:00
Martin Sumner
7912742a84 Add valid_size/1 2017-09-29 17:35:15 +01:00
Martin Sumner
fd4fbf7ea8 Keep trees empty on merge
Don't blow out a tree unnecessarily on merge
2017-09-29 15:28:17 +01:00
Martin Sumner
5e6534fb49 Initialise empty trees
When new trees are initialised they are started with 1 byte binaries at Level2 - and become full-size following a merge or add event.

The idea is that when trees are distributed before they are added to, or when over-sized trees are used - the output may be smaller on the network.
2017-09-29 15:01:16 +01:00
Martin Sumner
4d6f816ab2 Switch back to binary 2017-09-29 11:17:02 +01:00
Martin Sumner
8e3b7baa18 Change encoding - issue with JSON friendliness
Also add compression to reduce penalty on sparse trees
2017-09-29 11:08:37 +01:00
Martin Sumner
556ab5a95a Merge pull request #94 from martinsumner/mas-explain-headgetchoice
Mas explain headgetchoice
2017-09-29 10:31:07 +01:00
Martin Sumner
056e65ff74 Edited for clarity 2017-09-29 10:28:35 +01:00
Martin Sumner
855b1d3ad8 More edits 2017-09-28 20:23:29 +01:00
Martin Sumner
ed9a444805 Initial edits, more info on cache experiment 2017-09-28 20:14:18 +01:00
Martin Sumner
e81a53b539 Initial draft of option comparison
n HEADs and 1 GET

or

1 GET and n-1 HEADs
2017-09-28 20:04:21 +01:00
Martin Sumner
61724cfedb Merge branch 'master' into mas-riakaae-impl-2 2017-09-28 13:23:29 +01:00
Martin Sumner
15a0a6f0f1 Merge pull request #93 from martinsumner/mas-tictac-hashfun
Mas tictac hashfun
2017-09-28 11:05:14 +01:00
Martin Sumner
0f5911ab70 Add unit test of archive files 2017-09-28 10:50:54 +01:00
Martin Sumner
3950942da3 Roll in fix for intermittently failing test
As described in https://github.com/martinsumner/leveled/issues/92

Only the first fix was made.

Just to be safe - archiving means renaming to another file with a different extension.  Assumption is that renamed files can be manually reaped if necessary.
2017-09-27 23:52:49 +01:00
Martin Sumner
433cc37eb6 Rolled back LMD in metadata
Because there's no sensible way of using it if objects are mutable - you still end up with the same false positives in the tictactree.

Didn't fully roll back the change as spec and docs were added which could be useful going forward.
2017-09-27 12:26:12 +01:00
Martin Sumner
2e5b9c80f4 Add max LMD to Riak metadata
This is an interim stage towards enhancing the proxy object so that it contains more helper information (other than size).

The aim is to be able to run more efficient fold_heads queries that might filter on LMD range (so as not to have to co-ordinate the running of comparative queries).  For example if producing a tictactree to compare between two different offsets, a max LMD could be passed in so that changes beyond the time the first query was requested can be ignored.
2017-09-27 12:15:18 +01:00
Martin Sumner
389694b11b Add exportable option to tictac
Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary.  So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
2017-09-26 22:49:40 +01:00
Martin Sumner
2f9afa1469 Add support for performing a magic hash on a binary
Ignore unnecessary term_to_binary if already binary.  This will be useful when we use magic_hash in tictac_trees we wish to be exportable.
2017-09-26 16:32:59 +01:00
Martin Sumner
f50a2a19d3 File should not be pushed 2017-09-25 15:57:06 +01:00
Martin Sumner
dfab33e8da Add smaller trees
The "small" tree will serialise to 1.5MB - which seems large.  Much smaller trees seem to be more suitable for things like recently modified aae indexes.
2017-09-25 13:07:08 +01:00
Martin Sumner
69ed945e58 Merge pull request #90 from martinsumner/mas-riakaae-impl-2
Mas riakaae impl 2
2017-09-22 14:19:33 +01:00
Martin Sumner
9730816c38 Merge branch 'master' into mas-riakaae-impl-2 2017-09-22 09:39:32 +01:00