Commit graph

168 commits

Author SHA1 Message Date
Martin Sumner
61b7be5039 Make compression algorithm an option
Compression can be switched between LZ4 and zlib (native).

The setting to determine if compression should happen on receipt is now a macro definition in leveled_codec.
2017-11-06 15:54:58 +00:00
Martin Sumner
99428d0e55 Remove erroneously added file 2017-11-03 14:26:51 +00:00
Martin Sumner
0ecb83f8ec Remove erroneously added files 2017-11-03 14:26:18 +00:00
Martin Sumner
9fa8ed6cca Add LZ4 2017-11-03 14:18:49 +00:00
Martin Sumner
4fbb770a8c Revert "Failed attempt to hack in LZ4"
This reverts commit 912920a53c.
2017-11-03 11:47:25 +00:00
Martin Sumner
912920a53c Failed attempt to hack in LZ4 2017-11-03 11:47:00 +00:00
Martin Sumner
36264eb416 Search range failure
Discovered a bug with search ranges in leveled_tree - this was uncovered by an intermittently failing 19.3 test.

Test case added and bug fixed. It was due to a failure to use the end_key passed, causing issues with particular manifests and full bucket ranges.
2017-10-24 13:19:30 +01:00
Martin Sumner
f89e2cf1f1 Improve test coverage 2017-10-17 22:06:30 +01:00
Martin Sumner
bfaed921e6 Split code for folders - introduce runner actor
Introduce a dedicated module for all the different fold types. Also simplify the list of folders by deprecating those folds that should be achievable by fold_heads/fold_objects type folds but with smarter functions.

Makes sure that the fold functions also have better spec coverage, and are dialyzer checked.
2017-10-17 20:39:11 +01:00
Martin Sumner
0c5f5cdb65 Add key range to fold_heads queries 2017-10-06 15:02:14 +01:00
Martin Sumner
61724cfedb Merge branch 'master' into mas-riakaae-impl-2 2017-09-28 13:23:29 +01:00
Martin Sumner
3950942da3 Roll in fix for intermittently failing test
As described in https://github.com/martinsumner/leveled/issues/92

Only the first fix was made.

Just to be safe - archiving means renaming to another file with a different extension. Assumption is that renamed files can be manually reaped if necessary.
2017-09-27 23:52:49 +01:00
Martin Sumner
389694b11b Add exportable option to tictac
Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary.  So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
2017-09-26 22:49:40 +01:00
Martin Sumner
dfab33e8da Add smaller trees
The "small" tree will serialise to 1.5MB - which seems large.  Much smaller trees seem to be more suitable for things like recently modified aae indexes.
2017-09-25 13:07:08 +01:00
Martin Sumner
eba21f49fa Make tests compatible with OTP 16
This required a switch to change the sync strategy based on a rebar parameter.

However tests could be slow on a macbook with OTP16 and sync - so timeouts added in unit tests, and ct tests sync_strategy changed to not sync for OTP16.
2017-09-15 15:10:04 +01:00
Martin Sumner
869e799b41 Fix tests
Obviously got totally messed up and confused when testing previous
commits.

Multiple tests were failing for a change which got merged in as the
tests were not reflecting the required API.
2017-09-15 10:33:16 +01:00
Martin Sumner
53ddc8950b Add tests using fold_heads
Comparing the inbuilt tictac_tree fold, to using "proper" abstraction and achieving the same thing through fold_heads.

The fold_heads method is slower (a lot more manipulation required in the fold) - expect it to require > 2 x CPU.

However, this does give the flexibility to change the hash algorithm.  This would allow for a fold over a database of AAE trees (where the hash has been pre-computed using sha) to be compared with a fold over a database of leveled backends.

Also can vary whether the fold_heads checks for presence of the object in the Inker.  So normally we can get the speed advantage of not checking the Journal for presence, but periodically we can.
2017-08-07 10:45:41 +01:00
Martin Sumner
dd20132892 Add test with fold_heads
Build the AAE tree equally using fold_heads. This is a precursor to running this within Riak.

In part this leans on some of the work done to improve standard Riak AAE with leveled.  When rebuilding the standard AAE store only the head is required, and so this process was switched in riak_kv_sweeper to make a fold_heads request if supported by the backend.

The head response is a proxy object, which when loaded into a riak_object will allow for access to object metadata, but will use the passed function if access to object contents is requested.
2017-08-05 16:43:03 +01:00
Heinz N. Gies
38e9b0e80a Add missing uniform/0 function 2017-08-01 11:24:12 +02:00
Heinz N. Gies
25389893cf Add compatibility for old and new random / rand functions 2017-08-01 11:24:12 +02:00
Martin Sumner
8748fef28c Add extra second to sleep
Sleep for just one more second to resolve intermittent failure
2017-08-01 00:14:31 +01:00
Martin Sumner
65fd029ca6 typo - backlist/blacklist 2017-07-11 12:25:06 +01:00
martinsumner
80fd2615f6 Implement blacklist/whitelist
Change from the all/whitelist behaviour to the blacklist/whitelist
behaviour documented in the write-up
2017-07-11 11:44:01 +01:00
martinsumner
3105656d2e Add test descriptions and further documentation 2017-07-06 15:40:30 +01:00
martinsumner
0d72b353fe Add test of expiry of nrt aae terms 2017-07-04 13:29:40 +01:00
martinsumner
439bf8c3b8 Add bucket whitelist test 2017-07-04 10:55:53 +01:00
Martin Sumner
1af9ac56dc Revert passing Bucket
Bad edit.  Reverted
2017-07-03 19:06:41 +01:00
martinsumner
97fdd36d53 Returning bucket when bucket is all
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae.  So shoehorning this in - will also allow for proper use of
FilterFun when filtering by partition.
2017-07-03 18:03:13 +01:00
martinsumner
d0a825a145 Extend test to detect keys
When comparing recent changes, demonstrate the detection of the keys
which have changed with a follow-up query
2017-07-03 10:33:34 +01:00
Martin Sumner
fd84e4f608 Test timeouts
So that coverage testing will run.
2017-07-02 22:23:02 +01:00
martinsumner
52ca0e4b6c Test expansion
Detect a recent difference
2017-07-02 19:33:18 +01:00
martinsumner
da53808e2e Extend test beyond restart
Prove that recency check still works after a restart
2017-07-01 08:24:58 +01:00
martinsumner
a15c046887 Re-introduce commented tests 2017-06-30 16:31:48 +01:00
martinsumner
954995e23f Support for recent AAE index
With basic ct test.

Doesn't currently prove expiry of index.  Doesn't prove ability to find
segments.

Assumes that either "all" buckets or a special list of buckets require
indexing this way.  Will lead to unexpected results if the same bucket
name is used across different Tags.

The format of the index has been chosen so that hopefully standard index
features can be used (e.g. return_terms).
2017-06-30 16:31:22 +01:00
martinsumner
8da8722b9e Add temporary aae index
Pending ct tests.  The aae index should expire after limit_minutes and
be on an index which is rounded to unit_minutes.
2017-06-30 10:03:36 +01:00
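The rounding and expiry rule described in commit 8da8722b9e can be illustrated with a minimal sketch. The names `unit_minutes` and `limit_minutes` come from the commit message. The function name, the use of epoch seconds, rounding down rather than up, and measuring expiry from the last-modified time are all assumptions made for illustration, not the leveled implementation.

```python
def aae_index_slot(last_modified_secs, unit_minutes, limit_minutes):
    """Return (slot_start, expires_at) in epoch seconds for an index entry.

    Assumptions (not confirmed by the commit message):
    - timestamps are rounded DOWN to a multiple of unit_minutes
    - expiry is measured from the object's last-modified time
    """
    unit_secs = unit_minutes * 60
    # Round the last-modified time down to the start of its unit_minutes slot.
    slot_start = (last_modified_secs // unit_secs) * unit_secs
    # The entry stops being useful limit_minutes after the modification.
    expires_at = last_modified_secs + limit_minutes * 60
    return slot_start, expires_at


# An object modified at 01:02:05 (3725s), with 5-minute slots and a 60-minute limit.
slot_start, expires_at = aae_index_slot(3725, unit_minutes=5, limit_minutes=60)
print(slot_start, expires_at)
```

Rounding to a shared slot means entries modified within the same few minutes sort together on the index, so a range query over one slot picks them all up.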
martinsumner
8e7aaf0ee7 Correct testutil to understand riak_extract_metadata
Changed, but the change was not reflected in the test code
2017-06-27 17:11:13 +01:00
martinsumner
f81a4bca0d Revert "WIP - Recent Modifications"
This reverts commit bc19a05d83a02d7ec03771657df85b33acc6cfee.
2017-06-27 16:25:18 +01:00
martinsumner
9fca17d56a WIP - Recent Modifications
Just some initial WIP code for this.  Will revisit this again after
exploring some ideas as to how to reduce the cost of the
get_keys_by_segment.

The overall idea is that there are trees of recent modifications, with
recent being some rolling time window made up of hourly blocks, and
recency being determined by the last-modified date on the object metadata
- which should be consistent across a cluster.

So if we were at 15:30 we would get the tree for 14:00 - 15:00 and the
tree for 15:00-16:00 from two different queries which cover the same
partitions and then compare.

Comparison may find differences, and we know what segment the difference
is in - but how to then find all keys in that segment which have been
modified in the period?  Three ways:

- Do it inefficiently and infrequently using a fold_keys and a filter
  (perhaps with SST files having a highest LMD in the metadata so that
  they can be skipped).
- Add a special index, where every entry has a TTL, and the Key is
  {$segment, Segment, Bucket, Key} so that a normal 2i query can be used.
- Align hashing for segments with hashing for penciller lookup so that a
  query over the actual keys can be optimised, skipping chunks of the
  in-memory part, and chunks of the SST file.
2017-06-27 16:25:18 +01:00
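The hourly-block example in commit 9fca17d56a (at 15:30, compare the trees for 14:00-15:00 and 15:00-16:00) can be sketched as follows. The function name and the `blocks` parameter are hypothetical. The sketch only shows which windows would be selected, not how the trees themselves are built or compared.

```python
from datetime import datetime, timedelta


def recent_hour_blocks(now, blocks=2):
    """Return (start, end) pairs for the most recent hourly blocks, oldest first.

    The last pair is the block containing `now`; earlier pairs step back
    one hour at a time.
    """
    current_start = now.replace(minute=0, second=0, microsecond=0)
    windows = []
    for offset in range(blocks - 1, -1, -1):
        start = current_start - timedelta(hours=offset)
        windows.append((start, start + timedelta(hours=1)))
    return windows


now = datetime(2017, 6, 27, 15, 30)
for start, end in recent_hour_blocks(now):
    print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))
# prints:
# 14:00 - 15:00
# 15:00 - 16:00
```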
Martin Sumner
e938eaa153 Add close to test 2017-06-23 16:51:28 +01:00
Martin Sumner
99131320c5 Broken test log 2017-06-23 15:20:24 +01:00
martinsumner
25a5065edd Re-introduce test (again) 2017-06-23 14:56:32 +01:00
martinsumner
5e9e1347c7 Add test to find {term, key} that represents difference
Not just detect existence of difference, but clarify what that
difference is.
2017-06-23 14:55:49 +01:00
martinsumner
2be4422e47 Re-add test 2017-06-23 12:44:52 +01:00
martinsumner
4e5c3e2f64 Fix merge
Fix typo in merge, and add extra validation step to unit tests to prevent
it returning.
2017-06-23 12:32:37 +01:00
martinsumner
47655dc9c7 Uncomment previous test 2017-06-22 14:30:14 +01:00
martinsumner
5a012ff8a6 Add test of index comparison
Compare two indexes for consistency
2017-06-22 13:54:51 +01:00
martinsumner
7cfa392b6e Flexible TicTacTree sizes
Allow tictac tree sizes to be flexible.

Tested lots of different sizes.  Having both level 1 and level 2 the
same size seemed to be consistently quicker than trying to make either
of the levels relatively wider.

There's an 8% performance improvement if the SegmentCount is reduced by
a quarter.
2017-06-20 10:58:13 +01:00
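As a rough illustration of the two-level layout discussed in commit 7cfa392b6e, the sketch below maps a key to a segment and then to a position at each level, with both levels the same width. The width of 256, the use of `zlib.crc32` as the hash, and the function name are assumptions chosen for the example. They are not the sizes or hash used by leveled.

```python
import zlib


def segment_position(key, level_width=256):
    """Map a key to (segment, level1_index, level2_index).

    Assumes both tree levels have the same width, so the total
    SegmentCount is level_width * level_width. crc32 is a stand-in hash.
    """
    segment_count = level_width * level_width
    segment = zlib.crc32(key) % segment_count
    level1_index = segment // level_width  # which level 1 slot covers the segment
    level2_index = segment % level_width   # position inside that level 2 block
    return segment, level1_index, level2_index


segment, l1, l2 = segment_position(b"bucket1/key1")
print(segment, l1, l2)
```

With equal widths, halving `level_width` cuts the SegmentCount to a quarter, which is one way to read the trade-off between tree size and comparison cost that the commit measures.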
martinsumner
d5b4cb844f Finding keys
Progresses from a segment list to scanning for the keys in that segment
2017-06-19 18:38:55 +01:00
martinsumner
8203487a11 Expanded test
ct testing of tictac trees now compares between differently partitioned
stores.
2017-06-19 15:43:19 +01:00
Martin Sumner
833c7a80cb corrected test
differing object was in wrong bucket
2017-06-19 13:11:43 +01:00