1115 commits

Author SHA1 Message Date
Martin Sumner
3950942da3 Roll in fix for intermittently failing test
As described in https://github.com/martinsumner/leveled/issues/92

Only the first fix was made.

Just to be safe - archiving means renaming to another file with a different extension.  Assumption is that renamed files can be manually reaped if necessary.
2017-09-27 23:52:49 +01:00
Martin Sumner
433cc37eb6 Rolled back LMD in metadata
Because there's no sensible way of using it if objects are mutable - you still end up with the same false positives in the tictactree.

Didn't fully roll back the change as spec and docs were added which could be useful going forward.
2017-09-27 12:26:12 +01:00
Martin Sumner
2e5b9c80f4 Add max LMD to Riak metadata
This is an interim stage towards enhancing the proxy object so that it contains more helper information (other than size).

The aim is to be able to run more efficient fold_heads queries that might filter on LMD range (so as not to have to co-ordinate the running of comparative queries).  For example if producing a tictactree to compare between two different offsets, a max LMD could be passed in so that changes beyond the time the first query was requested can be ignored.
2017-09-27 12:15:18 +01:00
Martin Sumner
389694b11b Add exportable option to tictac
Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary.  So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
2017-09-26 22:49:40 +01:00
Martin Sumner
2f9afa1469 Add support for performing a magic hash on a binary
Ignore unnecessary term_to_binary if already binary.  This will be useful when we use magic_hash in tictac_trees we wish to be exportable.
2017-09-26 16:32:59 +01:00
Martin Sumner
dfab33e8da Add smaller trees
The "small" tree will serialise to 1.5MB - which seems large.  Much smaller trees seem to be more suitable for things like recently modified aae indexes.
2017-09-25 13:07:08 +01:00
Martin Sumner
9730816c38 Merge branch 'master' into mas-riakaae-impl-2 2017-09-22 09:39:32 +01:00
Martin Sumner
eba21f49fa Make tests compatible with OTP 16
This required a switch to change the sync strategy based on a rebar parameter.

However tests could be slow on macbook with OTP16 and sync - so timeouts added in unit tests, and ct tests sync_strategy changed to not sync for OTP16.
2017-09-15 15:10:04 +01:00
Martin Sumner
856a64c4d4 Unit tests use wrong query format
Not sure how this happened.  Bad merge?  Just plain sloppiness on my part?  Anyhow, the unit tests were not working ..
2017-09-14 18:06:50 +01:00
Martin Sumner
ed56ef17a1 Make export mochijson friendly 2017-09-13 23:45:48 +01:00
Martin Sumner
d8ca0274f6 Export the import 2017-09-13 22:02:44 +01:00
Martin Sumner
d98534b8f3 Add struct to help with mochijson2
Try and make this more mochijson2 friendly
2017-09-13 21:56:28 +01:00
Martin Sumner
9f97c82d0d Add import/export support
Also fix for fold_heads unit tests to reflect new booleans required by changes to support their use in MapFolds
2017-08-16 16:58:38 +01:00
Martin Sumner
53ddc8950b Add tests using fold_heads
Comparing the inbuilt tictac_tree fold, to using "proper" abstraction and achieving the same thing through fold_heads.

The fold_heads method is slower (a lot more manipulation required in the fold) - expect it to require > 2 x CPU.

However, this does give the flexibility to change the hash algorithm.  This would allow for a fold over a database of AAE trees (where the hash has been pre-computed using sha) to be compared with a fold over a database of leveled backends.

Also can vary whether the fold_heads checks for presence of the object in the Inker.  So normally we can get the speed advantage of not checking the Journal for presence, but periodically we can.
2017-08-07 10:45:41 +01:00
Heinz N. Gies
b319386210 Fix strong_rand to rand_bytes 2017-08-01 11:55:05 +02:00
Heinz N. Gies
bbe763514b Remove uniform_s/2 from old random code 2017-08-01 11:37:18 +02:00
Heinz N. Gies
38e9b0e80a Add missing uniform/0 function 2017-08-01 11:24:12 +02:00
Heinz N. Gies
25389893cf Add compatibility for old and new random / rand functions 2017-08-01 11:24:12 +02:00
Heinz N. Gies
379e33ba84 Cleanup dialyzer errors in leveled_bookie 2017-07-31 19:58:56 +02:00
Heinz N. Gies
8717d42ffe Cleanup dialyzer errors in leveled_cdb 2017-07-31 19:55:09 +02:00
Heinz N. Gies
e8ed7954cc Cleanup dialyzer errors in leveled_iclerk 2017-07-31 19:53:01 +02:00
Heinz N. Gies
eece253222 Cleanup most dialyzer errors in leveled_inker 2017-07-31 19:47:58 +02:00
Heinz N. Gies
44fd603474 Cleanup dialyzer errors in leveled_pclerk 2017-07-31 19:41:26 +02:00
Heinz N. Gies
369bdece5f Cleanup dialyzer errors in leveled_penciller 2017-07-31 19:39:40 +02:00
Heinz N. Gies
858ee9a915 Cleanup dialyzer errors in leveled_pmanifest 2017-07-31 19:32:06 +02:00
Heinz N. Gies
5e6df539cb Cleanup dialyzer errors in leveled_sst 2017-07-31 19:30:29 +02:00
martinsumner
80fd2615f6 Implement blacklist/whitelist
Change from the all/whitelist behaviour to the blacklist/whitelist
behaviour documented in the write-up
2017-07-11 11:44:01 +01:00
martinsumner
97fdd36d53 Returning bucket when bucket is all
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae.  So shoehorning this in - will also allow for proper use of
FilterFun when filtering by partition.
2017-07-03 18:03:13 +01:00
Martin Sumner
fd84e4f608 Test timeouts
So that coverage testing will run.
2017-07-02 22:23:02 +01:00
martinsumner
52ca0e4b6c Test expansion
Detect a recent difference
2017-07-02 19:33:18 +01:00
martinsumner
954995e23f Support for recent AAE index
With basic ct test.

Doesn't currently prove expiry of index.  Doesn't prove ability to find
segments.

Assumes that either "all" buckets or a special list of buckets require
indexing this way.  Will lead to unexpected results if the same bucket
name is used across different Tags.

The format of the index has been chosen so that hopefully standard index
features can be used (e.g. return_terms).
2017-06-30 16:31:22 +01:00
martinsumner
8da8722b9e Add temporary aae index
Pending ct tests.  The aae index should expire after limit_minutes and
be on an index which is rounded to unit_minutes.
2017-06-30 10:03:36 +01:00
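The rounding and expiry described in the commit above can be sketched as follows. This is only an illustration, not the leveled code: the function names are hypothetical, timestamps are assumed to be integer seconds, and rounding *down* to the unit boundary is an assumption (the commit message does not say which direction is used).

```python
def index_time_bucket(lmd_seconds: int, unit_minutes: int) -> int:
    """Round a last-modified timestamp down to a unit_minutes boundary.

    Assumption: rounding is downwards; the commit only says "rounded".
    """
    unit = unit_minutes * 60
    return (lmd_seconds // unit) * unit


def is_expired(lmd_seconds: int, now_seconds: int, limit_minutes: int) -> bool:
    """True once the entry is older than limit_minutes (hypothetical helper)."""
    return now_seconds - lmd_seconds > limit_minutes * 60
```

With `unit_minutes = 5`, every object modified within the same five-minute block shares one index term, which keeps the number of distinct terms small.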
martinsumner
2dd303237b Change XOR 2017-06-28 10:55:54 +01:00
martinsumner
ebef27f021 Extract Last Modified Date from Riak Object
As part of process to supporting a recent changes index for
near-real-time anti-entropy
2017-06-27 16:25:18 +01:00
martinsumner
f81a4bca0d Revert "WIP - Recent Modifications"
This reverts commit bc19a05d83a02d7ec03771657df85b33acc6cfee.
2017-06-27 16:25:18 +01:00
martinsumner
9fca17d56a WIP - Recent Modifications
Just some initial WIP code for this.  Will revisit this again after
exploring some ideas as to how to reduce the cost of the
get_keys_by_segment.

The overall idea is that there are trees of recent modifications, with
recent being some rolling time window made up of hourly blocks, and
recency being determined by the last-modified date on the object metadata
- which should be consistent across a cluster.

So if we were at 15:30 we would get the tree for 14:00 - 15:00 and the
tree for 15:00-16:00 from two different queries which cover the same
partitions and then compare.

Comparison may find differences, and we know what segment the difference
is in - but how to then find all keys in that segment which have been
modified in the period?  Three ways:

1. Do it inefficiently and infrequently using a fold_keys and a filter
(perhaps with SST files having a highest LMD in the metadata so that
they can be skipped).
2. Add a special index, where every entry has a TTL, and the Key is
{$segment, Segment, Bucket, Key}  so that a normal 2i query can be used.
3. Align hashing for segments with hashing for penciller lookup so that a
query over the actual keys can be optimised skipping chunks of the
in-memory part, and chunks of the SST file
2017-06-27 16:25:18 +01:00
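The hourly-block selection in the commit above (at 15:30, use the 14:00 - 15:00 and 15:00 - 16:00 trees) can be sketched as below. The function name and the `blocks` parameter are hypothetical, and this only illustrates the window arithmetic, not how leveled itself picks trees.

```python
from datetime import datetime, timedelta


def recent_windows(now: datetime, blocks: int = 2):
    """Return the last `blocks` hourly windows, oldest first.

    The final window is the hour that contains `now`.
    """
    current_start = now.replace(minute=0, second=0, microsecond=0)
    windows = []
    for i in range(blocks - 1, -1, -1):
        start = current_start - timedelta(hours=i)
        windows.append((start, start + timedelta(hours=1)))
    return windows
```

Each window would correspond to one tree of recent modifications, and the same windows would be requested from both sides before comparing.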
Martin Sumner
fde9af28dd comment test to avoid timeout 2017-06-26 17:08:31 +01:00
martinsumner
4e5c3e2f64 Fix merge
Fix typo in merge, and add an extra validation step to unit tests to prevent
it returning.
2017-06-23 12:32:37 +01:00
martinsumner
7cfa392b6e Flexible TicTacTree sizes
Allow tictac tree sizes to be flexible.

Tested lots of different sizes.  Having both level 1 and level 2 the
same size seemed to be consistently quicker than trying to make either
of the levels relatively wider.

There's an 8% performance improvement if the SegmentCount is reduced by
a quarter.
2017-06-20 10:58:13 +01:00
martinsumner
d5b4cb844f Finding keys
Progresses from a segment list to scanning for the keys in that segment
2017-06-19 18:38:55 +01:00
martinsumner
c586b78f45 Initial code with busted ct test
Initial comparison made between trees externally - but ct test is bust.
2017-06-19 11:36:57 +01:00
martinsumner
6ad98d77c5 Spec module for dialyzer
Add specs/docs for the leveled_tictac module.  Dialyzer passes.
2017-06-16 13:47:19 +01:00
martinsumner
f5dd154cee Rename hashtree query
Naming is confusing now that we have TicTac Trees.  This query builds a
list of keys and hashes not a tree - so it was misleading anyway.  Now
renamed hashlist_query.
2017-06-16 12:38:59 +01:00
Martin Sumner
7642aac2cc Change Riak object hash approach
Change the riak object hash being kept in the metadata, to being a hash
of the vector clock
2017-06-16 10:14:24 +01:00
martinsumner
959e7f932f Add simple merge
Allow for tictac trees to be merged
2017-06-15 16:16:19 +01:00
martinsumner
86b11803c9 Build and compare
Build and compare of tictac trees.  These are mergeable merkle trees that
are not cryptographically secure.
2017-06-15 15:40:23 +01:00
Martin Sumner
15c52ae118 Change default compaction settings
Need to allow specific settings to be passed into unit tests.

Also, too much journal compaction may lead to intermittent failures on
the basic_SUITE space_clear_on_delete test.  I think this is because
there are fewer “deletes” to reload in on startup to trigger the cascade
down and clear up?
2017-06-02 08:37:57 +01:00
martinsumner
569b498727 Resolve dialyzer warnings
Botched switch to leveled_log in list - so resolved dialyzer warnings
2017-06-01 22:03:51 +01:00
martinsumner
32612dfe4a Yet another array type OTP16 issue 2017-06-01 21:39:01 +01:00
martinsumner
afbf918f2c Change from using array type
Won't compile in OTP16
2017-06-01 21:37:23 +01:00