Commit graph

54 commits

Author SHA1 Message Date
Martin Sumner
1be55fcd15
Make tree compatible with binary L1 (#451)
The old leveled_tictac had a pure binary L1. This was slower than the new map version.

However, in a Riak cluster, when running a merge_tree_range query during a rolling update, the query coordinator will initiate the tree for the fold. If this tree is not a map-based tree (because the coordinating node has not yet been upgraded), then a node that has been upgraded would previously fail the query, as it could not handle a level 1 in binary form. Updated nodes can now handle both forms of tree.

Obviously, if the coordinating node has been updated, non-updated nodes will crash queries, as they cannot handle a tree with a map at level 1. The aim is to make it configurable to force non-map trees in a cluster until all nodes have been upgraded. So as long as each node understands how to update both non-map trees and map-based trees, everything should be OK.
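A minimal Python sketch of the compatibility idea: an upgraded node normalises a received level 1 into a map before merging, whether it arrived as a packed binary or already as a map. The 4-byte entry width, the XOR merge, and treating a missing map entry as zero are all assumptions for illustration; this is not leveled_tictac's actual layout or code.

```python
# Hypothetical sketch: accept a level 1 (L1) in either the old packed-binary
# form or the newer map form, then merge through a single code path.
# ENTRY_BYTES, the XOR merge and "missing map entry == 0" are assumptions.

ENTRY_BYTES = 4  # assumed width of one L1 entry in the packed binary form


def level1_as_dict(level1):
    """Normalise an L1 given as bytes or as a dict to {index: int}."""
    if isinstance(level1, (bytes, bytearray)):
        if len(level1) % ENTRY_BYTES:
            raise ValueError("binary L1 length is not a multiple of ENTRY_BYTES")
        return {
            i: int.from_bytes(level1[off:off + ENTRY_BYTES], "big")
            for i, off in enumerate(range(0, len(level1), ENTRY_BYTES))
        }
    if isinstance(level1, dict):
        return dict(level1)
    raise TypeError("L1 must be bytes or dict, got %s" % type(level1).__name__)


def merge_level1(a, b):
    """XOR-merge two L1s of either form; a missing map entry counts as 0."""
    da, db = level1_as_dict(a), level1_as_dict(b)
    return {i: da.get(i, 0) ^ db.get(i, 0) for i in da.keys() | db.keys()}
```

Normalising on receipt keeps one merge path, so a node does not need to know which form the coordinator used.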
2024-09-18 10:24:16 +01:00
Martin Sumner
54e3096020
Switch to logger (#442)
* Switch to logger

Use logger rather than io:format when logging. The ct tests have been switched to log to file; testutil/init_per_suite/1 may offer useful guidance on configuring logger with leveled.

As all logs are produced by the leveled_log module, the MFA metadata is uninteresting for log outputs, but can be used for explicit filter controls for leveled logs.

* iolist_to_binary not unicode_binary()

logger filters will error and be removed if the format line is a binary(). It must be either a charlist() or a unicode_binary(), so iolist_to_binary() can't be used.

* Add metadata for filter

* Update test/end_to_end/tictac_SUITE.erl

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

---------

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
2024-09-06 11:18:24 +01:00
Martin Sumner
d45356a4f7
Extend perf_SUITE (#434)
* Extend perf_SUITE

This is v6 of the perf_SUITE tests.  The test adds a complex index entry to every object, and then adds a new test phase to test regex queries.

There are three profiles added so the full, mini and profiling versions of perf_SUITE can be run without having to edit the file itself:

e.g. ./rebar3 as perf_mini do ct --suite=test/end_to_end/perf_SUITE

When testing as `perf_prof`, summarised versions of the eprof results are now printed to screen.

The volume of keys within the full test suite has been reduced, just to make life easier, so that test run times are not excessively increased by the new features.

* Load chunk in spawned processes

Presumably this makes the job of gc easier; it makes a massive difference to load time in OTP 24.

* Correctly account for pause

Also try to improve test stability by increasing the pause.

* Add microstate accounting to profile

* Add memory tracking during test phases

Identify and log out memory usage by test phase

* Use macros instead (#437)

* Don't print memory to screen in standard ct test

---------

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
2024-07-15 20:49:21 +01:00
Martin Sumner
d544db5461
Mas d31 i413 (#415)
* Allow snapshots to be reused in queries

Allow for a full bookie snapshot to be re-used for multiple queries, not just KV fetches.

* Reduce log noise

The internal dummy tag is expected, so it should not prompt a log on reload.

* Snapshot should have the same status as the active db

with respect to head_only and head_lookup

* Allow logging to be specified on snapshots

* Shut down snapshot bookie if primary goes down

The Inker and Penciller will already shut down based on `erlang:monitor/2`

* Review feedback

Formatting and code readability fixes
2023-11-08 09:18:01 +00:00
Martin Sumner
0333604fd9 Change to cast in inker/iclerk interaction
This allows for leveled_iclerk:clerk_stop to be a sync call, so that files will only be closed once the iclerk has stopped. This is designed to prevent iclerk crashes during shutdown, when files it is depending on are closed mid-shutdown.
2019-01-24 21:32:54 +00:00
Martin Sumner
e9fb893ea0 Check segment is as expected with tuplebuckets
In head_only mode
2018-11-05 10:31:15 +00:00
Martin Sumner
71fa1447e0 Allow for all keys head folds to use modified range
This helps with kv_index_tictactree with the leveled_so backend. Now this can do folds over ranges of keys with modified filters (as folds over ranges of keys must go over all keys if the backend is segment_ordered).
2018-11-01 17:30:18 +00:00
Martin Sumner
f0208e9b12 Fix issues with deprecated folders
They were deprecated for a reason
2018-10-31 11:04:23 +00:00
Martin Sumner
2e2c35fe1b Extract deprecated recent_aae
Ready to add other forms of last modified filtering
2018-10-29 15:49:50 +00:00
Martin Sumner
c439e4144a Add new book_headonly/4 API
To address the special situation of performing a HEAD request in head_only mode, where a sub-key is a required input.
2018-09-20 12:08:33 +01:00
Russell Brown
b7bd65d11f Provide a top level API for folds
As the fold functions have been added to get_runner in an ad hoc way,
naturally, given the ongoing development of levelEd to support Riak,
it was difficult for a new user (in this case Quviq) to see what folds
are supported, and with what arguments, and expectations.

This PR is for discussion. It is one of many ways to group, spec, and
document the fold functions.

A test is also added for coverage of range queries.
2018-09-06 15:01:54 +01:00
Martin Sumner
aedeb0c934 Add support for with_lookup head_only
head_only mode can be run with_lookup, but there is no L0 index created in this case.

So the L0 index wasn't returning a position list and the L0 cache wasn't being checked.

Code now checks every position in the L0 cache when a lookup is attempted in head_only mode.
2018-06-23 15:15:49 +01:00
Martin Sumner
090e414b23 Coverage issues
Not making a proxy object, so get_size is not required.

Extend tests to improve coverage
2018-02-16 20:27:49 +00:00
Martin Sumner
70dfb77088 Optional lookup in head_only mode
Allow decision to be made on startup whether ObjectSpecs can be looked up directly when running in head_only mode.
2018-02-16 17:06:30 +00:00
Martin Sumner
910ccb6072 Add lookup support in head_only mode
Originally had disabled the ability to look up individual values when running in head_only mode. This is a saving of about 11% at PUT time (about 3 microseconds per PUT) on a MacBook.

Not sure this saving is sufficient to justify the extra work if this is used as an AAE Keystore with Bitcask and LWW (when we need to look up the current value before adjusting).

So reverted to re-add support for HEAD requests with these keys.
2018-02-16 14:16:28 +00:00
Martin Sumner
2b6281b2b5 Initial head_only features
Initial commit to add head_only mode to leveled.  This allows leveled to receive batches of object changes, but where those objects exist only in the Penciller's Ledger (once they have been persisted within the Ledger).

The aim is to significantly reduce the cost of compaction. Also, the objects are not directly accessible (they can only be accessed through folds). Again, this makes life easier during merging in the LSM trees (as no bloom filters have to be created).
2018-02-15 16:14:46 +00:00
Martin Sumner
c8ad39b33b foldheads_bybucket adds segment list support
Accelerate queries for foldheads_bybucket as well
2017-11-01 22:00:12 +00:00
Martin Sumner
6bb7ceef0c Attempt to standardise on segment hashes
To allow for the segment hash that accelerates queries to be re-used in tictac tree related queries.
2017-10-30 13:57:41 +00:00
Martin Sumner
f89e2cf1f1 Improve test coverage 2017-10-17 22:06:30 +01:00
Martin Sumner
0c5f5cdb65 Add key range to fold_heads queries 2017-10-06 15:02:14 +01:00
Martin Sumner
61724cfedb Merge branch 'master' into mas-riakaae-impl-2 2017-09-28 13:23:29 +01:00
Martin Sumner
389694b11b Add exportable option to tictac
Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary.  So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
2017-09-26 22:49:40 +01:00
Martin Sumner
dfab33e8da Add smaller trees
The "small" tree will serialise to 1.5MB - which seems large.  Much smaller trees seem to be more suitable for things like recently modified aae indexes.
2017-09-25 13:07:08 +01:00
Martin Sumner
53ddc8950b Add tests using fold_heads
Comparing the inbuilt tictac_tree fold, to using "proper" abstraction and achieving the same thing through fold_heads.

The fold_heads method is slower (a lot more manipulation required in the fold) - expect it to require > 2 x CPU.

However, this does give the flexibility to change the hash algorithm.  This would allow for a fold over a database of AAE trees (where the hash has been pre-computed using sha) to be compared with a fold over a database of leveled backends.

It can also vary whether the fold_heads checks for presence of the object in the Inker. So normally we can get the speed advantage of not checking the Journal for presence, but periodically we can check.
2017-08-07 10:45:41 +01:00
Martin Sumner
dd20132892 Add test with fold_heads
Build the AAE tree equally using fold_heads. This is a precursor to running this within Riak.

In part this leans on some of the work done to improve standard Riak AAE with leveled.  When rebuilding the standard AAE store only the head is required, and so this process was switched in riak_kv_sweeper to make a fold_heads request if supported by the backend.

The head response is a proxy object, which when loaded into a riak_object will allow for access to object metadata, but will use the passed function if access to object contents is requested.
2017-08-05 16:43:03 +01:00
Heinz N. Gies
25389893cf Add compatibility for old and new random / rand functions 2017-08-01 11:24:12 +02:00
Martin Sumner
8748fef28c Add extra second to sleep
Sleep for just one more second to resolve intermittent failure
2017-08-01 00:14:31 +01:00
Martin Sumner
65fd029ca6 typo - backlist/blacklist 2017-07-11 12:25:06 +01:00
martinsumner
80fd2615f6 Implement blacklist/whitelist
Change from the all/whitelist behaviour to the blacklist/whitelist
behaviour documented in the write-up
2017-07-11 11:44:01 +01:00
martinsumner
3105656d2e Add test descriptions and further documentation 2017-07-06 15:40:30 +01:00
martinsumner
0d72b353fe Add test of expiry of nrt aae terms 2017-07-04 13:29:40 +01:00
martinsumner
439bf8c3b8 Add bucket whitelist test 2017-07-04 10:55:53 +01:00
Martin Sumner
1af9ac56dc Revert passing Bucket
Bad edit.  Reverted
2017-07-03 19:06:41 +01:00
martinsumner
97fdd36d53 Returning bucket when bucket is all
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae.  So shoehorning this in - will also allow for proper use of
FilterFun when filtering by partition.
2017-07-03 18:03:13 +01:00
martinsumner
d0a825a145 Extend test to detect keys
When comparing recent changes, demonstrate the detection of the keys
which have changed with a follow-up query
2017-07-03 10:33:34 +01:00
martinsumner
52ca0e4b6c Test expansion
Detect a recent difference
2017-07-02 19:33:18 +01:00
martinsumner
da53808e2e Extend test beyond restart
Prove that recency check still works after a restart
2017-07-01 08:24:58 +01:00
martinsumner
a15c046887 Re-introduce commented tests 2017-06-30 16:31:48 +01:00
martinsumner
954995e23f Support for recent AAE index
With basic ct test.

Doesn't currently prove expiry of index.  Doesn't prove ability to find
segments.

Assumes that either "all" buckets or a special list of buckets require
indexing this way.  Will lead to unexpected results if the same bucket
name is used across different Tags.

The format of the index has been chosen so that hopefully standard index
features can be used (e.g. return_terms).
2017-06-30 16:31:22 +01:00
martinsumner
f81a4bca0d Revert "WIP - Recent Modifications"
This reverts commit bc19a05d83a02d7ec03771657df85b33acc6cfee.
2017-06-27 16:25:18 +01:00
martinsumner
9fca17d56a WIP - Recent Modifications
Just some initial WIP code for this.  Will revisit this again after
exploring some ideas as to how to reduce the cost of the
get_keys_by_segment.

The overall idea is that there are trees of recent modifications, with
recent being some rolling time window made up of hourly blocks, and
recency being determined by the last-modified date on the object
metadata, which should be consistent across a cluster.

So if we were at 15:30 we would get the tree for 14:00 - 15:00 and the
tree for 15:00-16:00 from two different queries which cover the same
partitions and then compare.
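A small Python sketch of the hourly-block arithmetic in the example above: at 15:30, a window of two blocks gives the 14:00-15:00 and 15:00-16:00 trees, and an object belongs to whichever block its last-modified time falls in. The two-block window and the function names are assumptions for illustration, not part of the WIP code.

```python
from datetime import datetime, timedelta


def hourly_blocks(now, block_count):
    """Return (start, end) pairs for a rolling window of hourly blocks:
    the current, partially elapsed hour plus the block_count - 1 complete
    hours before it, oldest first."""
    current = now.replace(minute=0, second=0, microsecond=0)
    starts = [current - timedelta(hours=n) for n in range(block_count - 1, -1, -1)]
    return [(s, s + timedelta(hours=1)) for s in starts]


def block_for(last_modified, blocks):
    """Return the block containing a last-modified timestamp, or None if the
    object is older (or newer) than the whole window."""
    for start, end in blocks:
        if start <= last_modified < end:
            return (start, end)
    return None
```

Because the block boundaries depend only on the clock and the last-modified date, two queries covering the same partitions should bucket objects identically.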

Comparison may find differences, and we know what segment the difference
is in - but how to then find all keys in that segment which have been
modified in the period?  Three ways:

1. Do it inefficiently and infrequently using a fold_keys and a filter
(perhaps with SST files having a highest LMD in the metadata so that
they can be skipped).
2. Add a special index, where every entry has a TTL, and the Key is
{$segment, Segment, Bucket, Key} so that a normal 2i query can be used.
3. Align hashing for segments with hashing for penciller lookup so that a
query over the actual keys can be optimised, skipping chunks of the
in-memory part and chunks of the SST file.
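The second option works because an ordered index turns "all keys in segment S" into a plain range query: every entry for S sorts between the prefixes {$segment, S} and {$segment, S + 1}. A Python sketch using a sorted list and `bisect` to stand in for the ordered ledger; the helper names, the separate expiry map and the integer timestamps are assumptions for illustration.

```python
import bisect

SEGMENT_TAG = "$segment"  # stands in for the $segment marker in the index key


def index_key(segment, bucket, key):
    return (SEGMENT_TAG, segment, bucket, key)


def add_entry(sorted_index, expiries, segment, bucket, key, expires_at):
    """Insert (or refresh) an index entry; expires_at models the TTL."""
    entry = index_key(segment, bucket, key)
    if entry not in expiries:
        bisect.insort(sorted_index, entry)  # keep the index ordered
    expiries[entry] = expires_at


def keys_in_segment(sorted_index, expiries, segment, now):
    """Range query over one segment, skipping entries whose TTL has passed."""
    lo = bisect.bisect_left(sorted_index, (SEGMENT_TAG, segment))
    hi = bisect.bisect_left(sorted_index, (SEGMENT_TAG, segment + 1))
    result = []
    for entry in sorted_index[lo:hi]:
        if expiries[entry] > now:
            _, _, bucket, key = entry
            result.append((bucket, key))
    return result
```

A shorter tuple sorts before any longer tuple sharing its prefix, which is what makes the two `bisect_left` calls bracket exactly one segment.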
2017-06-27 16:25:18 +01:00
Martin Sumner
e938eaa153 Add close to test 2017-06-23 16:51:28 +01:00
Martin Sumner
99131320c5 Broken test log 2017-06-23 15:20:24 +01:00
martinsumner
25a5065edd Re-introduce test (again) 2017-06-23 14:56:32 +01:00
martinsumner
5e9e1347c7 Add test to find {term, key} that represents difference
Not just detect existence of a difference, but clarify what that
difference is.
2017-06-23 14:55:49 +01:00
martinsumner
2be4422e47 Re-add test 2017-06-23 12:44:52 +01:00
martinsumner
4e5c3e2f64 Fix merge
Fix typo in merge, and add an extra validation step to unit tests to
prevent it returning.
2017-06-23 12:32:37 +01:00
martinsumner
47655dc9c7 Uncomment previous test 2017-06-22 14:30:14 +01:00
martinsumner
5a012ff8a6 Add test of index comparison
Compare two indexes for consistency
2017-06-22 13:54:51 +01:00
martinsumner
7cfa392b6e Flexible TicTacTree sizes
Allow tictac tree sizes to be flexible.

Tested lots of different sizes.  Having both level 1 and level 2 the
same size seemed to be consistently quicker than trying to make either
of the levels relatively wider.

There's an 8% performance improvement if the SegmentCount is reduced by
a quarter.
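To show what the two level widths control, here is a minimal Python sketch of a two-level tictac-style tree: SegmentCount is width1 × width2, each L2 segment holds the XOR of the hashes of its (key, value) pairs, and a comparison only descends into L2 blocks whose L1 summary differs. The truncated SHA-256 hash, summarising each L1 entry by hashing its L2 block, and all names here are assumptions; this is not leveled_tictac's implementation.

```python
import hashlib


class TicTacTree:
    """Sketch of a two-level tree with width1 L1 entries, each
    summarising width2 L2 segments."""

    def __init__(self, width1, width2):
        self.width1, self.width2 = width1, width2
        self.level2 = [[0] * width2 for _ in range(width1)]

    @property
    def segment_count(self):
        return self.width1 * self.width2

    @staticmethod
    def _hash(data):
        # 32-bit hash taken from SHA-256 (an assumption, not leveled's hash)
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

    def add(self, key, value):
        segment = self._hash(key) % self.segment_count
        i, j = divmod(segment, self.width2)
        # XOR is its own inverse, so adding the same (key, value) twice removes it.
        self.level2[i][j] ^= self._hash(key + b"\x00" + value)

    def level1(self):
        """Summarise each L2 block by hashing its packed segment values."""
        return [
            self._hash(b"".join(seg.to_bytes(4, "big") for seg in block))
            for block in self.level2
        ]

    def dirty_segments(self, other):
        """Return differing segment numbers (trees must share widths)."""
        diffs = []
        for i, (mine, theirs) in enumerate(zip(self.level1(), other.level1())):
            if mine != theirs:  # only descend into blocks whose summaries differ
                diffs.extend(
                    i * self.width2 + j
                    for j in range(self.width2)
                    if self.level2[i][j] != other.level2[i][j]
                )
        return diffs
```

With equal widths the L1 scan and each L2 descent do similar amounts of work, which is one plausible reading of why equal-sized levels tested fastest.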
2017-06-20 10:58:13 +01:00