* Support sub-key queries
Also requires a refactoring of types.
In head-only mode the metadata in the ledger is just the value, and the value can be anything - so the metadata() definition needs to reflect that.
There are then issues with app-defined functions for extracting metadata. In theory an app-defined function could extract some unsupported type, so it is made explicit that the app-defined function must extract std_metadata() as metadata - otherwise functionality will not work.
This means that for an object key which is not a ?HEAD key, the Metadata must be a tuple (of either the Riak or the Standard type).
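As a hedged illustration of that constraint (the tuple shape and the function below are assumptions for the sketch, not the actual leveled_head definitions):

```erlang
%% Illustrative only: in head-only mode the "metadata" is the whole
%% value, so it may be any term; for other object keys the metadata
%% must be a tuple of the Standard (or Riak) form.
-type head_metadata() :: term().
-type std_metadata() ::
    {Hash :: integer() | null, Size :: non_neg_integer(), UserMD :: term()}.

%% An app-defined extract function must return std_metadata(), as
%% downstream code pattern-matches on the tuple form.
-spec extract_metadata(binary()) -> std_metadata().
extract_metadata(ObjBin) ->
    {erlang:phash2(ObjBin), byte_size(ObjBin), undefined}.
```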
* Fix coverage issues
* Add eqwalizer and clear for codec & sst
The eqwalizer errors highlighted the need in several places for type clarification.
Within tests there are some issues where a type is assumed, and so ignore has been used rather than writing more complex code to be explicit about the assumption.
Eqwalizer's handling of arrays isn't great - being specific about the content of an array causes issues when initialising it (see the sketch below). Perhaps a type (a map, maybe) where one can be more explicit about types might be a better option (even if there is a minimal performance impact).
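A minimal illustration of that friction, under assumed names:

```erlang
%% array:new/1 yields an array of any term, so eqwalizer cannot verify
%% the narrower element type promised by the spec at initialisation.
-spec new_counters(pos_integer()) -> array:array(non_neg_integer()).
new_counters(Size) ->
    array:new([{size, Size}, {default, 0}]).
```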
The use of a ?TOMB_COUNT defined option complicated the code much more under eqwalizer, so for now there is no developer option to disable ?TOMB_COUNT.
Test fixes were required where strings, not binaries, had been used for buckets/keys.
The leveled_sst statem needs a different state record for starting when compared to other modes. The state record has been divided up to reflect this, to make type management easier. The impact on performance needs to be tested.
* Update ct tests to support binary keys/buckets only
* Eqwalizer for leveled_cdb and leveled_tictac
As array is used in leveled_tictac, there is the same issue as with leveled_sst.
* Remove redundant indirection of leveled_rand
A legacy of pre-20 OTP
* More modules eqwalized
ebloom/log/util/monitor
* Eqwalize further modules
elp eqwalize leveled_codec
elp eqwalize leveled_sst
elp eqwalize leveled_cdb
elp eqwalize leveled_tictac
elp eqwalize leveled_log
elp eqwalize leveled_monitor
elp eqwalize leveled_head
elp eqwalize leveled_ebloom
elp eqwalize leveled_iclerk
All currently OK
* Refactor unit tests to use binary() not string() in key
Previously string() was allowed just to avoid having to change all these tests. Go through the pain now, as part of eqwalizing.
* Add fixes for penciller, inker
Add a new ?IS_DEF macro to replace =/= undefined.
Now more explicit about primary, object and query keys
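A sketch of the macro and a guard-position use, assuming the obvious definition (the helper function is hypothetical; leveled_bookie:book_close/1 is the existing close API):

```erlang
-define(IS_DEF(X), X =/= undefined).

%% in place of repeating `=/= undefined` in guards:
close_if_open(Pid) when ?IS_DEF(Pid) ->
    ok = leveled_bookie:book_close(Pid);
close_if_open(undefined) ->
    ok.
```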
* Further fixes
Need to clarify the functions used by runner - where keys, query keys and object keys are used.
* Further eqwalisation
* Eqwalize leveled_pmanifest
Also make the implementation independent of the choice of dict - i.e. one can save a manifest using dict for blooms/pending_deletions and then open that manifest with code that uses a different type. This allows for the slow dict to be replaced with a map.
Would not be backwards compatible though, without further thought - i.e. if you upgrade then downgrade.
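A hedged sketch of the open-time conversion this implies (the function name is illustrative):

```erlang
%% Accept either representation when opening a manifest, converting a
%% legacy dict into a map; saving would then write the map form - hence
%% the concern about downgrading after an upgrade.
convert_blooms(Blooms) when is_map(Blooms) ->
    Blooms;
convert_blooms(Blooms) ->
    maps:from_list(dict:to_list(Blooms)).
```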
Redundant code created by leveled_sst refactoring removed.
* Fix backwards compatibility issues
* Manifest Entry to belong to leveled_pmanifest
There are two manifests - leveled_pmanifest and leveled_imanifest. Both have manifest_entry() type objects, but these types are different. To avoid confusion, don't include the pmanifest manifest_entry() within the global include file - be specific that it belongs to the leveled_pmanifest module.
* Ignore elp file - large binary
* Update src/leveled_pmem.erl
Remove unnecessary empty list from type definition
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
---------
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
The old leveled_tictac had a pure binary Level 1; this was slower than the new map version.
However, in a Riak cluster, when running a merge_tree_range during a rolling update, the query coordinator will initiate a tree for the fold. If this tree is not a map-based tree (as that node has not yet been upgraded), then a node that has been upgraded would previously fail the query, as it cannot handle a Level 1 in binary form. This change enables updated nodes to handle both forms of trees.
Obviously, if the coordinating node has been updated, non-updated nodes will crash queries, as they cannot handle the tree with the map at Level 1. The aim is to make it configurable to force non-map trees in a cluster until all nodes have been upgraded. So as long as each node understands how to update both non-map trees and map-based trees, everything should be OK (the dual-form handling is sketched below).
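A sketch of how an updated node might normalise both forms before merging (all helper names here are hypothetical):

```erlang
%% Normalise Level 1 to the map form, whichever form arrives.
to_map_level1(L1) when is_map(L1) ->
    L1;
to_map_level1(L1) when is_binary(L1) ->
    binary_to_map_level1(L1).  %% hypothetical legacy-form converter

merge_level1(L1A, L1B) ->
    merge_map_level1(to_map_level1(L1A), to_map_level1(L1B)).
```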
* Switch to logger
Use logger rather than io:format when logging. The ct tests have been switched to log to file; testutil/init_per_suite/1 may offer useful guidance on configuring logger with leveled.
As all logs are produced by the leveled_log module, the MFA metadata is uninteresting for log outputs, but can be used for explicit filter controls for leveled logs.
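A sketch of the sort of setup testutil/init_per_suite/1 provides - the handler id, file name and filter fun here are illustrative:

```erlang
init_per_suite(Config) ->
    PrivDir = proplists:get_value(priv_dir, Config),
    LogFile = filename:join(PrivDir, "leveled_test.log"),
    ok = logger:add_handler(
        leveled_file_log,
        logger_std_h,
        #{config => #{file => LogFile}, level => info}),
    %% keep only events whose mfa metadata places them in leveled_log
    LeveledOnly =
        fun(#{meta := #{mfa := {leveled_log, _F, _A}}} = Event, _Extra) ->
                Event;
           (_Event, _Extra) ->
                stop
        end,
    ok = logger:add_handler_filter(
        leveled_file_log, leveled_only, {LeveledOnly, []}),
    Config.
```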
* iolist_to_binary not unicode_binary()
logger filters will error and be removed if the format line is a binary(). It must be either a charlist() or a unicode_binary(), so iolist_to_binary() can't be used.
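Illustrative of the fix (the function is hypothetical):

```erlang
log_complete(KeyText) ->
    %% build a charlist() format; a binary() format would cause logger
    %% filters to error and be removed
    Format = unicode:characters_to_list(["fetch of ", KeyText, " complete"]),
    logger:info(Format, []).
```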
* Add metadata for filter
* Update test/end_to_end/tictac_SUITE.erl
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
---------
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
* Extend perf_SUITE
This is v6 of the perf_SUITE tests. The test adds a complex index entry to every object, and then adds a new test phase to test regex queries.
There are three profiles added so the full, mini and profiling versions of perf_SUITE can be run without having to edit the file itself:
e.g. ./rebar3 as perf_mini do ct --suite=test/end_to_end/perf_SUITE
When testing as `perf_prof` summarised versions of the eprof results are now printed to screen.
The volume of keys within the full test suite has been dropped ... just to make life easier, so that test run times are not excessively increased by the new features.
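A hedged sketch of how the three profiles might be declared in rebar.config (the define names are assumptions):

```erlang
{profiles,
    [{perf_full, [{erl_opts, [{d, perf_full}]}]},
     {perf_mini, [{erl_opts, [{d, perf_mini}]}]},
     {perf_prof, [{erl_opts, [{d, perf_prof}]}]}]}.
```

Each profile is then selected as in the example above, with `./rebar3 as <profile> do ct`.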
* Load chunk in spawned processes
Assumed to make the job of gc easier - this makes a massive difference to load time in OTP 24.
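A hedged sketch of the pattern (load_chunk/2 is hypothetical):

```erlang
%% Run each chunk load in a short-lived process, so its heap is
%% discarded on exit rather than garbage collected within the loader.
load_chunks(Bookie, Chunks) ->
    lists:foreach(
        fun(Chunk) ->
            {Pid, Ref} =
                spawn_monitor(fun() -> load_chunk(Bookie, Chunk) end),
            receive
                {'DOWN', Ref, process, Pid, normal} -> ok
            end
        end,
        Chunks).
```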
* Correctly account for pause
Also try to improve test stability by increasing the pause.
* Add microstate accounting to profile
* Add memory tracking during test phases
Identify and log out memory usage by test phase
* Use macros instead (#437)
* Don't print memory to screen in standard ct test
---------
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
* Allow snapshots to be reused in queries
Allow for a full bookie snapshot to be re-used for multiple queries, not just KV fetches.
* Reduce log noise
The internal dummy tag is expected, so it should not prompt a log on reload.
* Snapshot should have same status as active db
wrt head_only and head_lookup
* Allow logging to be specified on snapshots
* Shutdown snapshot bookie if primary goes down
Inker and Penciller already will shut down based on `erlang:monitor/2`
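A sketch of the same pattern for the snapshot bookie (gen_server fragment; the state record contents are elided):

```erlang
-record(state, {}).

init([PrimaryPid]) ->
    erlang:monitor(process, PrimaryPid),
    {ok, #state{}}.

%% if the primary goes down, take the snapshot down with it
handle_info({'DOWN', _Ref, process, _PrimaryPid, Reason}, State) ->
    {stop, Reason, State}.
```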
* Review feedback
Formatting and code readability fixes
This allows for leveled_iclerk:clerk_stop to be a sync call, so that files will only be closed once the iclerk has stopped. This is designed to prevent iclerk crashes during shutdowns, when files it is depending on are closed mid-shutdown.
This helps kv_index_tictactree with the leveled_so backend. Now this can do folds over ranges of keys with modified filters (as folds over ranges of keys must go over all keys if the backend is segment_ordered).
As the fold functions have been added to get_runner in an ad hoc way -
naturally, given the ongoing development of levelEd to support Riak -
it was difficult for a new user (in this case Quviq) to see what folds
are supported, with what arguments, and with what expectations.
This PR is for discussion. It is one of many ways to group, spec, and
document the fold functions.
A test is also added for coverage of range queries.
head_only mode can be run with_lookup - but there is no L0 index created in this case.
So the L0 index wasn't returning a position list, and the L0 cache wasn't being checked.
Code now checks every position in the L0 cache when a lookup is attempted in head_only mode.
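The shape of the fix, under hypothetical names:

```erlang
%% With no L0 index in head_only mode, every cache position becomes a
%% candidate to check for the key.
positions_to_check(_Hash, undefined, CacheSize) ->
    lists:seq(1, CacheSize);
positions_to_check(Hash, L0Index, _CacheSize) ->
    index_positions(Hash, L0Index).  %% hypothetical index lookup
```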
Originally the ability to look up individual values when running in head_only mode had been disabled. This gave a saving of about 11% at PUT time (about 3 microseconds per PUT) on a MacBook.
Not sure this saving is sufficient to justify the extra work if this is used as an AAE Keystore with Bitcask and LWW (when we need to look up the current value before adjusting).
So reverted to re-adding support for HEAD requests with these keys.
Initial commit to add head_only mode to leveled. This allows leveled to receive batches of object changes, but where those objects exist only in the Penciller's Ledger (once they have been persisted within the Ledger).
The aim is to significantly reduce the cost of compaction. Also, the objects are not directly accessible (they can only be accessed through folds). Again this makes life easier during merging in the LSM trees (as no bloom filters have to be created).
The idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary. So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
The "small" tree will serialise to 1.5MB - which seems large. Much smaller trees seem to be more suitable for things like recently modified aae indexes.
Comparing the inbuilt tictac_tree fold, to using "proper" abstraction and achieving the same thing through fold_heads.
The fold_heads method is slower (a lot more manipulation required in the fold) - expect it to require > 2 x CPU.
However, this does give the flexibility to change the hash algorithm. This would allow for a fold over a database of AAE trees (where the hash has been pre-computed using sha) to be compared with a fold over a database of leveled backends.
Also can vary whether the fold_heads checks for presence of the object in the Inker. So normally we can get the speed advantage of not checking the Journal for presence, but periodically we can.
Build the AAE tree equally using fold_heads. This is a pre-cursor to running this within Riak.
In part this leans on some of the work done to improve standard Riak AAE with leveled. When rebuilding the standard AAE store only the head is required, and so this process was switched in riak_kv_sweeper to make a fold_heads request if supported by the backend.
The head response is a proxy object, which when loaded into a riak_object will allow for access to object metadata, but will use the passed function if access to object contents is requested.
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae. So shoehorning this in - will also allow for proper use of
FilterFun when filtering by partition.
With basic ct test.
Doesn't currently prove expiry of index. Doesn't prove ability to find
segments.
Assumes that either "all" buckets or a special list of buckets require
indexing this way. Will lead to unexpected results if the same bucket
name is used across different Tags.
The format of the index has been chosen so that hopefully standard index
features can be used (e.g. return_terms).
Just some initial WIP code for this. Will revisit this again after
exploring some ideas as to how to reduce the cost of the
get_keys_by_segment.
The overall idea is that there are trees of recent modifications, with
recent being some rolling time window made up of hourly blocks, and
recency being determined by the last-modified date on the object metadata
- which should be consistent across a cluster.
So if we were at 15:30 we would get the tree for 14:00 - 15:00 and the
tree for 15:00-16:00 from two different queries which cover the same
partitions and then compare.
Comparison may find differences, and we know what segment the difference
is in - but how to then find all keys in that segment which have been
modified in the period? Three ways:
Do it inefficiently and infrequently using a fold_keys and a filter
(perhaps with SST files having a highest LMD in the metadata so that
they can be skipped).
Add a special index (see the sketch after this list), where every entry has a TTL, and the Key is
{$segment, Segment, Bucket, Key} so that a normal 2i query can be used.
Align hashing for segments with hashing for penciller lookup so that a
query over the actual keys can be optimised, skipping chunks of the
in-memory part and chunks of the SST file.
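A sketch of the key shape from the second option (illustrative):

```erlang
%% One index entry per object, given a TTL, so that a standard 2i range
%% query over a single segment returns the {Bucket, Key} pairs recently
%% modified within that segment.
segment_index_key(Segment, Bucket, Key) ->
    {'$segment', Segment, Bucket, Key}.
```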