leveled

Author	SHA1	Message	Date
Martin Sumner	cda412508a	IsEmpty check Previously there was no is_empty check, and there was a workaround using binary_bucketlist. But what if there were many buckets - this is a slow seek (using get next key over and over). Instead have a proper is_empty check.	2018-03-21 15:31:00 +00:00
Martin Sumner	ef22aabe85	Alter comment	2018-03-20 11:11:41 +00:00
Martin Sumner	a9a20c9150	Smoother terminate on destroy don't try and terminate a dead process	2018-03-07 16:20:41 +00:00
Martin Sumner	e29743310d	Make destroy "normal" Put the special actions in the handle_call not the terminate	2018-03-07 16:14:50 +00:00
Martin Sumner	4bf6d3e73d	Fiddle with naming in query API Was easier in the calling application to switch between using and not using a list if the Query format was consistent between those two cases.	2018-03-02 10:20:43 +00:00
Martin Sumner	861aa5a7db	Support multi-query fold Allow a single snapshot to run query over multiple ranges. Used initially to fold over multiple buckets.	2018-03-01 23:19:52 +00:00
Martin Sumner	090e414b23	Coverage issues Not making proxy object so get_size not required. Extend tests to improve coverage	2018-02-16 20:27:49 +00:00
Martin Sumner	70dfb77088	Optional lookup in head_only mode Allow decision to be made on startup whether ObjectSpecs can be looked up directly when running in head_only mode.	2018-02-16 17:06:30 +00:00
Martin Sumner	910ccb6072	Add lookup support in head_only mode Originally had disabled the ability to lookup individual values when running in head_only mode. This is a saving of about 11% at PUT time (about 3 microseconds per PUT) on a macbook. Not sure this saving is sufficient enough to justify the extra work if this is used as an AAE Keystore with Bitcask and LWW (when we need to lookup the current value before adjusting). So reverted to re-adding support for HEAD requests with these keys.	2018-02-16 14:16:28 +00:00
Martin Sumner	2b6281b2b5	Initial head_only features Initial commit to add head_only mode to leveled. This allows leveled to receive batches of object changes, but where those objects exist only in the Penciller's Ledger (once they have been persisted within the Ledger). The aim is to reduce significantly the cost of compaction. Also, the objects are not directly accessible (they can only be accessed through folds). Again this makes life easier during merging in the LSM trees (as no bloom filters have to be created).	2018-02-15 16:14:46 +00:00
Martin Sumner	f8ceedc9bb	Compress L0 only Doing at L1 has a negative impact as tests draw on. Also improve head time tracking	2017-12-04 10:49:42 +00:00
Martin Sumner	f436cfd03e	Add consistent timing points Now all timing points should be made in a consistent fashion	2017-11-21 23:13:24 +00:00
Martin Sumner	50c81d0626	Make ink fold more generic Also makes the fold_from_sequence loop much easier to follow	2017-11-17 14:54:53 +00:00
Martin Sumner	4c05dc79f9	Merge branch 'master' into mas-aae-segementfoldplus	2017-11-08 18:38:49 +00:00
Martin Sumner	7de4dccbd9	Extend journal compaction test to cover with and without waste retention. Also makes sure that CDB files in a restarted store will respect the waste retention period set.	2017-11-08 16:18:48 +00:00
Martin Sumner	22e894c928	Allow waste retention to be ignored If waste retention period is undefined, then it should be ignored - and no waste retained (rather than retaining waste for 24 hours as at present). This wasn't working anyway - as reopen reader didn't get the cdb options (which didn't have the waste path on anyway) - so waste would not be retained if the file had been opened after a stop/start.	2017-11-08 12:58:09 +00:00
Martin Sumner	5cee3a8e4e	Tidy up spec Also remove _app _sup originally added for dialyzer (due to false understanding they were needed for dialyzer)	2017-11-07 19:41:39 +00:00
Martin Sumner	8f27b3b628	Merge branch 'master' into mas-aae-segementfoldplus	2017-11-07 11:22:56 +00:00
Martin Sumner	0af0d85239	Add option description Add documentation of new options	2017-11-07 10:22:27 +00:00
Martin Sumner	1d475235d1	Improve test coverage Make compress on receipt/compaction configurable	2017-11-06 18:44:08 +00:00
Martin Sumner	61b7be5039	Make compression algorithm an option Compression can be switched between LZ4 and zlib (native). The setting to determine if compression should happen on receipt is now a macro definition in leveled_codec.	2017-11-06 15:54:58 +00:00
Martin Sumner	c8ad39b33b	foldheads_bybucket adds segment list support Accelerate queries for foldheads_bybucket as well	2017-11-01 22:00:12 +00:00
Martin Sumner	5b5b4a3a29	Test coverage Code no longer requires LongRunning to be undefined so that it can be decided through best guess. Also cover branches of tictac tree code.	2017-11-01 17:14:19 +00:00
Martin Sumner	b141dd199c	Allow for segment-acceleration of folds Initially with basic tests. If the SlotIndex has been cached, we can now use the slot index as it is based on the Segment hash algorithm. This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID. This also required a slight tweak to the penciller keyfolder. It now caches the next answer from the SSTiter, rather than restart the iterator. When the IMMiter has many more entries than the SSTiter (as the SSTiter is being filtered but not the IMMiter) this could lead to lots of repeated folding.	2017-10-31 23:28:35 +00:00
Martin Sumner	a128dcdadf	Change hash algorithm for penciller Switch from magic hash to md5 - to hopefully remove the need for some of the artificial jumps required to get expected false positive ratios. Also split the hash into two 16-bit integers. We assume that SegmentID (from the perspective of AAE merkle/tictac trees) will always be at least 16 bits. The idea is that hashes should be used in blooms and indexes such that some advantage can be gained from just knowing the segmentID - in particular when folding over all the keys in a bucket. Performance testing has been difficult so far - I think due to “cloud” mysteries.	2017-10-20 23:04:29 +01:00
Martin Sumner	bfaed921e6	Split code for folders - introduce runner actor Introduce a dedicated module for all the different fold types. Also simplify the list of folders by deprecating those folds that should be achievable by fold_heads/fold_objects type folds but with smarter functions. Makes sure that the fold functions also have better spec coverage, and are dialyzer checked.	2017-10-17 20:39:11 +01:00
Martin Sumner	5c8eea3f0e	Extend foldheads_bybucket test Now explicitly checking key ranges	2017-10-06 15:07:36 +01:00
Martin Sumner	0c5f5cdb65	Add key range to fold_heads queries	2017-10-06 15:02:14 +01:00
Martin Sumner	389694b11b	Add exportable option to tictac Idea being that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary. So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.	2017-09-26 22:49:40 +01:00
Martin Sumner	9730816c38	Merge branch 'master' into mas-riakaae-impl-2	2017-09-22 09:39:32 +01:00
Martin Sumner	eba21f49fa	Make tests compatible with OTP 16 This required a switch to change the sync strategy based on rebar parameter. However tests could be slow on macbook with OTP16 and sync - so timeouts added in unit tests, and ct tests sync_strategy changed to not sync for OTP16.	2017-09-15 15:10:04 +01:00
Martin Sumner	856a64c4d4	Unit tests use wrong query format Not sure how this happened. Bad merge? Just plain sloppiness on my part? Anyhow, the unit tests were not working ..	2017-09-14 18:06:50 +01:00
Martin Sumner	9f97c82d0d	Add import/export support Also fix for fold_heads unit tests to reflect new booleans required by changes to support their use in MapFolds	2017-08-16 16:58:38 +01:00
Martin Sumner	53ddc8950b	Add tests using fold_heads Comparing the inbuilt tictac_tree fold, to using "proper" abstraction and achieving the same thing through fold_heads. The fold_heads method is slower (a lot more manipulation required in the fold) - expect it to require > 2 x CPU. However, this does give the flexibility to change the hash algorithm. This would allow for a fold over a database of AAE trees (where the hash has been pre-computed using sha) to be compared with a fold over a database of leveled backends. Also can vary whether the fold_heads checks for presence of the object in the Inker. So normally we can get the speed advantage of not checking the Journal for presence, but periodically we can.	2017-08-07 10:45:41 +01:00
Heinz N. Gies	25389893cf	Add compatibility for old and new random / rand functions	2017-08-01 11:24:12 +02:00
Heinz N. Gies	379e33ba84	Cleanup dialyzer errors in leveled_bookie	2017-07-31 19:58:56 +02:00
martinsumner	80fd2615f6	Implement blacklist/whitelist Change from the all/whitelist behaviour to the blacklist/whitelist behaviour documented in the write-up	2017-07-11 11:44:01 +01:00
martinsumner	97fdd36d53	Returning bucket when bucket is all Need to know {Bucket, Key} not just Key if all buckets are being covered by nrt aae. So shoehorning this in - will also allow for proper use of FilterFun when filtering by partition.	2017-07-03 18:03:13 +01:00
martinsumner	954995e23f	Support for recent AAE index With basic ct test. Doesn't currently prove expiry of index. Doesn't prove ability to find segments. Assumes that either "all" buckets or a special list of buckets require indexing this way. Will lead to unexpected results if the same bucket name is used across different Tags. The format of the index has been chosen so that hopefully standard index features can be used (e.g. return_terms).	2017-06-30 16:31:22 +01:00
martinsumner	8da8722b9e	Add temporary aae index Pending ct tests. The aae index should expire after limit_minutes and be on an index which is rounded to unit_minutes.	2017-06-30 10:03:36 +01:00
martinsumner	ebef27f021	Extract Last Modified Date from Riak Object As part of process to supporting a recent changes index for near-real-time anti-entropy	2017-06-27 16:25:18 +01:00
martinsumner	7cfa392b6e	Flexible TicTacTree sizes Allow tictac tree sizes to be flexible. Tested lots of different sizes. Having both level 1 and level 2 the same size seemed to be consistently quicker than trying to make either of the levels relatively wider. There's an 8% performance improvement if the SegmentCount is reduced by a quarter.	2017-06-20 10:58:13 +01:00
martinsumner	c586b78f45	Initial code with busted ct test Initial comparison made between trees externally - but ct test is bust.	2017-06-19 11:36:57 +01:00
martinsumner	f5dd154cee	Rename hashtree query Naming is now confusing now we have TicTac Trees. This query builds a list of keys and hashes not a tree - so it was misleading anyway. Now renamed hashlist_query.	2017-06-16 12:38:59 +01:00
Martin Sumner	7642aac2cc	Change Riak object hash approach Change the riak object hash being kept in the metadata, to being a hash of the vector clock	2017-06-16 10:14:24 +01:00
martinsumner	0d8ab0899e	Add test for is_empty Bucket listing didn't care if keys were active - now does.	2017-05-23 11:59:44 +01:00
martinsumner	fbb4879d81	Change fold_heads to do basic Journal presence check This at least checks the file is present, and the Key exists in the index of that file. If the value is corrupt it will be removed by compaction, and then this will fail (unless the file is never compacted). TODO: resolve issue of files which are corrupt - but never compacted - a job for backup?	2017-04-21 15:55:03 +01:00
Martin Sumner	fa9daf8696	Correct async fold fold objects which snaps in the fold was implemented incorrectly - it took information from the LedgerCache at the point of the request, not at the point of the fold. So the LedgerCache SQN may have been surpassed in the Penciller by the time the fold was called.	2017-04-17 23:01:55 +01:00
martinsumner	7cba182951	Merge remote-tracking branch 'origin/mas-sweeperfold-i59' into mas-sweeperfold-i59 # Conflicts: # src/leveled_bookie.erl	2017-04-17 15:07:35 +01:00
martinsumner	50d95ef6aa	Move snapshot inside of the fold function riak_kv_sweeper gets the async fold function, then determines if the function can be called. If the system is busy the fold may be queued, and may never be acted upon. This may cause issues with snapshot timeouts etc.	2017-04-17 15:03:03 +01:00

179 commits