leveled

Author	SHA1	Message	Date
Martin Sumner	6677f2e5c6	Push log update through to cdb/sst Using the cdb_options and sst_options records	2018-12-11 20:42:00 +00:00
Martin Sumner	90574122c9	Merge remote-tracking branch 'aeternity/uw-avoid-set_env' into mas-pr231-review	2018-12-10 18:33:23 +00:00
Ulf Wiger	d30ca16a17	store log opts in pdict + inherit at proc start	2018-12-10 16:09:11 +01:00
Martin Sumner	714e128df8	Tidy up protecting against corrupt Keys this was previously not an issue as leveled_codec:segment_hash/1 would handle anything that could be hashed. This now has to be a tuple, and one with a first element - so corrupted tuples are failing. Add a guard checking for a corrupted tuple, but we only need this when doing journal compaction. Change user_defined keys to be `retain` as a tag strategy	2018-12-07 09:07:22 +00:00
Martin Sumner	cee5a60ceb	Protect against bad journal keys in Ledger	2018-12-07 00:48:42 +00:00
Martin Sumner	ef2a8c62af	Add capability to exit a head or object fold with a throw This allows for all fold functions to throw an exception to exit out of a fold with all dependencies still closed down as expected. This was previously available for key folds, which was necessary for the folds to work in Riak (as max_results in index queries depends on exiting the fold with an exception). This change now adds a ct test, and adds support for head folds, object folds (key order) and object folds (sqn order)	2018-11-23 16:00:11 +00:00
Martin Sumner	ea7aa3086d	Refactor membership check To change to set membership when size beyond threshold	2018-11-07 17:43:26 +00:00
Martin Sumner	174a40aab2	Tidy up unexported types also re:mp may not be exported in R16	2018-11-05 16:02:19 +00:00
Martin Sumner	e72a946f43	TupleBuckets in Riak objects Adds support with test for tuplebuckets in Riak keys. This exposed that there was no filter using the seglist on the in-memory keys. This means that if there is no filter applied in the fold_function, many false positives may emerge. This is probably not a big performance benefit (and indeed for performance it may be better to apply during the leveled_pmem:merge_trees). Some thought still required as to what is more likely to contribute to future bugs: an extra location using the hash matching found in leveled_sst, or the extra results in the query.	2018-11-05 01:21:08 +00:00
Martin Sumner	2eec8a5378	MaxCount monitoring and responding Stop issue of {no_more_keys, Acc} being passed on fold over list of ranges to next range (and blowing up)	2018-11-01 23:40:28 +00:00
Martin Sumner	dc84eabe0c	Revert "Temp log" This reverts commit `2b57ff831c`.	2018-11-01 20:16:08 +00:00
Martin Sumner	2b57ff831c	Temp log	2018-11-01 19:58:32 +00:00
Martin Sumner	aaccd09a98	Allow for setting max_keys to wrap Acc Acc in response is now of form {Reason, Acc} not just Acc so that the application can understand the reason for the results ending - and take appropriate action (e.g. restart again from the LastKey to return more results).	2018-10-31 14:22:28 +00:00
Martin Sumner	11627bbdd9	Extend API To support max_keys and the last modified date range. This applies the last modified date check on all ledger folds. This is hard to avoid, but ultimately a very low cost. The limit on the number of heads to fold, is the limit based on passing to the accumulator - not on the limit being added to the accumulator. So if the FoldFun performs a filter (e.g. for the preflist), then those filtered results will still count towards the maximum. There needs to be some way at the end of signalling from the fold if the outcome was or was not 'constrained' by max_keys - as the fold cannot simply tell by length checking the outcome. Note this is used rather than length checking the buffer and throwing a 'stop_fold' message when the limit is reached. The choice is made for simplicity, and ease of testing. The throw mechanism is necessary if there is a need to stop parallel folds across the cluster - but in this case the node_worker_pool will be used.	2018-10-31 00:09:24 +00:00
Martin Sumner	ae1ada86b2	Add accumulator check for last mod range Perhaps should also do the segment check at this point. Seems odd to check last modified date and segments in different places.	2018-10-30 19:35:29 +00:00
Martin Sumner	b7e697f7f0	Fold API to leveled_sst Externally to leveled_sst all folds are actually managed through expand_list_by_pointer. Make the API a bit clearer in this regard, and add specs to help dialyzer. This also adds LowLastMod to the API for expanding pointers (although the leveled_penciller just defaults this to 0 for everything).	2018-10-30 16:44:00 +00:00
Martin Sumner	a9b097e392	Add a wrapper to fold_keys queries Queries that in Riak will be based on fold_keys need to be able to catch throws, and re-throw them to be detected by the worker (whilst still clearing up the snapshot)	2018-09-24 19:54:28 +01:00
Martin Sumner	8ada5e78fa	Max penciller cache change Missed a bit	2018-09-14 17:22:25 +01:00
Russell Brown	4334e2d734	Add unit test for pclr snapshot closing This was tested by the eqc, but we merged the fix without the test. This eunit test fixes that. Coverage!	2018-09-06 14:08:09 +01:00
Russell Brown	ef9ac672e5	Stop snapshots when the bookie stops During EQC testing it was found that snapshots are still usable even if the bookie process crashes. This change has snapshots monitor the bookie and close when the bookie process dies.	2018-09-06 11:47:52 +01:00
Martin Sumner	c4e376ece5	Don't link snapshots If a snapshot breaks a penciller clone, this shouldn't crash the main process.	2018-07-10 10:25:20 +01:00
Martin Sumner	082eabb65b	Switch to start_link Start all processes linked - to collapse the whole tree if one process fails	2018-06-28 12:16:43 +01:00
Martin Sumner	aedeb0c934	Add support for with_lookup head_only head_only mode can be run with_lookup - but there is no L0 index created in this case. So the L0 index wasn't returning a position list and the L0 cache wasn't being checked. Code now checks every position in the L0 cache, when a lookup is attempted in head_only mode.	2018-06-23 15:15:49 +01:00
Martin Sumner	990e857ebe	Add to log	2018-06-23 13:25:10 +01:00
Martin Sumner	ac14bbdf41	Add log to penciller	2018-06-23 13:18:32 +01:00
Martin Sumner	319c6b4ca7	Undefined typo Interestingly setting max_pencillercachesize to a non-integer merely had the impact of making the penciller cache size infinite. So a guard added to make sure it is an integer going forward.	2018-06-07 14:53:34 +01:00
Martin Sumner	039b135f5f	Ease timeout pressure in unit test	2018-05-18 14:36:47 +01:00
Martin Sumner	6a20b2ce66	Use leveled_codec types ... and exporting them. Previously types were not exported, and it appears dialyzer treated them as any() when they were unexported types ??!!??	2018-05-04 15:24:08 +01:00
Martin Sumner	dd7b753688	Add spec and comments to penciller	2018-05-01 21:28:40 +01:00
Martin Sumner	11ba7029aa	de-terminate penciller	2018-04-10 09:51:21 +01:00
Martin Sumner	5312806592	Stop Iterator re-use The IMM iterator should not be reused, as it has already been filtered for a query. so if reused for a different query incorrect and unexpected results may occur. This reuse had been stopped by a previous commit, and this cleans up subsequently unused code.	2018-03-02 08:16:34 +00:00
Martin Sumner	861aa5a7db	Support multi-query fold Allow a single snapshot to run query over multiple ranges. Used initially to fold over multiple buckets.	2018-03-01 23:19:52 +00:00
Martin Sumner	2b6281b2b5	Initial head_only features Initial commit to add head_only mode to leveled. This allows leveled to receive batches of object changes, but where those objects exist only in the Penciller's Ledger (once they have been persisted within the Ledger). The aim is to reduce significantly the cost of compaction. Also, the objects are not directly accessible (they can only be accessed through folds). Again this makes life easier during merging in the LSM trees (as no bloom filters have to be created).	2018-02-15 16:14:46 +00:00
Martin Sumner	834704a3ff	Merge branch 'mas-i117-factor4scale' of https://github.com/martinsumner/leveled into mas-i117-factor4scale	2018-02-10 08:10:32 +00:00
Martin Sumner	f748fc8611	Narrower still Make the LSM tree more bottle shaped. Experiment to judge performance impact	2018-02-10 08:10:24 +00:00
Martin Sumner	5673d8b558	Expand test to ensure coverage catch	2018-02-10 08:09:33 +00:00
Martin Sumner	8113aebdcf	Add timings for Level 3 Level 3 readings now relatively common - so time them separately	2018-02-09 08:59:21 +00:00
Martin Sumner	7e4c3db915	Alternate scale factor Also had failed unit test - there was an issue with bit-flipping the position not being safely caught	2018-02-08 10:29:27 +00:00
Martin Sumner	5342e3a94f	Improve testing of bloom feature In particular will blooms re-appear following startup	2017-11-28 11:43:46 +00:00
Martin Sumner	c2f19d8825	Switch to using bloom at penciller Previously the tinybloom was used within the SST file as an extra check to remove false fetches. However the SST already has a low FPR check in the slot_index. If the new ebloom was used (which is no longer per slot, but per sst), this can be shared with the penciller and then the penciller could use it and avoid the message pass. The message pass may be blocked by a 2i query or a slot fetch request for a merge. So this should make performance within the Penciller snappier. This is as a result of taking sst_timings within a volume test - where there was an average of + 100microsecs for each level that was dropped down. Given the bloom/slot checks were < 20 microsecs - there seems to be some further delay. The bloom is a binary of > 64 bytes - so passing it around should not require a copy.	2017-11-28 01:19:30 +00:00
Martin Sumner	f436cfd03e	Add consistent timing points Now all timing points should be made in a consistent fashion	2017-11-21 23:13:24 +00:00
Martin Sumner	3ef550d9f8	Refactor timing point management For Penciller and timing head requests.	2017-11-21 19:58:36 +00:00
Martin Sumner	51f504fec5	Add extra slow_fetch test sometimes ct tests don’t hit this - surprisingly	2017-11-20 17:29:57 +00:00
Martin Sumner	f55cbbeac3	OTP 19 requires defaults in dialyzer	2017-11-13 14:02:39 +00:00
Martin Sumner	8f27b3b628	Merge branch 'master' into mas-aae-segementfoldplus	2017-11-07 11:22:56 +00:00
Martin Sumner	61b7be5039	Make compression algorithm an option Compression can be switched between LZ4 and zlib (native). The setting to determine if compression should happen on receipt is now a macro definition in leveled_codec.	2017-11-06 15:54:58 +00:00
Martin Sumner	ee7f9ee4e0	Test coverage ... and column width formatting	2017-11-01 15:11:14 +00:00
Martin Sumner	b141dd199c	Allow for segment-acceleration of folds Initially with basic tests. If the SlotIndex has been cached, we can now use the slot index as it is based on the Segment hash algorithm. This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID. This also required a slight tweak to the penciller keyfolder. It now caches the next answer from the SSTiter, rather than restart the iterator. When the IMMiter has many more entries than the SSTiter (as the SSTiter is being filtered but not the IMMiter) this could lead to lots of repeated folding.	2017-10-31 23:28:35 +00:00
Martin Sumner	36264eb416	Search range failure Discovered a bug with search ranges in leveled_tree - this was uncovered by an intermittently failing 19.3 test. Test case added and bug fixed. It was due to a failure to use end_key passed causing issues with particular manifests and full bucket ranges.	2017-10-24 13:19:30 +01:00
Martin Sumner	a128dcdadf	Change hash algorithm for penciller Switch from magic hash to md5 - to hopefully remove the need for some of the artificial jumps required to get expected false positive ratios. Also split the hash into two 16-bit integers. We assume that SegmentID (from the perspective of AAE merkle/tictac trees) will always be at least 16 bits. The idea is that hashes should be used in blooms and indexes such that some advantage can be gained from just knowing the segmentID - in particular when folding over all the keys in a bucket. Performance testing has been difficult so far - I think due to “cloud” mysteries.	2017-10-20 23:04:29 +01:00

247 commits