leveled

Author	SHA1	Message	Date
martinsumner	95d5e12ce7	Switch to using ets set as index of L0 cache Hope is that this will cause less garbage collection, and also will be slightly faster. Note that snapshots don't now get an index - they get the special index 'snap'. However, the SkipLists have bloom protection, and most snapshots are iterators not fetchers.	2016-12-10 14:15:35 +00:00
martinsumner	626a8e63f9	Experiment converting CDB to use skiplist not gb_tree Might insertion time be faster?	2016-12-10 10:55:35 +00:00
martinsumner	d2bd01eaf1	Add fast fail to skiplist Add a bloom filter to the skiplist, to make it faster at returning not found. The SkipList is now encapsulated within a dict().	2016-12-09 18:30:40 +00:00
martinsumner	f0db730f07	Adjust jitter settings	2016-12-09 16:34:15 +00:00
martinsumner	82cb49638a	Attempt at performance improvement Try to add some extra jitter into the process of L0 writes, and also make L0 writes delayed to help with buffering	2016-12-09 14:36:03 +00:00
Martin Sumner	fe080895fd	Revert type definition Can’t find a type definition supported in both OTP 16 and OTP 18, so reverting to not defining type	2016-11-25 18:22:35 +00:00
Martin Sumner	ec32a7e3eb	OTP16 compliance - array type	2016-11-25 18:20:17 +00:00
martinsumner	03d025d581	Replace ledger-side gb_trees Try to make minimal change to replace gb_trees with gb_tree API-like skiplists	2016-11-25 14:50:13 +00:00
martinsumner	638fc69e01	Correctly set array type Otherwise cannot compile in both OTP 16 and 17	2016-11-21 22:26:12 +00:00
martinsumner	0f7e421371	Add destruction Allow a store to be cleared out and destroyed	2016-11-21 12:34:40 +00:00
martinsumner	386d40928b	Fast List Buckets Copied the technique from HanoiDB to speed up list buckets.	2016-11-20 21:21:31 +00:00
martinsumner	f40ecdd529	Pick-up test misses There were some coverage misses in tests, so check in unit test coverage or remove branches not currently needed.	2016-11-18 21:35:45 +00:00
martinsumner	8cbe2ef93a	Coverage cheats You juke the stats, and majors become colonels. I've been here before	2016-11-14 20:43:38 +00:00
martinsumner	630f802780	Inker Close nastiness Try to stop some of the potential deadlocking around Inker close and prove that snapshots at higher Manifest SQNs can be ignored	2016-11-14 19:34:11 +00:00
martinsumner	75d6af75c6	Penciller review The penciller attempt to close the L0 file if pending was unpredictable in behaviour. If a L0 file is still pending it will be lost - but this is at least a predictable event.	2016-11-14 17:18:28 +00:00
martinsumner	8035583301	Comment review	2016-11-07 11:17:13 +00:00
martinsumner	4583460328	Clean API of Riak-specific Methods Clean the API of Riak-specific methods, and also resolve timing issue in simple_server unit test. Previously this would end up with missing data (and a lower sequence number after start) because of the penciller_clerk timeout being relatively large in the context of this test. Now that the timeout has been reduced, the L0 slot is cleared by the time of the close. To make sure, an extra sleep has been added as a precaution to avoid any intermittent issues.	2016-11-07 10:11:57 +00:00
Martin Sumner	a7ed3e4b85	Trim dead branches Also an experiment with altering the slowoffer_delay.	2016-11-05 15:59:31 +00:00
martinsumner	a73c233154	Correct the recording of excess work	2016-11-05 15:10:21 +00:00
martinsumner	376176eba3	Correct overlap in naming with Backlog	2016-11-05 14:35:01 +00:00
martinsumner	7f456fa993	Add back-pressure on work queue limit Previously under heavy load, as long as L0 was being cleared, the ledger would keep accepting. Now there is a formal limit on how far behind the work queue (of compactions required at other levels) can fall before the brake is applied on new updates coming in.	2016-11-05 14:04:45 +00:00
martinsumner	4556573a5c	Rationalise logging on push_mem	2016-11-05 13:42:44 +00:00
martinsumner	87b5bd0b18	Set Persisted SQN (regression) As part of previous change had stopped setting the persisted SQN in the ledger - which stopped journal compaction from working	2016-11-05 12:03:21 +00:00
martinsumner	61c6269200	Penciller back-pressure - Phase 1 There were issues with how the Penciller behaves under heavy write pressure - most particularly where there are a large number of keys per update (i.e. 2i heavy objects). Most immediately the attempt to check whether the L0 file was ready slowed down the process of producing the L0 file - so back-pressure created more back-pressure. Going forward want to alter this most significantly as also the work queue can build up unsustainably. There needs to be some pausing prompted by the bookie on 'returned', and the use of 'returned' when the work queue exceeds a threshold.	2016-11-05 11:22:27 +00:00
martinsumner	41f00ba6fa	Filename nonsense	2016-11-03 20:48:23 +00:00
martinsumner	dd99d624b1	Tangling with filenames filename join does not work as expected	2016-11-03 20:46:56 +00:00
martinsumner	c3a6489b93	Ensure manifest dir when starting Penciller Otherwise may fail based on test ordering	2016-11-03 20:09:38 +00:00
martinsumner	d5ac4d412d	Use filename join Potentially to avoid *nix vs windows differences	2016-11-03 20:06:30 +00:00
martinsumner	341e245c09	Remove unnecessary no match condition	2016-11-03 19:34:54 +00:00
martinsumner	2716d912ea	Timeout and close race Race condition prevented in test - but still not handled nicely. Perhaps need to consider making it a FSM and handling close differently when L0 pending - i.e. don't close immediately, but set a timeout to close on if we don't get the last fetch_levelzero	2016-11-03 19:02:50 +00:00
martinsumner	f41c788bff	Minor quibbles Move legacy CDB code used only in unit tests into test area. Fix column width in pmem and comment out the unused case statement (in healthy tests) from the penciller test code	2016-11-03 16:46:25 +00:00
martinsumner	4e46c9735d	Log improvements Continuation of log review and conversion to using central log function. Fixup of convoluted shutdown process between Bookie, Inker and Inker's Clerk	2016-11-03 16:05:43 +00:00
martinsumner	7147ec0470	Logging - Phase 1 Abstract out logging and introduce a logbase	2016-11-02 18:14:46 +00:00
martinsumner	4cffecf2ca	Handle gen_server:cast slowness There was some unpredictable performance in tests, that was related to the amount of time it took the sft gen_server to accept a cast which passed the levelzero_cache. The response time looked to be broadly proportional to the size of the cache - so it appeared to be an issue with passing the large object to the process queue. To avoid this, the penciller now instructs the SFT gen_server to call back to the server for each tree in the cache in turn as it is building the list from the cache. Each of these requests should be relatively short, and the processing in-between should space out the requests so the Penciller is not blocked from answering queries when prompting a L0 write.	2016-10-31 01:33:33 +00:00
martinsumner	95609702bd	Penciller Memory Refactor Plugged the new penciller memory into the Penciller, and took advantage of the increased speed to simplify the callbacks involved. The outcome is much simpler code	2016-10-30 18:25:30 +00:00
martinsumner	cdb01cd24f	Quality Review Looked through test coverage and dialyzer output and attempted to fill test gaps and strip out untestable code (to let it crash).	2016-10-29 00:52:49 +01:00
martinsumner	c6ca973517	Penciller shutdown when empty Stop the penciller from writing an empty file, when shutting down and the L0 Cache is empty. Also parameter fiddle to see impact of the Penciller changes.	2016-10-27 21:40:43 +01:00
martinsumner	20cc17f916	Penciller Refactor Removed o(100) lines of code by refactoring the Penciller to no longer use ETS tables. The code is less confusing, and probably not an awful lot slower.	2016-10-27 20:56:18 +01:00
martinsumner	30f4f2edf6	Comment change on stall behaviour	2016-10-27 09:45:05 +01:00
martinsumner	4cdc6211a0	Handling 'returned' in penciller unit tests The unit tests for the Penciller couldn't cope with the returned status - and so would intermittently fail (after tightening the timeout on sft check_ready).	2016-10-26 21:03:50 +01:00
martinsumner	e9c568a8b3	Test fix-up There was a test that failed to close down a bookie and that caused some issues. The issues are double-resolved, the close down was tidied as well as the forgotten close being added back in. There is some general tidy around in anticipation of TTL support.	2016-10-21 21:26:28 +01:00
martinsumner	3ad9e42b61	Changed SFT shutdown to cast-based The SFT shutdown process has become a series of casts to-and-from between Penciller and SFT to stop the two processes synchronously making requests on each other	2016-10-21 12:18:06 +01:00
martinsumner	c431bf3b0a	Broken snapshot test The test confirming that deleting sft files were held open whilst snapshots were registered was actually broken. This test has now been fixed, as well as the logic in registering snapshots which had used ledger_sqn mistakenly rather than manifest_sqn.	2016-10-21 11:38:30 +01:00
martinsumner	5c2029668d	Tombstone preparation Some initial code changes preparing for the test and implementation of tombstones and tombstone reaping	2016-10-20 16:00:08 +01:00
martinsumner	cf66431c8e	Smoother handling of back-pressure The Penciller had two problems in previous commits: - If it had a push_mem soon after a L0 file had been created, the push_mem would stall waiting for the L0 file to complete - and this could take 100-200ms - The penciller's clerk favoured L0 work, but was lazy about asking for other work in-between, so often the L1 layer was bursting over capacity and the clerk was doing nothing but merging more L0 files in (with those merges getting more and more expensive as they had to cover more and more files) There are some partial resolutions to this. There is now an aggressive timeout when checking whether the L0 file is ready on a push_mem, and if the timeout is breached the error is caught and a 'returned' message goes back to the Bookie. The Bookie doesn't now empty its cache, it carries on filling it, but on some probability it will keep trying to push_mem on future pushes. This increases Jitter around the expensive operation and splits out the L0 delay into defined chunks. The penciller's clerk is now more aggressive in asking for work. There is also some simplification of the relationship between clerk timeouts and penciller back-pressure. Also resolved is an issue of inconsistency between the loader on startup (replaying the transaction log) and the standard push_mem process. The loader was not correctly de-duplicating by adding first (in order) to a tree before outputting the list from the tree. Some thought will be given later as to whether non-L0 work can be safely prioritised if the merge process still keeps getting behind.	2016-10-20 02:23:45 +01:00
martinsumner	7319b8f415	Redundant clauses Remove some redundant clauses, and fix up some logging	2016-10-19 20:51:30 +01:00
martinsumner	12fe1d01bd	Penciller Manifest and Locking The penciller had the concept of a manifest_lock - but it wasn't clear what the purpose of it was. The updating of the manifest has now been updated to reduce the code and make the process cleaner and more obvious. Now the committed manifest only covers non-L0 levels. A clerk can work concurrently on a manifest change whilst the Penciller is accepting a new L0 file. On startup the manifest is opened as well as any L0 file. There is a possible race condition with killing process where there may be a L0 file which is merged but undeleted - and this is believed to be inert. There is some outstanding work still. Currently the whole store is paused if a push_mem is received by the Penciller, and the writing of a L0 sft file has not been completed. The creation of a L0 file appears to take about 300ms, so if the ledger_cache fills in this period a pause will occur (perhaps due to objects with lots of index entries). It would be preferable to pause more elegantly in this situation. Perhaps there should be a harsh timeout on the call to check the SFT complete, and catching it should cause a refused response. The next PUT will then wait, but any queued GETs can progress.	2016-10-19 17:34:58 +01:00
martinsumner	f16f71ae81	Revert omnishambles performance refactoring To try and improve performance index entries had been removed from the Ledger Cache, and a shadow list of the LedgerCache (in SQN order) was kept to avoid gb_trees:to_list on push_mem. This did not go well. The issue was that ets does not deal with duplicate keys in the list when inserting (it will only insert one, but it is not clear which one). This has been reverted back out. The ETS parameters have been changed to [set, private]. It is not used as an iterator, and is no longer passed out of the process (the memtable_copy is sent instead). This also avoids the tab2list function being called.	2016-10-19 00:10:48 +01:00
martinsumner	8f29a6c40f	Complete 2i work - some refactoring The 2i work now has tests for removals as well as regex etc. Some initial refactoring work has also been tried - to try and take some tasks off the critical path of push_mem. The primary change has been to avoid putting index keys into the gb_tree, and building the KeyChanges list in parallel to the gb_tree (now known as ObjectTree) within the Ledger Cache. Some initial experiments done as to changing the ETS table in the Penciller now that it will be used for iterating - but that has been reverted for now.	2016-10-18 19:41:33 +01:00
martinsumner	3e475f46e8	Support for 2i query part1 Added basic support for 2i query. This involved some refactoring of the test code to share functions between suites. There is still a need for a Part 2 as no tests currently cover removal of index entries.	2016-10-18 01:59:18 +01:00

178 commits