leveled

Author	SHA1	Message	Date
Martin Sumner	7e4c3db915	Alternate scale factor Also had failed unit test - there was an issue with bit-flipping the position not being safely caught	2018-02-08 10:29:27 +00:00
Martin Sumner	f8ceedc9bb	Compress L0 only Doing at L1 has a negative impact as tests draw on. Also improve head time tracking	2017-12-04 10:49:42 +00:00
Martin Sumner	1f5d5033a4	Revert "Revert "Disable compression L0 and L1"" This reverts commit `958d3f5e14`.	2017-12-04 09:30:27 +00:00
Martin Sumner	958d3f5e14	Revert "Disable compression L0 and L1" This reverts commit `b10c0cf895`.	2017-12-04 09:29:44 +00:00
Martin Sumner	b10c0cf895	Disable compression L0 and L1	2017-12-02 09:19:17 +00:00
Martin Sumner	6e589942b6	Cover bit flips in the slot header	2017-12-01 16:20:48 +00:00
Martin Sumner	5bac389d0c	Switch to CRC check at Block Level Previously done at Slot Level - but Blocks were still read from disk after the Slot CRC had been checked. This seems safer. It requires an extra CRC check for every fetch. However, CRC checking smaller binaries during the build process appears to be beneficial to performance. Hopefully this will be an enabler to turning off compression at Levels 0 and 1 to improve performance (without having compensating issues with reduced CRC performance)	2017-12-01 14:15:13 +00:00
Martin Sumner	9d9ad17d36	Typo	2017-11-30 16:29:10 +00:00
Martin Sumner	3b42bc28d1	Add build timing info to merge_list log Help to determine what the expensive part of the operation is	2017-11-30 16:15:38 +00:00
Martin Sumner	41c308c5fd	As used in lookup - will always be hash	2017-11-28 22:13:18 +00:00
Martin Sumner	eb90541a85	Add a small cache to SST file so that a HEAD which follows a HEAD (e.g. when a GET follows a HEAD) has a chance of avoiding the binary_to_term CPU load	2017-11-28 14:56:40 +00:00
Martin Sumner	c2f19d8825	Switch to using bloom at penciller Previously the tinybloom was used within the SST file as an extra check to remove false fetches. However the SST already has a low FPR check in the slot_index. If the new ebloom was used (which is no longer per slot, but per sst), this can be shared with the penciller and then the penciller could use it and avoid the message pass. The message pass may be blocked by a 2i query or a slot fetch request for a merge. So this should make performance within the Penciller snappier. This is as a result of taking sst_timings within a volume test - where there was an average of + 100microsecs for each level that was dropped down. Given the bloom/slot checks were < 20 microsecs - there seems to be some further delay. The bloom is a binary of > 64 bytes - so passing it around should not require a copy.	2017-11-28 01:19:30 +00:00
Martin Sumner	f436cfd03e	Add consistent timing points Now all timing points should be made in a consistent fashion	2017-11-21 23:13:24 +00:00
Martin Sumner	58946a7f98	Amend SST Timing Capture Use sampling mechanism from CDB timing capture. Do it less though - as far more SST fetches in comparison to CDB fetches.	2017-11-21 17:00:23 +00:00
Martin Sumner	f55cbbeac3	OTP 19 requires defaults in dialyzer	2017-11-13 14:02:39 +00:00
Martin Sumner	8f27b3b628	Merge branch 'master' into mas-aae-segementfoldplus	2017-11-07 11:22:56 +00:00
Martin Sumner	61b7be5039	Make compression algorithm an option Compression can be switched between LZ4 and zlib (native). The setting to determine if compression should happen on receipt is now a macro definition in leveled_codec.	2017-11-06 15:54:58 +00:00
Martin Sumner	9fa8ed6cca	Add LZ4	2017-11-03 14:18:49 +00:00
Martin Sumner	c6749e61a9	Split out block serialisation To allow for alternate compression scenarios to be more easily tested	2017-11-03 11:04:31 +00:00
Martin Sumner	53c3bf6c37	Remove get_slotid Had been used in some debug logging - now not called	2017-11-01 17:05:35 +00:00
Martin Sumner	ee7f9ee4e0	Test coverage ... and column width formatting	2017-11-01 15:11:14 +00:00
Martin Sumner	81180e9310	Add tests for different tree sizes Note that accelerating segment_list queries will not work for tree sizes smaller than small. How to flag this up? Should smaller tree sizes just be removed from leveled_tictac?	2017-11-01 11:51:51 +00:00
Martin Sumner	f80aae7d78	Type typo	2017-10-31 23:35:57 +00:00
Martin Sumner	b141dd199c	Allow for segment-acceleration of folds Initially with basic tests. If the SlotIndex has been cached, we can now use the slot index as it is based on the Segment hash algorithm. This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID. This also required a slight tweak to the penciller keyfolder. It now caches the next answer from the SSTiter, rather than restart the iterator. When the IMMiter has many more entries than the SSTiter (as the SSTiter is being filtered but not the IMMiter) this could lead to lots of repeated folding.	2017-10-31 23:28:35 +00:00
Martin Sumner	e24eaf655b	Revert to previous standard slot size But maintain configurability of slot size to maximum	2017-10-25 08:59:34 +01:00
Martin Sumner	a22610cee7	Experiment with alternate slot size Improves fpr. Does this change anything in volume tests?	2017-10-24 17:58:33 +01:00
Martin Sumner	26aa573ce1	Switch segment and extra hash More entropy by using the position index with the segment hash - so this would be a better filter to apply. Also could increase the key count now, as extra hash can be larger. As an aside - a leveled_iclerk unit test failure appeared - the range was just wrong. Don't know why this started happening	2017-10-24 14:32:04 +01:00
Martin Sumner	a128dcdadf	Change hash algorithm for penciller Switch from magic hash to md5 - to hopefully remove the need for some of the artificial jumps required to get expected false positive ratios. Also split the hash into two 16-bit integers. We assume that SegmentID (from the perspective of AAE merkle/tictac trees) will always be at least 16 bits. The idea is that hashes should be used in blooms and indexes such that some advantage can be gained from just knowing the segmentID - in particular when folding over all the keys in a bucket. Performance testing has been difficult so far - I think due to “cloud” mysteries.	2017-10-20 23:04:29 +01:00
Heinz N. Gies	25389893cf	Add compatibility for old and new random / rand functions	2017-08-01 11:24:12 +02:00
Heinz N. Gies	5e6df539cb	Cleanup dialyzer errors in leveled_sst	2017-07-31 19:30:29 +02:00
martinsumner	8da8722b9e	Add temporary aae index Pending ct tests. The aae index should expire after limit_minutes and be on an index which is rounded to unit_minutes.	2017-06-30 10:03:36 +01:00
martinsumner	ebef27f021	Extract Last Modified Date from Riak Object As part of process to supporting a recent changes index for near-real-time anti-entropy	2017-06-27 16:25:18 +01:00
martinsumner	8b3ca78d49	spec help for SST file	2017-05-18 12:29:56 +01:00
Martin Sumner	4e9fa2a206	Timeout long-running snapshots Add logic to timeout long-running snapshots.	2017-04-05 09:16:01 +01:00
martinsumner	e59585d733	Merge remote-tracking branch 'refs/remotes/origin/mas-etsmem-i52' into mas-sstfiveblocks	2017-03-21 18:25:18 +00:00
martinsumner	eef2199335	Up level for yield to 2	2017-03-21 18:24:11 +00:00
martinsumner	756b46bb4d	Return to merge scan width of 16 This was reduced before the use of binary blocks was committed	2017-03-21 17:53:34 +00:00
martinsumner	1fdcdf3b37	Midblock size - lookup No real reason for the midblock to be smaller in lookup slots - so give the blocks a more consistent size	2017-03-21 17:47:08 +00:00
martinsumner	64e944d9ba	Change to 5 blocks in SST Slot Change to 5 blocks is intended to make the blocks in lookup slots fractionally smaller, but more importantly to introduce a middle block that can be opened in a binary-split style fashion to reduce the number of blocks that need to be opened for range queries. Worst case for full slots is 3 blocks now not 4.	2017-03-21 16:54:23 +00:00
martinsumner	dd0316eedf	Yield on query selectively Still not clear if yielding is the cause of memory problems, but taking it away universally has impacted throughput. At the very least we should continue to yield on high-contention files (those at higher levels), where the processes are more likely to be quickly terminated anyway allowing GC to be invoked.	2017-03-21 11:03:29 +00:00
martinsumner	419541f5dd	Fix to delete_pending state	2017-03-20 23:43:31 +00:00
martinsumner	415ac6017b	Move sst get_kv range back inside process Moved outside to stop blocking, but also avoids copy. Move back out to see if it may be related to the binary memory leak	2017-03-20 23:22:46 +00:00
martinsumner	5c662aeca1	Additional unit test Need to test scenario where the key list the SST file created from is an exact multiple of the slot size	2017-03-19 23:42:24 +00:00
martinsumner	431c2cee40	Remove unnecessary line Branch cannot be reached as first key is always discovered when it is a no_lookup	2017-03-19 23:37:50 +00:00
martinsumner	f20aba9c8b	Curtail trimmed slot craziness There was complicated and confusing code that achieved nothing for efficiency when trimming slots. The expensive part (binary_to_term) was still needed on every block, and it was hard to get code coverage and make sense of what it was really trying to achieve. This is now much simpler - and may set us up for potential further indexing help.	2017-03-19 21:47:22 +00:00
martinsumner	c203e2ee06	Range queries - pass out as binaries Avoid converting to erlang term within the FSM and then requiring a copy outside of the FSM - pass out as a binary	2017-03-17 10:47:20 +00:00
martinsumner	f287895db0	Pass out slots as a binary If we convert first to a list, then the list has to be copied - passing out as binaries means the bulk can be passed as references	2017-03-17 10:43:34 +00:00
martinsumner	5dbd7a2bc2	Check query out of range It doesn't work - so protecting against it in fetch_range is pointless, will blow up in lookup_slots	2017-03-16 08:43:18 +00:00
martinsumner	6199a2c352	RTrim RTrim only worked in special case of key matching, that would never occur in real world range query. RTrim should really check for key passing. Returning empty list should not be possible - unless the query is outside of the range entirely (and such a query should never go to this SST).	2017-03-16 08:37:36 +00:00
martinsumner	dde37566b9	Add unit test for more than one slot	2017-03-15 16:40:43 +00:00

174 commits