Commit graph

174 commits

Author SHA1 Message Date
Martin Sumner
7e4c3db915 Alternate scale factor
Also had failed unit test - there was an issue with bit-flipping the position not being safely caught
2018-02-08 10:29:27 +00:00
Martin Sumner
f8ceedc9bb Compress L0 only
Doing at L1 has a negative impact as tests draw on.  Also improve head time tracking
2017-12-04 10:49:42 +00:00
Martin Sumner
1f5d5033a4 Revert "Revert "Disable compression L0 and L1""
This reverts commit 958d3f5e14.
2017-12-04 09:30:27 +00:00
Martin Sumner
958d3f5e14 Revert "Disable compression L0 and L1"
This reverts commit b10c0cf895.
2017-12-04 09:29:44 +00:00
Martin Sumner
b10c0cf895 Disable compression L0 and L1 2017-12-02 09:19:17 +00:00
Martin Sumner
6e589942b6 Cover bit flips in the slot header 2017-12-01 16:20:48 +00:00
Martin Sumner
5bac389d0c Switch to CRC check at Block Level
Previously done at Slot Level - but Blocks were still read from disk after the Slot CRC had been checked.

This seems safer.  It requires an extra CRC check for every fetch.  However, CRC checking smaller binaries during the build process appears to be beneficial to performance.

The hope is that this will be an enabler to turning off compression at Levels 0 and 1 to improve performance (without a compensating issue of reduced CRC performance)
2017-12-01 14:15:13 +00:00
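As a rough illustration of the block-level check described in 5bac389d0c, the Python sketch below stores a CRC32 with each block and verifies it on every fetch. The framing (a 4-byte big-endian CRC ahead of the payload) and the names `wrap_block` and `fetch_block` are assumptions made for illustration, not the leveled_sst on-disk format.

```python
import struct
import zlib


def wrap_block(payload: bytes) -> bytes:
    """Prefix a block with the CRC32 of its payload (hypothetical framing)."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def fetch_block(stored: bytes):
    """Return the payload if its CRC matches, or None if the block is corrupt."""
    if len(stored) < 4:
        # Too short to even hold the CRC prefix
        return None
    (expected,) = struct.unpack(">I", stored[:4])
    payload = stored[4:]
    if zlib.crc32(payload) != expected:
        return None
    return payload
```

Checking per block means each CRC runs over a smaller binary, and a block read from disk is always verified before use.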
Martin Sumner
9d9ad17d36 Typo 2017-11-30 16:29:10 +00:00
Martin Sumner
3b42bc28d1 Add build timing info to merge_list log
Help to determine what the expensive part of the operation is
2017-11-30 16:15:38 +00:00
Martin Sumner
41c308c5fd As used in lookup - will always be hash 2017-11-28 22:13:18 +00:00
Martin Sumner
eb90541a85 Add a small cache to SST file
so that a HEAD which follows a HEAD (e.g. when a GET follows a HEAD) has a chance of avoiding the binary_to_term CPU load
2017-11-28 14:56:40 +00:00
Martin Sumner
c2f19d8825 Switch to using bloom at penciller
Previously the tinybloom was used within the SST file as an extra check to remove false fetches.

However the SST already has a low FPR check in the slot_index.  If the new ebloom is used (which is no longer per slot, but per sst), this can be shared with the penciller and then the penciller could use it and avoid the message pass.

The message pass may be blocked by a 2i query or a slot fetch request for a merge.  So this should make performance within the Penciller snappier.

This is as a result of taking sst_timings within a volume test - where there was an average of + 100microsecs for each level that was dropped down.  Given the bloom/slot checks were < 20 microsecs - there seems to be some further delay.

The bloom is a binary of > 64 bytes - so passing it around should not require a copy.
2017-11-28 01:19:30 +00:00
Martin Sumner
f436cfd03e Add consistent timing points
Now all timing points should be made in a consistent fashion
2017-11-21 23:13:24 +00:00
Martin Sumner
58946a7f98 Amend SST Timing Capture
Use sampling mechanism from CDB timing capture.  Do it less though - as there are far more SST fetches in comparison to CDB fetches.
2017-11-21 17:00:23 +00:00
Martin Sumner
f55cbbeac3 OTP 19 requires defaults in dialyzer 2017-11-13 14:02:39 +00:00
Martin Sumner
8f27b3b628 Merge branch 'master' into mas-aae-segementfoldplus 2017-11-07 11:22:56 +00:00
Martin Sumner
61b7be5039 Make compression algorithm an option
Compression can be switched between LZ4 and zlib (native).

The setting to determine if compression should happen on receipt is now a macro definition in leveled_codec.
2017-11-06 15:54:58 +00:00
Martin Sumner
9fa8ed6cca Add LZ4 2017-11-03 14:18:49 +00:00
Martin Sumner
c6749e61a9 Split out block serialisation
To allow for alternate compression scenarios to be more easily tested
2017-11-03 11:04:31 +00:00
Martin Sumner
53c3bf6c37 Remove get_slotid
Had been used in some debug logging - now not called
2017-11-01 17:05:35 +00:00
Martin Sumner
ee7f9ee4e0 Test coverage
... and column width formatting
2017-11-01 15:11:14 +00:00
Martin Sumner
81180e9310 Add tests for different tree sizes
Note that accelerating segment_list queries will not work for tree sizes smaller than small.  How to flag this up?

Should smaller tree sizes just be removed from leveled_tictac?
2017-11-01 11:51:51 +00:00
Martin Sumner
f80aae7d78 Type typo 2017-10-31 23:35:57 +00:00
Martin Sumner
b141dd199c Allow for segment-acceleration of folds
Initially with basic tests.  If the SlotIndex has been cached, we can now use the slot index as it is based on the Segment hash algorithm.

This looks like it should lead to an order of magnitude improvement in querying for keys/clocks by segment ID.

This also required a slight tweak to the penciller keyfolder.  It now caches the next answer from the SSTiter, rather than restart the iterator.  When the IMMiter has many more entries than the SSTiter (as the SSTiter is being filtered but not the IMMiter) this could lead to lots of repeated folding.
2017-10-31 23:28:35 +00:00
Martin Sumner
e24eaf655b Revert to previous standard slot size
But maintain configurability of slot size to maximum
2017-10-25 08:59:34 +01:00
Martin Sumner
a22610cee7 Experiment with alternate slot size
Improves fpr.  Does this change anything in volume tests?
2017-10-24 17:58:33 +01:00
Martin Sumner
26aa573ce1 Switch segment and extra hash
More entropy by using the position index with the segment hash - so this would be a better filter to apply.

Also could increase the key count now, as extra hash can be larger.

As an aside - a leveled_iclerk unit test failure appeared - the range was just wrong.  Don't know why this started happening
2017-10-24 14:32:04 +01:00
Martin Sumner
a128dcdadf Change hash algorithm for penciller
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected false positive ratios.

Also split the hash into two 16-bit integers.  We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits.  The idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
segmentID - in particular when folding over all the keys in a bucket.

Performance testing has been difficult so far - I think due to “cloud”
mysteries.
2017-10-20 23:04:29 +01:00
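The hash split described in a128dcdadf can be sketched in Python as below. This is a minimal sketch under stated assumptions: which 32 bits of the md5 digest are used, their byte order, and the name `segment_hash` are all hypothetical, and the actual Erlang code may differ.

```python
import hashlib


def segment_hash(key: bytes):
    """Split 32 bits of md5(key) into (segment_id, extra_hash)."""
    digest = hashlib.md5(key).digest()
    # Assumption: use the leading 4 bytes, big-endian
    value = int.from_bytes(digest[:4], "big")
    segment_id = value >> 16     # upper 16 bits
    extra_hash = value & 0xFFFF  # lower 16 bits
    return segment_id, extra_hash
```

Because the segment ID is a fixed part of the hash, a bloom or index built from these values can be probed knowing only the segment ID, which is the advantage the commit message anticipates for folds over all the keys in a bucket.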
Heinz N. Gies
25389893cf Add compatibility for old and new random / rand functions 2017-08-01 11:24:12 +02:00
Heinz N. Gies
5e6df539cb Cleanup dialyzer errors in leveled_sst 2017-07-31 19:30:29 +02:00
martinsumner
8da8722b9e Add temporary aae index
Pending ct tests.  The aae index should expire after limit_minutes and
be on an index which is rounded to unit_minutes.
2017-06-30 10:03:36 +01:00
martinsumner
ebef27f021 Extract Last Modified Date from Riak Object
As part of process to supporting a recent changes index for
near-real-time anti-entropy
2017-06-27 16:25:18 +01:00
martinsumner
8b3ca78d49 spec help for SST file 2017-05-18 12:29:56 +01:00
Martin Sumner
4e9fa2a206 Timeout long-running snapshots
Add logic to timeout long-running snapshots.
2017-04-05 09:16:01 +01:00
martinsumner
e59585d733 Merge remote-tracking branch 'refs/remotes/origin/mas-etsmem-i52' into mas-sstfiveblocks 2017-03-21 18:25:18 +00:00
martinsumner
eef2199335 Up level for yield to 2 2017-03-21 18:24:11 +00:00
martinsumner
756b46bb4d Return to merge scan width of 16
This was reduced before the use of binary blocks was committed
2017-03-21 17:53:34 +00:00
martinsumner
1fdcdf3b37 Midblock size - lookup
No real reason for the midblock to be smaller in lookup slots - so give
the blocks a more consistent size
2017-03-21 17:47:08 +00:00
martinsumner
64e944d9ba Change to 5 blocks in SST Slot
Change to 5 blocks is intended to make the blocks in lookup slots
fractionally smaller, but more importantly to introduce a middle block
that can be opened in a binary-split style fashion to reduce the number
of blocks that need to be opened for range queries.   Worst case for
full slots is 3 blocks now not 4.
2017-03-21 16:54:23 +00:00
martinsumner
dd0316eedf Yield on query selectively
Still not clear if yielding is the cause of memory problems, but taking
it away universally has impacted throughput.  At the very least we
should continue to yield on high-contention files (those at higher
levels), where the processes are more likely to be quickly terminated
anyway allowing GC to be invoked.
2017-03-21 11:03:29 +00:00
martinsumner
419541f5dd Fix to delete_pending state 2017-03-20 23:43:31 +00:00
martinsumner
415ac6017b Move sst get_kv range back inside process
Moved outside to stop blocking, but also avoids copy.  Move back in to
see if it may be related to the binary memory leak
2017-03-20 23:22:46 +00:00
martinsumner
5c662aeca1 Additional unit test
Need to test scenario where the key list the SST file created from is an
exact multiple of the slot size
2017-03-19 23:42:24 +00:00
martinsumner
431c2cee40 Remove unnecessary line
Branch cannot be reached as first key is always discovered when it is a
no_lookup
2017-03-19 23:37:50 +00:00
martinsumner
f20aba9c8b Curtail trimmed slot craziness
There was complicated and confusing code that achieved nothing for
efficiency when trimming slots.  The expensive part (binary_to_term) was
still needed on every block, and it was hard to get code coverage and
make sense of what it was really trying to achieve.

This is now much simpler - and may set us up for potential further
indexing help.
2017-03-19 21:47:22 +00:00
martinsumner
c203e2ee06 Range queries - pass out as binaries
Avoid converting to Erlang term within the FSM and then requiring a copy
outside of the FSM - pass out as a binary
2017-03-17 10:47:20 +00:00
martinsumner
f287895db0 Pass out slots as a binary
If we convert first to a list, then the list has to be copied - passing
out as binaries means the bulk can be passed as references
2017-03-17 10:43:34 +00:00
martinsumner
5dbd7a2bc2 Check query out of range
It doesn't work - so protecting against it in fetch_range is pointless,
will blow up in lookup_slots
2017-03-16 08:43:18 +00:00
martinsumner
6199a2c352 RTrim
RTrim only worked in special case of key matching, that would never
occur in real world range query.  RTrim should really check for key
passing.

Returning empty list should not be possible - unless the query is
outside of the range entirely (and such a query should never go to this
SST).
2017-03-16 08:37:36 +00:00
martinsumner
dde37566b9 Add unit test for more than one slot 2017-03-15 16:40:43 +00:00