Commit graph

99 commits

Author SHA1 Message Date
Martin Sumner
a22610cee7 Experiment with alternate slot size
Improves fpr.  Does this change anything in volume tests?
2017-10-24 17:58:33 +01:00
Martin Sumner
26aa573ce1 Switch segment and extra hash
More entropy by using the position index with the segment hash - so this would be a better filter to apply.

Also could increase the key count now, as extra hash can be larger.

As an aside - a leveled_iclerk unit test failure appeared - the range was just wrong.  Don't know why this started happening.
2017-10-24 14:32:04 +01:00
Martin Sumner
a128dcdadf Change hash algorithm for penciller
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected false positive ratios.

Also split the hash into two 16-bit integers.  We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits.  The idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
segmentID - in particular when folding over all the keys in a bucket.

Performance testing has been difficult so far - I think due to “cloud”
mysteries.
2017-10-20 23:04:29 +01:00
Heinz N. Gies
25389893cf Add compatibility for old and new random / rand functions 2017-08-01 11:24:12 +02:00
Heinz N. Gies
5e6df539cb Cleanup dialyzer errors in leveled_sst 2017-07-31 19:30:29 +02:00
martinsumner
8da8722b9e Add temporary aae index
Pending ct tests.  The aae index should expire after limit_minutes and
be on an index which is rounded to unit_minutes.
2017-06-30 10:03:36 +01:00
martinsumner
ebef27f021 Extract Last Modified Date from Riak Object
As part of the process of supporting a recent-changes index for
near-real-time anti-entropy
2017-06-27 16:25:18 +01:00
martinsumner
8b3ca78d49 spec help for SST file 2017-05-18 12:29:56 +01:00
Martin Sumner
4e9fa2a206 Timeout long-running snapshots
Add logic to timeout long-running snapshots.
2017-04-05 09:16:01 +01:00
martinsumner
e59585d733 Merge remote-tracking branch 'refs/remotes/origin/mas-etsmem-i52' into mas-sstfiveblocks 2017-03-21 18:25:18 +00:00
martinsumner
eef2199335 Up level for yield to 2 2017-03-21 18:24:11 +00:00
martinsumner
756b46bb4d Return to merge scan width of 16
This was reduced before the use of binary blocks was committed
2017-03-21 17:53:34 +00:00
martinsumner
1fdcdf3b37 Midblock size - lookup
No real reason for the midblock to be smaller in lookup slots - so give
the blocks a more consistent size
2017-03-21 17:47:08 +00:00
martinsumner
64e944d9ba Change to 5 blocks in SST Slot
Change to 5 blocks is intended to make the blocks in lookup slots
fractionally smaller, but more importantly to introduce a middle block
that can be opened in a binary-split style fashion to reduce the number
of blocks that need to be opened for range queries.   Worst case for
full slots is 3 blocks now not 4.
2017-03-21 16:54:23 +00:00
martinsumner
dd0316eedf Yield on query selectively
Still not clear if yielding is the cause of memory problems, but taking
it away universally has impacted throughput.  At the very least we
should continue to yield on high-contention files (those at higher
levels), where the processes are more likely to be quickly terminated
anyway allowing GC to be invoked.
2017-03-21 11:03:29 +00:00
martinsumner
419541f5dd Fix to delete_pending state 2017-03-20 23:43:31 +00:00
martinsumner
415ac6017b Move sst get_kv range back inside process
Moved outside to stop blocking, but also avoids copy.  Move back inside to
see if it may be related to the binary memory leak
2017-03-20 23:22:46 +00:00
martinsumner
5c662aeca1 Additional unit test
Need to test scenario where the key list the SST file created from is an
exact multiple of the slot size
2017-03-19 23:42:24 +00:00
martinsumner
431c2cee40 Remove unnecessary line
Branch cannot be reached as first key is always discovered when it is a
no_lookup
2017-03-19 23:37:50 +00:00
martinsumner
f20aba9c8b Curtail trimmed slot craziness
There was complicated and confusing code that achieved nothing for
efficiency when trimming slots.  The expensive part (binary_to_term) was
still needed on every block, and it was hard to get code coverage and
make sense of what it was really trying to achieve.

This is now much simpler - and may set us up for potential further
indexing help.
2017-03-19 21:47:22 +00:00
martinsumner
c203e2ee06 Range queries - pass out as binaries
Avoid converting to Erlang term within the FSM and then requiring a copy
outside of the FSM - pass out as a binary
2017-03-17 10:47:20 +00:00
martinsumner
f287895db0 Pass out slots as a binary
If we convert first to a list, then the list has to be copied - passing
out as binaries means the bulk can be passed as references
2017-03-17 10:43:34 +00:00
martinsumner
5dbd7a2bc2 Check query out of range
It doesn't work - so protecting against it in fetch_range is pointless,
will blow up in lookup_slots
2017-03-16 08:43:18 +00:00
martinsumner
6199a2c352 RTrim
RTrim only worked in special case of key matching, that would never
occur in real world range query.  RTrim should really check for key
passing.

Returning empty list should not be possible - unless the query is
outside of the range entirely (and such a query should never go to this
SST).
2017-03-16 08:37:36 +00:00
martinsumner
dde37566b9 Add unit test for more than one slot 2017-03-15 16:40:43 +00:00
martinsumner
c6d17b998e Additional unit tests for SST range fetches
Resolve some of coverage issues
2017-03-15 11:27:46 +00:00
martinsumner
4b60c0e35b Scan width semi-reverted
No evidence from volume test that the scan width has made a positive
difference - so reverting, but not fully as slots may now be twice as
big, so sticking to half previous value
2017-03-14 01:18:50 +00:00
martinsumner
19bc838d90 Fix bad exit with no FK 2017-03-14 00:52:07 +00:00
martinsumner
a1c49b668a Fix empty file again
No special definition of empty required, as now an empty list when empty
2017-03-14 00:17:09 +00:00
martinsumner
2b0ec1d9cc Don't double-loop on slots
Previous version built a list of slots, then iterated over it to build a
list of binaries.  This converts the slot to a binary before building
the list
2017-03-13 23:51:48 +00:00
martinsumner
4f0622d2ac Merge remote-tracking branch 'refs/remotes/origin/mas-sstblock-i42' into mas-sstblockv2-i42 2017-03-13 21:09:13 +00:00
martinsumner
54534e725f Experiment with smaller scan width
When testing with large numbers of 2i terms (and hence more Riak
Metadata), there is a surge in slow response times when there are
multiple concurrent merge events.

This could be very short term CPU starvation because of the merge
process.  Perhaps it is delays waiting for the scan to complete -
smaller scanwidth may mean more interleaving and less latency?
2017-03-13 19:53:12 +00:00
martinsumner
f2cd9b3f33 Consistency of empty slotlist references
Need to return an empty slotlist in a consistent way
2017-03-11 13:04:55 +00:00
martinsumner
1f8de798bd Fix empty slot issue 2017-03-11 12:41:30 +00:00
martinsumner
a07770a3df Unit test of lookup over-size issue
A mistake meant resetting to lookup on a skipped key would cause issues
if the skipped key occurred under a no_lookup slot after the ?SLOT_SIZE
had been reached.  This caused the slot to switch to lookup, but beyond
the maximum size
2017-03-11 00:03:55 +00:00
martinsumner
4e4f498f20 Correctly set no_lookup on skip_key
Otherwise could change to lookup after the size limit has been reached
2017-03-10 23:48:17 +00:00
martinsumner
1813317121 Correctly identify empty slotlist 2017-03-10 22:49:00 +00:00
martinsumner
b2f3d882a9 Draft of branch to condense range_only keys 2017-03-10 20:43:37 +00:00
martinsumner
730ab2ec48 tidy out io:format 2017-03-10 11:10:15 +00:00
martinsumner
601f43de3d Merge remote-tracking branch 'refs/remotes/origin/master' into mas-sstblock-i42 2017-03-10 10:24:51 +00:00
martinsumner
d7eee2f9c9 Remove rogue log 2017-03-09 22:24:11 +00:00
martinsumner
4c59342600 Change SST reference to split filename
The manifest and the logs are bloated by having the full file path for
every filename in there - given the root path is constant.

Could also cause issues if the mount point is ever changed.
2017-03-09 21:23:09 +00:00
martinsumner
04cfb453c4 Fetch specific block only
Rely on CRC check in zlib.  Still need to catch on failure
2017-03-07 20:19:11 +00:00
martinsumner
bc5388710b Update SST comments 2017-03-04 20:47:46 +00:00
martinsumner
19534122a2 Coverage checks 2017-02-26 21:37:47 +00:00
martinsumner
7320b34681 Comment update 2017-01-25 12:38:33 +00:00
martinsumner
d57b74d967 Re-introduce tinybloom to SST
This had been removed due to the CPU cost of adding - however then the
tinybloom was implemented by directly manipulating bits through binary
comprehension - rather than applying bor band bsl bsr operations.

With these operations the cost of producing and checking the bloom is
<10% by comparison.
2017-01-24 21:51:12 +00:00
martinsumner
d225f4d7f5 Add use of leveled_tree to sst summary 2017-01-23 22:58:51 +00:00
martinsumner
c99c50ce6e Fix-up message exchange on confirm delete 2017-01-17 11:18:58 +00:00
martinsumner
c32fd3fb4c Change to use manifest_entry not straight PID in unit test 2017-01-17 10:14:40 +00:00