Commit graph

71 commits

Author SHA1 Message Date
martinsumner
a1c49b668a Fix empty file again
No special definition of empty required, as now an empty list when empty
2017-03-14 00:17:09 +00:00
martinsumner
2b0ec1d9cc Don't double-loop on slots
Previous version built a list of slots, then iterated over it to build a
list of binaries.  This converts the slot to a binary before building
the list
2017-03-13 23:51:48 +00:00
martinsumner
4f0622d2ac Merge remote-tracking branch 'refs/remotes/origin/mas-sstblock-i42' into mas-sstblockv2-i42 2017-03-13 21:09:13 +00:00
martinsumner
54534e725f Experiment with smaller scan width
When testing with large numbers of 2i terms (and hence more Riak
Metadata), there is a surge in slow response times when there are
multiple concurrent merge events.

This could be very short term CPU starvation because of the merge
process.  Perhaps it is delays waiting for the scan to complete -
smaller scanwidth may mean more interleaving and less latency?
2017-03-13 19:53:12 +00:00
martinsumner
f2cd9b3f33 Consistency of empty slotlist references
Need to return an empty slotlist in a consistent way
2017-03-11 13:04:55 +00:00
martinsumner
1f8de798bd Fix empty slot issue 2017-03-11 12:41:30 +00:00
martinsumner
a07770a3df Unit test of lookup over-size issue
A mistake meant resetting to lookup on a skipped key would cause issues
if the skipped key occurred under a no_lookup slot after the ?SLOT_SIZE
had been reached.  This caused the slot to switch to lookup, but beyond
the maximum size
2017-03-11 00:03:55 +00:00
martinsumner
4e4f498f20 Correctly set no_lookup on skip_key
Otherwise could change to lookup after the size limit has been reached
2017-03-10 23:48:17 +00:00
martinsumner
1813317121 Correctly identify empty slotlist 2017-03-10 22:49:00 +00:00
martinsumner
b2f3d882a9 Draft of branch to condense range_only keys 2017-03-10 20:43:37 +00:00
martinsumner
730ab2ec48 tidy out io:format 2017-03-10 11:10:15 +00:00
martinsumner
601f43de3d Merge remote-tracking branch 'refs/remotes/origin/master' into mas-sstblock-i42 2017-03-10 10:24:51 +00:00
martinsumner
d7eee2f9c9 Remove rogue log 2017-03-09 22:24:11 +00:00
martinsumner
4c59342600 Change SST reference to split filename
The manifest and the logs are bloated by having the full file path for
every filename in there - given the root path is constant.

Could also cause issues if the mount point is ever changed.
2017-03-09 21:23:09 +00:00
martinsumner
04cfb453c4 Fetch specific block only
Rely on CRC check in zlib.  Still need to catch on failure
2017-03-07 20:19:11 +00:00
martinsumner
bc5388710b Update SST comments 2017-03-04 20:47:46 +00:00
martinsumner
19534122a2 Coverage checks 2017-02-26 21:37:47 +00:00
martinsumner
7320b34681 Comment update 2017-01-25 12:38:33 +00:00
martinsumner
d57b74d967 Re-introduce tinybloom to SST
This had been removed due to the CPU cost of adding - however then the
tinybloom was implemented by directly manipulating bits through binary
comprehension - rather than applying bor band bsl bsr operations.

With these operations the cost of producing and checking the bloom is
<10% by comparison.
2017-01-24 21:51:12 +00:00
martinsumner
d225f4d7f5 Add use of leveled_tree to sst summary 2017-01-23 22:58:51 +00:00
martinsumner
c99c50ce6e Fix-up message exchange on confirm delete 2017-01-17 11:18:58 +00:00
martinsumner
c32fd3fb4c Change to use manifest_entry not straight PID in unit test 2017-01-17 10:14:40 +00:00
martinsumner
9832ecc369 Manifest now back to a simple list
This has refactored code with the implementation of the manifest
isolated into a separate module, and the pure async relationship
between the penciller and its clerk.  However, the manifest is just a
simple list at each level.
2017-01-17 10:12:15 +00:00
martinsumner
13c81f0ed1 Basic working
Some basic tests working - but still outstanding issues.
2017-01-14 19:41:09 +00:00
martinsumner
5a88565c08 Switch to binary index in pmem
Remove the ets index in pmem and use a binary index instead.  This may
be slower, but avoids the bulk upload to ets, and means that matches
know of position (so only skiplists with a match need be tried).

Also stops the discrepancy between snapshots and non-snapshots - as
previously the snapshots were always slowed by not having access to the
ETS table.
2017-01-05 21:58:33 +00:00
martinsumner
2f8ff640a9 Test coverage
Add some further unit tests to improve test coverage
2017-01-04 21:36:59 +00:00
martinsumner
7d95fa6bbc Switch summary index
Simplify the summary index implementation
2017-01-04 14:26:11 +00:00
martinsumner
2f3eb18548 Re-add usort
Change one thing at a time
2017-01-03 18:26:54 +00:00
martinsumner
c4ebaa9f57 Tidy Up All Hashes
As we're no longer generating a summary bloom - no need to collect a big
list of hashes whilst building the sst file
2017-01-03 18:20:28 +00:00
martinsumner
e1d843a2eb Remove lastfetch cache
It appears to have some benefit at lower levels, but overall has less
benefit at higher levels.  Probably not worth having unless it can be
controlled to go in at the basement only.
2017-01-03 15:26:44 +00:00
martinsumner
b6ae0e1af5 Fix broken SST cache 2017-01-03 13:03:59 +00:00
martinsumner
d28e5d639c Remove SST blooms 2017-01-03 09:12:41 +00:00
martinsumner
5b4c903d53 Check before update on bloom 2017-01-02 20:02:49 +00:00
martinsumner
31d4346806 Log improvements
Log on bad CRC, and also not seeing SST timing logs, so log these more
frequently
2017-01-02 18:54:19 +00:00
martinsumner
b3e189b012 Protect against div by 0
Make sure that blooms are always at least 1 slot in size
2017-01-02 18:38:14 +00:00
martinsumner
baa644383d Make tinybloom size configurable
Allow the bloom size to vary depending on how many fetchable keys there
are - so there is no large bloom held if most of the keys are index
entries for example
2017-01-02 18:29:15 +00:00
Martin Sumner
2079fff7f8 Switched to indexed blocks as slot implementation
Prior to this refactor, the slot had been made up of four blocks with
an external binary index.  Although the form of the index has changed
again, micro-benchmarking once again showed that this was a relatively
efficient mechanism.
2017-01-02 10:47:04 +00:00
Martin Sumner
c0d959beff Five alternatives explored 2016-12-29 22:22:13 +00:00
martinsumner
b509e81cfd Ongoing timing tests 2016-12-29 14:14:09 +00:00
martinsumner
b855401696 Experiment
Want to experiment with different datatypes for the slot - maybe use a
raw list but with a mini hashtree index like the CDB file
2016-12-29 14:11:05 +00:00
martinsumner
41ee90a2ef OTP16 compatibility 2016-12-29 12:10:12 +00:00
martinsumner
a261d4793b Increase test size
Be able to read more into sample-based output
2016-12-29 12:01:42 +00:00
martinsumner
4784f8521a Entropy fiddle
Try and increase effectiveness of bloom by combining Magic Hash with
phash2
2016-12-29 11:59:07 +00:00
martinsumner
fb75a26497 Handle mismatch on expanding pointer
Remove the nasty legacy of hard-coding for a scan width of 1
2016-12-29 10:46:12 +00:00
martinsumner
8f0bf8b892 Fix overlapping _ references 2016-12-29 10:34:53 +00:00
martinsumner
e01b310d20 Handle production of empty file 2016-12-29 05:09:47 +00:00
martinsumner
55386622f7 Fixed issues
Two issues - when the key range falls in-between two marks in the
summary, we didn't pick up any mark.  Then when trimming both right and
left, the left trim was being discarded.
2016-12-29 04:37:49 +00:00
martinsumner
5b9e68df99 Add some crash protection for empty return from to_range
Not clear though why it would occur.
2016-12-29 03:04:10 +00:00
martinsumner
3f3b36597a Add timer for SST creation 2016-12-29 02:55:28 +00:00
martinsumner
a665b8ea4f Tidy-up unused variable 2016-12-29 02:41:02 +00:00