Commit graph

413 commits

Author SHA1 Message Date
martinsumner
0204a23a58 Refactor - STILL BROKEN
Will at least compile, but in need of a massive eunit rewrite and
associated debug to get back to a potentially verifiable state again
2017-01-13 18:23:57 +00:00
martinsumner
08641e05cf Manifest changes - BROKEN
Going to abandon this branch for now.  The change is becoming
excessively time consuming, and it is not clear that a smaller change
might not achieve more of the objectives.

All this is broken - but perhaps could get picked up another day.
2017-01-12 13:48:43 +00:00
martinsumner
439ddfa4eb Add support for initiate
The penciller requires a starting manifest SQN as well as a list of
filenames to open on startup.  It is assumed that the penciller will
keep the mappings between filenames and PIDs outside of the manifest.
2017-01-09 15:19:11 +00:00
martinsumner
ed27a53452 New manifest code
The manifest had previously been a list for every level of the manifest,
and keys were found by folding over the list.  By Level 4 the list will
be 4096 items long, and so the fold would be expensive, and would be
required many times.

To make this less expensive an ETS table is to be used.  However, the ETS
table needs to be shared between snapshots and so in order to use the
ETS the entries to the table need to support multi-versioning - whereby
each clone can see a version of the table at the Manifest SQN the clone
is supporting.
2017-01-09 14:52:26 +00:00
martinsumner
b2bb4ce73e Merge pull request #15 from martinsumner/mas-altmem
Mas altmem
2017-01-06 10:30:03 +00:00
martinsumner
83be8041f0 Update comments to reflect changes 2017-01-06 10:09:15 +00:00
martinsumner
24ec918eb1 Handle no-lookup hash in retain strategy 2017-01-05 22:42:59 +00:00
martinsumner
a617f8fb66 Fix snapshot_store - add index to clone
Clone was not getting the updated index
2017-01-05 22:17:30 +00:00
martinsumner
5a88565c08 Switch to binary index in pmem
Remove the ets index in pmem and use a binary index instead.  This may
be slower, but avoids the bulk upload to ets, and means that matches
know of position (so only skiplists with a match need be tried).

Also stops the discrepancy between snapshots and non-snapshots - as
previously the snapshots were always slowed by not having access to the
ETS table.
2017-01-05 21:58:33 +00:00
martinsumner
1d3fb18df7 Resolve snapshotting issue
Need to make sure the extract from ets happens at the point the snapshot
is taken.
2017-01-05 18:43:55 +00:00
martinsumner
2c828b8eca Fix snapshot issue 2017-01-05 17:55:27 +00:00
martinsumner
e6270d288f Half-way to ets for Bookie mem
A half-way implementation with use of ETS as the bookie's memory
2017-01-05 17:00:12 +00:00
martinsumner
bbdb35ae03 Add ordered_set conversion
Can we go from an ets table to a skiplist
2017-01-05 14:09:39 +00:00
martinsumner
34a25bdb88 Improve from_list in skiplist
from_list had taken a surprising amount of time - so improved the
efficiency of this
2017-01-05 13:57:38 +00:00
martinsumner
c43014a0ee Merge pull request #14 from martinsumner/mas-altsst
Mas altsst
2017-01-04 23:30:42 +00:00
martinsumner
2f8ff640a9 Test coverage
Add some further unit tests to improve test coverage
2017-01-04 21:36:59 +00:00
martinsumner
6e8f8a9c86 Strip out extra stuff from skiplist 2017-01-04 17:19:27 +00:00
martinsumner
f1d26e279c Merge branch 'mas-altsst' of https://github.com/martinsumner/eleveleddb into mas-altsst 2017-01-04 14:26:19 +00:00
martinsumner
7d95fa6bbc Switch summary index
Simplify the summary index implementation
2017-01-04 14:26:11 +00:00
Martin Sumner
8289c3b783 full reversion 2017-01-04 00:26:52 +00:00
Martin Sumner
85aaccfe31 Revert to non-split tinybloom 2017-01-03 23:53:57 +00:00
Martin Sumner
be1d678d85 Revert to two hash tiny bloom 2017-01-03 23:43:43 +00:00
martinsumner
2f3eb18548 Re-add usort
Change one thing at a time
2017-01-03 18:26:54 +00:00
martinsumner
6ab9f72d8c Merge branch 'mas-altsst' of https://github.com/martinsumner/eleveleddb into mas-altsst 2017-01-03 18:20:36 +00:00
martinsumner
c4ebaa9f57 Tidy Up All Hashes
As we're no longer generating a summary bloom - no need to collect a big
list of hashes whilst building the sst file
2017-01-03 18:20:28 +00:00
Martin Sumner
fba70edc94 Stop sort
sort probably doesn’t help
2017-01-03 17:08:40 +00:00
martinsumner
70c6e52fa7 Remove logs for slot_cache 2017-01-03 15:27:28 +00:00
martinsumner
e1d843a2eb Remove lastfetch cache
It appears to have some benefit at lower levels, but overall has less
benefit at higher levels.  Probably not worth having unless it can be
controlled to go in at the basement only.
2017-01-03 15:26:44 +00:00
martinsumner
b6ae0e1af5 Fix broken SST cache 2017-01-03 13:03:59 +00:00
martinsumner
d28e5d639c Remove SST blooms 2017-01-03 09:12:41 +00:00
martinsumner
5b4c903d53 Check before update on bloom 2017-01-02 20:02:49 +00:00
martinsumner
31d4346806 Log improvements
Log on bad CRC, and also not seeing SST timing logs, so log these more
frequently
2017-01-02 18:54:19 +00:00
martinsumner
b3e189b012 Protect against div by 0
Make sure that blooms are always at least 1 slot in size
2017-01-02 18:38:14 +00:00
martinsumner
baa644383d Make tinybloom size configurable
Allow the bloom size to vary depending on how many fetchable keys there
are - so there is no large bloom held if most of the keys are index
entries for example
2017-01-02 18:29:15 +00:00
martinsumner
972aa85012 Try three hash tinybloom
Improved fpr in three hash bloom - so examine performance
2017-01-02 18:09:36 +00:00
Martin Sumner
2079fff7f8 Switched to indexed blocks as slot implementation
Prior to this refactor, the slot had been made up of four blocks with
an external binary index.  Although the form of the index has changed
again, micro-benchmarking once again showed that this was a relatively
efficient mechanism.
2017-01-02 10:47:04 +00:00
Martin Sumner
c0d959beff Five alternatives explored 2016-12-29 22:22:13 +00:00
martinsumner
b509e81cfd Ongoing timing tests 2016-12-29 14:14:09 +00:00
martinsumner
b855401696 Experiment
Want to experiment with different datatypes for the slot - maybe use a
raw list but with a mini hashtree index like the CDB file
2016-12-29 14:11:05 +00:00
martinsumner
41ee90a2ef OTP16 compatibility 2016-12-29 12:10:12 +00:00
martinsumner
a261d4793b Increase test size
Be able to read more into sample-based output
2016-12-29 12:01:42 +00:00
martinsumner
4784f8521a Entropy fiddle
Try and increase effectiveness of bloom by combining Magic Hash with
phash2
2016-12-29 11:59:07 +00:00
martinsumner
fb75a26497 Handle mismatch on expanding pointer
Remove the nasty legacy of hard-coding for a scan width of 1
2016-12-29 10:46:12 +00:00
martinsumner
8f0bf8b892 Fix overlapping _ references 2016-12-29 10:34:53 +00:00
martinsumner
afb28aa7d6 Switch iterator scan width to macro
And 4 seems a more reasonable number than 1
2016-12-29 10:21:57 +00:00
martinsumner
7049aaf5ca Better attempt to handle empty file being generated 2016-12-29 09:35:58 +00:00
martinsumner
0c543ae3ec Remove legacy logs 2016-12-29 05:10:11 +00:00
martinsumner
e01b310d20 Handle production of empty file 2016-12-29 05:09:47 +00:00
martinsumner
55386622f7 Fixed issues
Two issues - when the key range falls in-between two marks in the
summary, we didn't pick up any mark.  Then when trimming both right and
left, the left trim was being discarded.
2016-12-29 04:37:49 +00:00
martinsumner
5b9e68df99 Add some crash protection for empty return from to_range
Not clear though why it would occur.
2016-12-29 03:04:10 +00:00