Commit graph

90 commits

Author SHA1 Message Date
martinsumner
d988c66ac6 Enhance unit tests for corrupted segment filters 2016-10-24 11:44:28 +01:00
martinsumner
c78b5bca7d Basement Tombstones
Further progress towards the tidying up of basement tombstones in the
Ledger, with support added for key-listing to help with testing (and as
a potentially required feature).

The test is incomplete, but committing at this stage as the last commit
broke some tests (within the test code).

There are some outstanding questions about the handling of tombstones in
the Journal during compaction.  There exists a condition whereby values
could return if a recent journal is compacted and tombstones are removed
(as they are no longer present), but older journals have not been
compacted.  Now on stop/start - if the Ledger is wiped the removal of
the keys will be forgotten but the original PUTs would still remain.

The safest thing may be to have a rule that tombstones are never deleted
from the Inker's Journal - and accept the build-up of garbage.  Or there
could be an addition to the compaction process that checks back through
all the inker files to check that the Key of a tombstone is not present
in the past, before it is removed in the compaction.
2016-10-23 22:45:43 +01:00
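The second option in the commit above (checking older journal files before a tombstone is removed in compaction) can be sketched as follows. This is a minimal Python model, not leveled's Erlang code: the function name `can_reap_tombstone` and the representation of journals as a list of key sets are assumptions made purely for illustration.

```python
def can_reap_tombstone(key, older_journals):
    """Return True only if no older journal still holds an entry for key.

    older_journals is a hypothetical list of sets, one set of keys per
    uncompacted journal file that predates the tombstone.
    """
    return all(key not in journal_keys for journal_keys in older_journals)


if __name__ == "__main__":
    older = [{"K1", "K2"}, {"K3"}]
    # K2 still has an original PUT in an older journal, so its tombstone stays.
    print(can_reap_tombstone("K2", older))
    # K9 appears nowhere in the past, so its tombstone could be dropped.
    print(can_reap_tombstone("K9", older))
```

The alternative rule (never delete tombstones from the Journal) needs no such check, at the cost of accumulating garbage.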
martinsumner
e9c568a8b3 Test fix-up
There was a test that failed to close down a bookie and that caused some
issues.  The issues are double-resolved, the close down was tidied as
well as the forgotten close being added back in.

There is some general tidying in anticipation of TTL support.
2016-10-21 21:26:28 +01:00
martinsumner
0a2053b557 Improved unit test of CRC checking in bloom filter
Confirm the impact of bit-flipping in the bloom filter
2016-10-21 16:08:41 +01:00
martinsumner
3710d09fbf Reuse codec key comparison
There was duplication of key comparison logic between leveled_codec and
leveled_sft.  Now both use the leveled_codec key_dominates function
2016-10-21 15:30:53 +01:00
martinsumner
b2089baa1e Correct tombstone handling
Prepare SFT files for handling tombstones correctly (without expiry
dates).

Also some work as it can be seen from tests that some SFT files are not
being cleared out correctly.  Pausing before trying to clear out the files to
experiment and trial the possibility that there is a timing issue.
2016-10-21 15:21:37 +01:00
martinsumner
3ad9e42b61 Changed SFT shutdown to cast-based
The SFT shutdown process has become a series of casts to-and-from
between Penciller and SFT to stop the two processes synchronously making
requests on each other
2016-10-21 12:18:06 +01:00
martinsumner
c431bf3b0a Broken snapshot test
The test confirming that deleting sft files were held open whilst
snapshots were registered was actually broken.  This test has now been
fixed, as well as the logic in registering snapshots which had used
ledger_sqn mistakenly rather than manifest_sqn.
2016-10-21 11:38:30 +01:00
martinsumner
caa8d26e3e Stop file check
File check now covered by measure in the sft_new path, which will back up
any existing file before moving.

This gets triggered by incomplete changes on shutdown.
2016-10-20 19:18:49 +01:00
martinsumner
5c2029668d Tombstone preparation
Some initial code changes preparing for the test and implementation of
tombstones and tombstone reaping
2016-10-20 16:00:08 +01:00
martinsumner
0324edd6f6 Rotating object tests
Recent fixes have been made to problems associated with rapidly changing
objects especially on re-opening of the bookie.  Test of rotating
objects from both an index query and a fetch perspective added to better
detect such issues in the future.
2016-10-20 12:16:17 +01:00
martinsumner
cf66431c8e Smoother handling of back-pressure
The Penciller had two problems in previous commits:
- If it had a push_mem soon after a L0 file had been created, the
push_mem would stall waiting for the L0 file to complete - and this
could take 100-200ms
- The penciller's clerk favoured L0 work, but was lazy about asking for
other work in-between, so often the L1 layer was bursting over capacity
and the clerk was doing nothing but merging more L0 files in (with those
merges getting more and more expensive as they had to cover more and
more files)

There are some partial resolutions to this.  There is now an aggressive
timeout when checking whether the L0 file is ready on a push_mem, and if
the timeout is breached the error is caught and a 'returned' message
goes back to the Bookie.  The Bookie doesn't now empty its cache, it
carries on filling it, but on some probability it will keep trying to
push_mem on future pushes.  This increases jitter around the expensive
operation and splits out the L0 delay into defined chunks.

The penciller's clerk is now more aggressive in asking for work.  There
is also some simplification of the relationship between clerk timeouts
and penciller back-pressure.

Also resolved is an issue of inconsistency between the loader on
startup (replaying the transaction log) and the standard push_mem
process.  The loader was not correctly de-duplicating by adding first
(in order) to a tree before outputting the list from the tree.

Some thought will be given later as to whether non-L0 work can be safely
prioritised if the merge process still keeps getting behind.
2016-10-20 02:23:45 +01:00
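The back-pressure behaviour described in the commit above can be sketched as a small model. This is Python rather than the project's Erlang, and everything here is an assumption for illustration: the class name, the `cache_limit` and `retry_chance` values, and the `l0_ready` callback that stands in for the timed check on the L0 file. Only the 'returned' reply and the probabilistic retry come from the commit message.

```python
import random


def penciller_push(l0_ready):
    """Hypothetical push_mem: 'returned' stands in for the timeout case."""
    return "ok" if l0_ready() else "returned"


class Bookie:
    def __init__(self, l0_ready, cache_limit=3, retry_chance=0.2,
                 rng=random.random):
        self.l0_ready = l0_ready          # callable: is the L0 file complete?
        self.cache_limit = cache_limit    # pushes start once the cache is this big
        self.retry_chance = retry_chance  # chance of retrying after a 'returned'
        self.rng = rng
        self.cache = []
        self.backed_off = False

    def put(self, key):
        self.cache.append(key)
        if len(self.cache) < self.cache_limit:
            return
        # After a 'returned' reply, only retry the push on some probability.
        if self.backed_off and self.rng() >= self.retry_chance:
            return
        if penciller_push(self.l0_ready) == "ok":
            self.cache = []
            self.backed_off = False
        else:
            # Keep the cache and carry on filling it.
            self.backed_off = True


if __name__ == "__main__":
    bookie = Bookie(l0_ready=lambda: False)
    for i in range(5):
        bookie.put(i)
    print(len(bookie.cache), bookie.backed_off)
```

Injecting `rng` keeps the model deterministic under test; a real store would presumably tune the retry probability against the observed L0 write time.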
martinsumner
7319b8f415 Redundant clauses
Remove some redundant clauses, and fix up some logging
2016-10-19 20:51:30 +01:00
martinsumner
12fe1d01bd Penciller Manifest and Locking
The penciller had the concept of a manifest_lock - but it wasn't clear
what the purpose of it was.

The updating of the manifest has now been updated to reduce the code and
make the process cleaner and more obvious.  Now the committed manifest
only covers non-L0 levels.  A clerk can work concurrently on a manifest
change whilst the Penciller is accepting a new L0 file.

On startup the manifest is opened as well as any L0 file.  There is a
possible race condition with killing process where there may be a L0
file which is merged but undeleted - and this is believed to be inert.

There is some outstanding work still.  Currently the whole store is
paused if a push_mem is received by the Penciller, and the writing of a
L0 sft file has not been completed.  The creation of a L0 file appears
to take about 300ms, so if the ledger_cache fills in this period a pause
will occur (perhaps due to objects with lots of index entries).  It
would be preferable to pause more elegantly in this situation.  Perhaps
there should be a harsh timeout on the call to check the SFT complete,
and catching it should cause a refused response.  The next PUT will then
wait, but any queued GETs can progress.
2016-10-19 17:34:58 +01:00
martinsumner
f16f71ae81 Revert omnishambles performance refactoring
To try and improve performance index entries had been removed from the
Ledger Cache, and a shadow list of the LedgerCache (in SQN order) was
kept to avoid gb_trees:to_list on push_mem.

This did not go well.  The issue was that ets does not deal with
duplicate keys in the list when inserting (it will only insert one, but
it is not clear which one).

This has been reverted back out.

The ETS parameters have been changed to [set, private].  It is not used
as an iterator, and is no longer passed out of the process (the
memtable_copy is sent instead).  This also avoids the tab2list function
being called.
2016-10-19 00:10:48 +01:00
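The duplicate-key problem in the commit above (a bulk insert into a set-type table keeps only one object per key, and which one is not defined) can be avoided by de-duplicating before the insert. The sketch below is a Python model, not the project's code, and it is not what this commit did (the commit reverted the change). The function name and the `(key, sqn, value)` tuple layout are assumptions for illustration.

```python
def dedupe_by_key(changes):
    """Keep only the latest entry per key.

    changes: iterable of (key, sqn, value) tuples in ascending SQN order.
    Later entries replace earlier ones, so the result is deterministic.
    """
    latest = {}
    for key, sqn, value in changes:
        latest[key] = (sqn, value)
    return sorted((k, s, v) for k, (s, v) in latest.items())


if __name__ == "__main__":
    changes = [("a", 1, "x"), ("b", 2, "y"), ("a", 3, "z")]
    print(dedupe_by_key(changes))
```

Adding entries in order to a keyed structure and then reading the list back out is the same idea the later loader fix describes for its tree.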
martinsumner
8f29a6c40f Complete 2i work - some refactoring
The 2i work now has tests for removals as well as regex etc.

Some initial refactoring work has also been tried - to try and take some
tasks off the critical path of push_mem.  The primary change has been to
avoid putting index keys into the gb_tree, and building the KeyChanges
list in parallel to the gb_tree (now known as ObjectTree) within the
Ledger Cache.

Some initial experiments done as to changing the ETS table in the
Penciller now that it will be used for iterating - but that has been
reverted for now.
2016-10-18 19:41:33 +01:00
martinsumner
905b712764 2i query test
The 2i query test added in the previous commit didn't correctly test
regex queries.  This has now been improved.
2016-10-18 09:42:33 +01:00
martinsumner
3e475f46e8 Support for 2i query part1
Added basic support for 2i query.  This involved some refactoring of the
test code to share functions between suites.

There is still a need for a Part 2 as no tests currently cover removal of
index entries.
2016-10-18 01:59:18 +01:00
Russell Brown
ac0504e79e Merge pull request #1 from martinsumner/rdb/fix-test-include
Fix include target
2016-10-17 14:25:27 +01:00
Russell Brown
59ea46120e Fix include target 2016-10-17 14:24:32 +01:00
martinsumner
8653e9d90d Improve inker unit test
Change in filename labelling had stopped a unit test from covering
startup correctly.  Now offering better coverage
2016-10-16 16:58:55 +01:00
martinsumner
e3ce372f31 Delete
Add functionality to delete keys.  No tombstone reaping yet.
2016-10-16 15:41:09 +01:00
martinsumner
ed17e44f52 Improve test coverage
Some additional tests following previous refactoring for abstraction,
primarily to make manifest print safer and prove co-existence of Riak
and non-Riak objects.
2016-10-14 22:58:01 +01:00
martinsumner
7eb5a16899 Supporting Tags - Improving abstraction between Riak and non-Riak workloads
The object tag "o" which was taken from eleveldb has been extended to
allow for specific functions to be triggered for different object types,
in particular when extracting metadata for storing in the Ledger.

There is now a riak tag (o_rkv@v1), and in theory other tags can be
added and used, as long as there is an appropriate set of functions in
the leveled_codec.
2016-10-14 18:43:16 +01:00
martinsumner
9be0f96406 Off process calculation of the Hash Table
When the journal CDB file is called to roll it now starts a new clerk to
perform the hashtable calculation (which may take many seconds).  This
stops the store from getting blocked if there is an attempt to GET from
the journal that has just been rolled.

The journal file process now has a number of distinct states (reading,
writing, pending_roll, closing).  A future refactor may look to make
leveled_cdb a gen_fsm rather than a gen_server.
2016-10-14 13:36:12 +01:00
martinsumner
bbdac65f8d Split out key codec
Split out key codec, and also saner approach to key comparison (although
still awkward).
2016-10-13 21:02:15 +01:00
martinsumner
de54a28328 Load and Count test
This test exposed two bugs:
- Yet another set of off-by-one errors (really stupidly scanning the
Manifest from Level 1 not Level 0)
- The return of an old issue related to scanning the journal on load
whereby we fail to go back to the previous file before the current SQN
2016-10-13 17:51:47 +01:00
martinsumner
2d981cb2e7 FAdvise
Add fadvise magic to SFT files.  Also delete unnecessary rice module
2016-10-12 18:10:47 +01:00
martinsumner
938cc0fc16 Re-add tests
Oops - committed with tests commented out
2016-10-12 17:35:32 +01:00
martinsumner
0a08867280 Iterator support
Add iterator support, used initially only for retrieving bucket
statistics.

The iterator is supported by exporting a function, and when the function
is called it will take a snapshot of the ledger, run the iterator and
then close the snapshot.

This required a number of underlying changes, in particular to get key
comparison to work as "expected".  The code had previously misunderstood
how comparison worked between Erlang terms, and in particular did not
account for tuples being compared first by size (and only then by each
element in order).
2016-10-12 17:12:49 +01:00
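The term-ordering point in the commit above is easy to trip over: in Erlang, two tuples are ordered by size first, and only tuples of equal size are compared element by element. The sketch below models that rule in Python (whose own tuple ordering is purely element by element) to show how the two sort orders differ. The comparator name is hypothetical, and the model covers only flat tuples whose elements are mutually comparable, not the full Erlang term order.

```python
from functools import cmp_to_key


def erlang_tuple_cmp(a, b):
    """Compare two flat tuples the way Erlang orders tuples."""
    # Size is compared first: a shorter tuple always sorts before a longer one.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Equal sizes fall back to element-by-element comparison.
    return (a > b) - (a < b)


if __name__ == "__main__":
    keys = [("b",), ("a", "z"), ("a",)]
    print(sorted(keys))                                    # Python's native order
    print(sorted(keys, key=cmp_to_key(erlang_tuple_cmp)))  # Erlang-style order
```

A key scheme that mixes tuple sizes therefore cannot rely on prefix ordering; a two-element key sorts after every one-element key regardless of content.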
martinsumner
d2cc07a9eb Doc update and clerk<->penciller changes
Reviewing code to update comments revealed a weakness in the sequence of
events between penciller and clerk committing a manifest change whereby
an ill-timed crash could lead to files being deleted without the
manifest changing.

A different, and safer pattern is now used between these two actors.
2016-10-09 22:33:45 +01:00
martinsumner
4a8a2c1555 Code reduction refactor
An attempt to refactor out more complex code.

The Penciller clerk and Penciller have been re-shaped so that their
relationship is much simpler, and also to make sure that they shut down
much more neatly when the clerk is busy to avoid crashdumps in ct tests.

The CDB now has a binary_mode - so that we don't do binary_to_term twice
... although this may have made things slower ??!!?  Perhaps the
is_binary check now required on read is an overhead.  Perhaps it is some
other mystery.

There is now a more efficient fetching of the size on pcl_load as
well.
2016-10-08 22:15:48 +01:00
martinsumner
8dfeb520ef Inker Refactor
Inker refactored to block on manifest write.  If this is inefficient the
manifest write can be converted to an append only operation.

Waiting on the manifest write makes the logic at startup much easier to
manage.
2016-10-07 18:07:03 +01:00
martinsumner
2055f8ed3f Add more complex snapshot test
This exposed another off-by-one error on startup.

This commit also includes an unsafe change to reply early from a rolling
CDB file (with lots of objects writing the hash table can take too
long).  This is bad, but will be resolved through a refactor of the
manifest writing:  essentially we deferred writing of the manifest
update which was an unnecessary performance optimisation.  If instead we
wait on this, the process is made substantially simpler, and it is safer
to perform the roll of the complete CDB journal asynchronously.  If the
manifest update takes too long, an append-only log may be used instead.
2016-10-07 10:04:48 +01:00
martinsumner
f58f4d0ea5 Mini Refactor
Thought about the mess, thought about switching to a FSM, thought about
just sorting a bit of the mess instead.
2016-10-06 13:23:20 +01:00
martinsumner
ad5aebe93e Further work on system tests
Another issue exposed with laziness in using an incomplete ledger
when checking for presence during compaction.
2016-10-05 18:28:31 +01:00
martinsumner
d903f184fd Add initial end-to-end common tests
These tests highlighted some logical issues when scanning over databases
on startup, so fixes are wrapped in here.
2016-10-05 09:54:53 +01:00
martinsumner
507428bd0b Add initial system test
Add some initial system tests.  This highlighted issues:
- That files deleted by compaction would be left orphaned and not close,
and would not in fact delete (now deleted by closure only)
- There was an issue on startup that the first few keys in each journal
would not be re-loaded into the ledger
2016-10-03 23:34:28 +01:00
martinsumner
15f57a0b4a Further Journal compaction tests
Improved unit testing
2016-09-28 18:26:52 +01:00
martinsumner
50b50ba486 Inker Clerk - Further Testing
Expanded the unit testing of the Inker Clerk actor.  Still WIP
2016-09-28 11:41:56 +01:00
martinsumner
d24b100aa6 Initial work on Journal Compaction
Largely untested work at this stage to allow for the Inker to request
the Inker's clerk to perform a single round of compaction based on the best
run of files it can find.
2016-09-27 14:58:26 +01:00
martinsumner
e2bb09b873 Snapshot testing
Work to test the checking of sequence numbers in snapshots as required
by the Inker's clerk to calculate the percentage of a file which is
compactable
2016-09-26 10:55:08 +01:00
martinsumner
c64d67d9fb Snapshot Work - Interim Commit
Some initial work to get snapshots going.

Changes required, as need to snapshot through the Bookie to ensure that
there is no race between extracting the Bookie's in-memory view and the
Penciller's view if a push_to_mem has occurred in between.

A lot still outstanding, especially around Inker snapshots, and handling
timeouts
2016-09-23 18:50:29 +01:00
martinsumner
d3e985ed80 Refactor Penciller Push
Two aspects of pushing to the penciller have been refactored:
1 - Allow the penciller to respond before the ETS table has been updated
to unlock the Bookie sooner.
2 - Change the way the copy of the memtable is stored to work more
effectively with snapshots without locking the Penciller any further on
a snapshot or push request
2016-09-21 18:31:42 +01:00
martinsumner
66d6db4e11 Support for random sampling
Makes the ability to get positions and the fetch directly by position
more generic - supporting the fetch of different flavours of
combinations, and requesting a sample of positions not just all
2016-09-20 18:24:05 +01:00
martinsumner
aa7d235c4d Rename clerk and CDB Speed-Up
CDB did many "bitty" reads/writes when scanning or writing hash tables -
change these to bulk reads and writes to speed up.

CDB also added capabilities to fetch positions and get keys by position
to help with iclerk role.
2016-09-20 16:13:36 +01:00
martinsumner
c10eaa75cb Dialyzer changes
Some changes to improve dialyzer pass rate
2016-09-20 10:17:24 +01:00
martinsumner
4e28e4173c Rebar and eunit changes
Initial rebar compile - which exposed eunit test failures associated
with changes to file structures and filename references
2016-09-19 18:50:11 +01:00
martinsumner
a1c970a66a Manifest ordering
Be more explicit about manifest ordering to stop keys being loaded in
incorrect order
2016-09-19 15:56:35 +01:00
martinsumner
7c28ffbd96 Further bookie test - CDB optimisation and Inker manifest correction
Additional bookie test revealed that the persisting/reading of inker
manifests was inconsistent and buggy.

Also, the CDB files were inefficiently writing the top index table -
needed to be improved as this is blocking on a roll
2016-09-19 15:31:26 +01:00