The current mechanism of re-loading data from the Journal to the Ledger
from any potential SQN is not safe when combined with Journal
compaction.
This commit doesn't resolve these problems, but starts the groundwork
for resolving them by introducing Inker Key Types. These types
differentiate between objects which are standard Key/Value pairs,
objects which are tombstones for keys, and objects which represent Key
Changes only.
The idea is that there will be flexible reload strategies based on
object tags:
- retain (retain a key change object when compacting a standard object)
- recalc (allow key changes to be recalculated from objects and ledger
state when loading the Ledger from the Journal)
- recover (accept the potential loss of data within the persisted part
of the ledger, with recovery potentially achieved through external
anti-entropy operations).
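The strategy-per-tag idea can be sketched as a simple dispatch table. This is an illustrative Python analogue only, not the leveled Erlang implementation; all function and field names here are hypothetical:

```python
# Hypothetical sketch of tag-based reload strategies (not leveled's code).

def retain(obj, ledger):
    # Retain: compaction keeps a key-change object, so the changes
    # can be replayed directly when reloading the Ledger.
    return obj["key_changes"]

def recalc(obj, ledger):
    # Recalc: recompute key changes from the object and current
    # ledger state (here, a trivial new-vs-old comparison).
    old = ledger.get(obj["key"])
    return [] if old == obj["value"] else [(obj["key"], obj["value"])]

def recover(obj, ledger):
    # Recover: accept that the change may be lost; recovery is
    # deferred to external anti-entropy operations.
    return []

STRATEGIES = {"retain": retain, "recalc": recalc, "recover": recover}

def reload_changes(obj, ledger):
    return STRATEGIES[obj["strategy"]](obj, ledger)
```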
Added basic support for 2i query. This involved some refactoring of the
test code to share functions between suites.
There is still a need for a Part 2, as no tests currently cover removal
of index entries.
The object tag "o", which was taken from eleveldb, has been extended to
allow for specific functions to be triggered for different object types,
in particular when extracting metadata for storing in the Ledger.
There is now a riak tag (o_rkv@v1), and in theory other tags can be
added and used, as long as there is an appropriate set of functions in
leveled_codec.
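The per-tag function set amounts to dispatching metadata extraction on the object tag. A minimal sketch in Python (hypothetical names; in leveled this logic lives in leveled_codec in Erlang, and the extractor bodies below are invented for illustration):

```python
# Hypothetical sketch of tag-based metadata extraction (not leveled_codec).

def extract_metadata_std(value):
    # Standard objects: only the size is recorded here.
    return {"size": len(value)}

def extract_metadata_riak(value):
    # A riak object might also carry extra metadata; the "vclock"
    # slice below is purely illustrative.
    return {"size": len(value), "vclock": value[:4]}

EXTRACTORS = {
    "o": extract_metadata_std,          # standard tag (from eleveldb)
    "o_rkv@v1": extract_metadata_riak,  # riak tag
}

def extract_metadata(tag, value):
    # New tags can be supported by registering a new extractor.
    return EXTRACTORS[tag](value)
```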
When the journal CDB file is called to roll it now starts a new clerk to
perform the hashtable calculation (which may take many seconds). This
stops the store from getting blocked if there is an attempt to GET from
the journal that has just been rolled.
The journal file process now has a number of distinct states (reading,
writing, pending_roll, closing). A future refactor may look to make
leveled_cdb a gen_fsm rather than a gen_server.
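The distinct states imply a set of legal transitions that a gen_fsm would make explicit. A rough sketch of one plausible transition table (illustrative Python; the transitions shown are an assumption, not leveled's actual state machine):

```python
# Hypothetical journal-file state transitions (assumed, for illustration).

TRANSITIONS = {
    "writing": {"pending_roll", "closing"},
    "pending_roll": {"reading", "closing"},
    "reading": {"closing"},
    "closing": set(),
}

def transition(state, new_state):
    # Reject transitions not in the table, as a gen_fsm would.
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```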
An attempt to refactor out more complex code.
The Penciller clerk and Penciller have been re-shaped so that their
relationship is much simpler, and also to make sure that they shut down
much more neatly when the clerk is busy, to avoid crashdumps in ct tests.
The CDB now has a binary_mode - so that we don't do binary_to_term twice
... although this may have made things slower ??!!? Perhaps the
is_binary check now required on read is an overhead, or perhaps it is
some other mystery.
There is now more efficient fetching of the size in pcl_load as well.
Inker refactored to block on the manifest write. If this is inefficient,
the manifest write can be converted to an append-only operation.
Waiting on the manifest write makes the logic at startup much easier to
manage.
This exposed another off-by-one error on startup.
This commit also includes an unsafe change to reply early from a rolling
CDB file (with lots of objects writing the hash table can take too
long). This is bad, but will be resolved through a refactor of the
manifest writing: essentially we deferred writing of the manifest
update which was an unnecessary performance optimisation. If instead we
wait on this, the process is made substantially simpler, and it is safer
to perform the roll of the complete CDB journal asynchronously. If the
manifest update takes too long, an append-only log may be used instead.
Add some initial system tests. This highlighted issues:
- Files deleted by compaction would be left orphaned, never closed,
and would not in fact be deleted (they are now deleted on closure only)
- There was an issue on startup whereby the first few keys in each
journal would not be re-loaded into the ledger
Largely untested work at this stage, allowing the Inker to request the
Inker's clerk to perform a single round of compaction based on the best
run of files it can find.
Makes the ability to get positions, and to fetch directly by position,
more generic - supporting the fetch of different flavours of
combinations, and requesting a sample of positions rather than all of
them.
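The "all positions vs a sample" behaviour can be sketched as follows (illustrative Python, not the CDB implementation; the function name and shape are hypothetical):

```python
# Hypothetical sketch: fetch all positions, or only a random sample.

import random

def get_positions(hashtable, sample_size=None):
    positions = sorted(hashtable.values())
    if sample_size is None or sample_size >= len(positions):
        return positions                      # fetch all positions
    return random.sample(positions, sample_size)  # fetch a sample only
```

Sampling positions is useful for compaction scoring: the iclerk can estimate how much of a file is garbage without scanning every entry.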
CDB did many "bitty" reads/writes when scanning or writing hash tables -
these have been changed to bulk reads and writes to speed things up.
CDB has also added capabilities to fetch positions and get keys by
position, to help with the iclerk role.
Additional bookie tests revealed that the persisting/reading of inker
manifests was inconsistent and buggy.
Also, the CDB files were inefficiently writing the top index table -
this needed to be improved, as it blocks on a roll.
Make scanning over a CDB file generic rather than specific to the
read-in of the active nursery log - open to be called as an external
function to support other scanning behaviour.
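Generic scanning amounts to a fold over the file's entries, with the caller supplying the accumulator function. A minimal sketch (illustrative Python; the leveled version folds over a real CDB file in Erlang):

```python
# Hypothetical sketch of a generic scan: the caller drives the fold.

def scan(entries, fun, acc):
    # entries: an iterable of (key, value) pairs as read from the file.
    # fun: caller-supplied function (key, value, acc) -> new acc.
    for key, value in entries:
        acc = fun(key, value, acc)
    return acc
```

With this shape, loading the active nursery log is just one choice of `fun`; compaction scoring or key listing become other callers of the same scan.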
Add a test to show the inker rolling a journal. To achieve this, the
CDB size needs to be made an option, and the manifest sorting altered so
that find_in_manifest actually works!
An attempt to get a first inker that can build a ledger from a manifest
as well as support simple get and put operations. Basic tests surround
the building of manifests only at this stage - more work is required for
get and put.
CDB was failing tests (was it always this way?). There has been a
little bit of a patch-up of the tests, but there are still some
potentially outstanding issues with scanning over a file when attempting
to read beyond the end of the file.
Tabbing reformatting and general tidy.
Concierge documentation development ongoing.