Correct docs
Following feedback from RDB
This commit is contained in:
parent
5d8095018f
commit
d6a45d7754
5 changed files with 24 additions and 10 deletions

@ -46,7 +46,7 @@ Both the Penciller and the Inker each make use of their own dedicated clerk for

### File Clerks

- Every file within the store has is owned by its own dedicated process (modelled as a finite state machine). Files are never created or accessed by the Inker or the Penciller, interactions with the files are managed through messages sent to the File Clerk processes which own the files.
+ Every file within the store has its own dedicated process (modelled as a finite state machine). Files are never created or accessed by the Inker or the Penciller, interactions with the files are managed through messages sent to the File Clerk processes which own the files.

The File Clerks themselves are ignorant to their context within the store. For example a file in the Ledger does not know what level of the Tree it resides in. The state of the store is represented by the Manifest which maintains a picture of the store, and contains the process IDs of the file clerks which represent the files.
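
The ownership model in this hunk can be pictured with a minimal Erlang sketch. It is illustrative only - the module, record and message names below are hypothetical, not leveled's actual API. The manifest is simply a collection of entries holding a key range and the pid of the owning File Clerk, and a read is a message routed to that pid rather than a direct file access.

```erlang
%% Illustrative sketch only - not leveled's actual API. A manifest is a list
%% of entries, each holding a key range and the pid of the owning File Clerk.
-module(manifest_sketch).
-export([start_clerk/1, fetch/2]).

-record(entry, {start_key, end_key, owner :: pid()}).

%% Spawn a clerk that owns one "file" (here just an in-memory map). The clerk
%% knows nothing about where it sits in the store - it only answers fetches.
start_clerk(KVPairs) ->
    spawn(fun() -> clerk_loop(maps:from_list(KVPairs)) end).

clerk_loop(Data) ->
    receive
        {fetch, From, Key} ->
            From ! {reply, maps:get(Key, Data, not_present)},
            clerk_loop(Data)
    end.

%% A read never opens a file directly; the manifest is consulted for the
%% owning pid and the request is sent as a message to that File Clerk.
fetch(Key, Manifest) ->
    case [P || #entry{start_key = SK, end_key = EK, owner = P} <- Manifest,
               Key >= SK, Key =< EK] of
        [Pid | _] ->
            Pid ! {fetch, self(), Key},
            receive {reply, Reply} -> Reply after 5000 -> timeout end;
        [] ->
            not_present
    end.
```

In a shell one could spawn a clerk with `manifest_sketch:start_clerk([{K, V}])` and route reads through `manifest_sketch:fetch/2`; the point mirrored from the text is that callers only ever hold pids taken from the manifest, never file handles.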

@ -58,11 +58,11 @@ File clerks spend a short initial portion of their life in a writable state. On

## Clones

- Both the Penciller and the Inker can be cloned, to provide a snapshot of the database at a point in time for long-running process that can be run concurrently to other database actions. Clones are used for Journal compaction, but also for scanning queries in the penciller (for example to support 2i queries or hashtree rebuilds in Riak).
+ Both the Penciller and the Inker can be cloned, to provide a snapshot of the database at a point in time. A long running process may then use this clone to query the database concurrently to other database actions. Clones are used for Journal compaction, but also for scanning queries in the penciller (for example to support 2i queries or hashtree rebuilds in Riak).

- The snapshot process is simple. The necessary loop-state is requested from the real worker, in particular the manifest and any immutable in-memory caches, and a new gen_server work is started with the loop state. The clone registers itself as a snapshot with the real worker, with a timeout that will allow the snapshot to expire if the clone silently terminates. The clone will then perform its work, making requests to the file workers referred to in the manifest. Once the work is complete the clone should remove itself from the snapshot register in the real worker before closing.
+ The snapshot process is simple. The necessary loop-state is requested from the real worker, in particular the manifest and any immutable in-memory caches, and a new gen_server worker is started with the loop state. The clone registers itself as a snapshot with the real worker, with a timeout that will allow the snapshot to expire if the clone silently terminates. The clone will then perform its work, making requests to the file workers referred to in the manifest. Once the work is complete the clone should remove itself from the snapshot register in the real worker before closing.

- The snapshot register is used by the real worker when file workers are placed in the delete_pending state after they have been replaced in the current manifest. Checking the list of registered snapshots allows the Penciller or Inker to inform the File Clerk if they remove themselves permanently - as no remaining clones may expect them to be present (except those who have timed out).
+ The snapshot register is used by the real worker when file workers are placed in the delete_pending state. File processes enter this state when they have been replaced in the current manifest, but access may still be needed by a cloned process using an older version of the manifest. The file process should poll the Penciller/Inker in this state to check if deletion can be completed, and the Penciller/Inker should check the register of snapshots to confirm that no active snapshot has a potential dependency on the file before responding to proceed with deletion.

Managing the system in this way requires that ETS tables are used sparingly for holding in-memory state, and immutable and hence shareable objects are used instead. The exception to the rule is the small in-memory state of recent additions kept by the Bookie - which must be exported to a list on every snapshot request.
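
The delete_pending handshake described in this hunk can be sketched as a simple check against the snapshot register. The record and function names below are hypothetical and the register here is just a list, so this is a sketch of the idea rather than leveled's implementation.

```erlang
%% Illustrative sketch only - not leveled's implementation. The register is a
%% plain list of snapshot records held by the Penciller/Inker-like worker.
-module(snapshot_register_sketch).
-export([register_snapshot/4, release_snapshot/2, confirm_delete/3]).

-record(snap, {pid :: pid(),              %% the clone's process id
               manifest_sqn :: integer(), %% manifest version the clone holds
               expires :: integer()}).    %% registration expiry (ms timestamp)

%% A clone registers itself with the manifest version it was taken against
%% and a timeout, so a silently dying clone cannot block deletes forever.
register_snapshot(Pid, ManifestSQN, ExpiryMS, Register) ->
    [#snap{pid = Pid, manifest_sqn = ManifestSQN, expires = ExpiryMS} | Register].

%% A clone that completes its work removes itself before closing.
release_snapshot(Pid, Register) ->
    [S || S <- Register, S#snap.pid =/= Pid].

%% A file replaced at manifest sequence number RemovedAtSQN may only confirm
%% its pending delete once no live, unexpired snapshot still holds an older
%% manifest that could reference it.
confirm_delete(RemovedAtSQN, Register, NowMS) ->
    [] =:= [S || S <- Register,
                 S#snap.manifest_sqn < RemovedAtSQN,
                 S#snap.expires > NowMS].
```

Under this sketch a file process sitting in delete_pending would poll the worker, the worker would answer using `confirm_delete/3`, and the file would only be removed once that returns `true`.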

@ -4,7 +4,7 @@ The following section is a brief overview of some of the motivating factors behi

## A Paper To Love

- The concept of a Log Structured Merge Tree is described within the 1996 paper ["The Log Structured Merge Tree"](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.2782&rep=rep1&type=pdf) by Patrick O'Neil et al. The paper is not specific on precisely how a LSM-Tree should be implemented, proposing a series of potential options. The paper's focus is on framing the justification for design decisions in the context of hardware economics.
+ The concept of a Log Structured Merge Tree is described within the 1996 paper ["The Log Structured Merge Tree"](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.2782&rep=rep1&type=pdf) by Patrick O'Neil et al. The paper is not specific on precisely how an LSM-Tree should be implemented, proposing a series of potential options. The paper's focus is on framing the justification for design decisions in the context of hardware economics.

The hardware economics at the time of the paper were:

@ -24,7 +24,7 @@ Based on the experience of running Riak at scale in production for the NHS, what

The purchase costs of disk though, do not accurately reflect the running costs of disks - because disks fail, and fail often. Also the hardware choices made by the NHS for the Spine programme, do not necessarily reflect the generic industry choices and costs.

- To get an up-to-date objective measure of the overall exists are, assuming data centres are run with a high degree of efficiency and savings of scale - the price list for Amazon EC2 instances can assist, by directly comparing the current prices of different instances.
+ To get an up-to-date and objective measure of what the overall costs are, the Amazon price list can assist if we assume their data centres are managed with a high-degree of efficiency due to their scale. Assumptions on the costs of individual components can be made by examining differences between specific instance prices.

As at 26th January 2017, the [on-demand instance pricing](https://aws.amazon.com/ec2/pricing/on-demand/) for servers is:

16 docs/WHY.md

@ -2,7 +2,7 @@

## Why not just use RocksDB?

- Well that wouldn't be interesting.
+ Well that wouldn't have been as interesting.

All LSM-trees which evolve off the back of leveldb are trying to improve leveldb in a particular context. I'm considering larger values, with need for iterators and object time-to-lives, optimisations by supporting HEAD requests and also the problem of running multiple isolated nodes in parallel on a single server.

@ -44,6 +44,20 @@ From the start I decided that fadvise would be my friend, in part as:

Ultimately though, sophisticated memory management is hard, and beyond my capability in the timescale available.

+ The design may make some caching strategies relatively easy to implement in the future though. Each file process has its own LoopData, and to that LoopData independent caches can be added. This is currently used for caching bloom filters and hash index tables, but could be used in a more flexible way.
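
To illustrate how such per-file caches can hang off the LoopData, here is a hypothetical sketch: the field names are invented and a plain set stands in for the bloom filter, so leveled's actual records and caches will differ.

```erlang
%% Hypothetical sketch - field names invented, and a plain set stands in for
%% the bloom filter. Each file process carries its own caches in its LoopData.
-module(loopdata_cache_sketch).
-export([maybe_fetch/2]).

-record(state, {handle,              %% file handle or path for the real read
                bloom = none,        %% cached bloom filter (or none)
                hash_index = none}). %% cached hash/slot index (or none)

%% Check the in-LoopData bloom cache before going anywhere near the disk.
maybe_fetch(Key, State = #state{bloom = Bloom}) ->
    case might_contain(Key, Bloom) of
        false -> {not_present, State};
        true  -> read_from_file(Key, State)
    end.

might_contain(_Key, none) -> true;    %% no cache - must check the file
might_contain(Key, Bloom) -> sets:is_element(Key, Bloom).

read_from_file(Key, State) ->
    %% Placeholder for the real disk read against State#state.handle.
    {{fetched_from_disk, Key}, State}.
```

Because each cache lives in the owning process's LoopData rather than in a shared table, adding a new cache type is a local change to that file process.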

+ ## Why make this backwards compatible with OTP16?

+ Yes why, why do I have to do this?

+ ## Why name things this way?

+ The names used in the Actor model are loosely correlated with names used for on-course bookmakers (e.g. Bookie, Clerk, Penciller).

+ ![ascot_bookies](pics/ascot_bookies.jpg)

+ There is no specific reason for drawing this specific link, other than the generic sense that this group represents a tight-nit group of workers passing messages from a front-man (the bookie) to keep a local view of state (a book) for a queue of clients, and where normally these groups are working in a loosely-coupled orchestration with a number of other bookmakers to form a betting market that is converging towards an eventually consistent price.

+ ![betting market](pics/betting_market.jpg)

+ There were some broad parallels between bookmakers in a market and vnodes in a Riak database, and using the actor names just stuck, even though the analogy is imperfect.

BIN docs/pics/ascot_bookies.jpg (new file, 114 KiB)
Binary file not shown.

BIN docs/pics/betting_market.jpg (new file, 15 KiB)
Binary file not shown.