From af69f946cf9acbbf009e65f9bc6b2704b688a121 Mon Sep 17 00:00:00 2001
From: martinsumner
Date: Tue, 6 Jun 2017 16:44:05 +0100
Subject: [PATCH] Add further compaction comments

Link to the compaction branch and add further description
---
 docs/FUTURE.md | 14 ++++++++++++++
 docs/VOLUME.md |  4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/FUTURE.md b/docs/FUTURE.md
index 8fe3c22..301afc3 100644
--- a/docs/FUTURE.md
+++ b/docs/FUTURE.md
@@ -125,3 +125,17 @@ Description:
 The riak_kv_sweeper which is part of the post-2.2 develop branch controls folds over objects so that multiple functions can be applied to a single fold. The only aspect of the Riak system that uses this feature at present is AAE hashtree rebuilds.
 
 This branch modifies the kv_sweeper so that if the capability exists, and unless a sweeper has explicitly stated a requirement not to allow this feature, the sweeper can defer the fetching of the objects. This means that the sweeper will fold over the "heads" of the objects returning a specially crafted Riak Object which contains a reference to the body rather than the actual body - so that the object body can be fetched if and only if access to the object contents is requested via the riak_object module.
+
+### Journal compaction
+
+Branch: [mas-leveled-autocompact](https://github.com/martinsumner/riak_kv/tree/mas-leveled-autocompact)
+
+Branched-From: [mas-leveled-scanner-i649](https://github.com/martinsumner/riak_kv/tree/mas-leveled-scanner-i649)
+
+Description:
+
+Allows the hours of the day in which Journal compaction will run to be configurable. The (approximate) number of times each vnode should run journal compaction each day is also configurable.
+
+The number of runs needed will depend on the distribution of updates - most specifically, what proportion of PUTs are changes as opposed to new data.
+
+Cuttlefish config is still broken, so changes to config should be made through the riak_kv_leveled.hrl include file.
diff --git a/docs/VOLUME.md b/docs/VOLUME.md
index a5d9df0..7056350 100644
--- a/docs/VOLUME.md
+++ b/docs/VOLUME.md
@@ -210,7 +210,7 @@ The sweeper mechanism is a new facility in the riak_kv develop branch, and has a
 
 If the same test is run with a leveldb backend but with the pre-sweeper fold mechanism, then total throughput across the is improved by 8.9%. However, this throughput reduction comes at the cost of a 90% reduction in the number of rebuilds completed within the test.
 
-## Riak Cluster Test - Phase 3 - Compaction
+## Riak Cluster Test - Phase 3 - Journal Compaction
 
 During initial development, the issue of compacting the value store was left to one side from a performance perspective, under the assumption that compaction would occur in some out-of-hours window. Bitcask is configurable in this way, but also manages to do continuous compaction without major performance issues.
 
@@ -223,7 +223,7 @@ This was tested with:
 - no sync on write,
 - 5 x i2.2x nodes,
 - 12 hour duration,
-- 200M keys with a pareto distribution (and hence significant value rotation in the 20%).
+- 200M keys with a pareto distribution (and hence significant value rotation in the most commonly accessed keys).
 
 With 10 compaction events per day, 155GB per node had been compacted out of the value store during the 12 hour test. In the 12 hours following the test, a similar amount was compacted - to the point where there was rough equivalence in node volumes between the closing state of the leveled test and the closing state of the leveldb test.