Add journal compaction testing

martinsumner 2017-06-06 16:30:02 +01:00
parent 4cf3eea1eb
commit 94f3e036ea
4 changed files with 37 additions and 4 deletions


@ -29,8 +29,6 @@ There is some work required before LevelEd could be considered production ready:
- Introduction of property-based testing.
- Amend compaction scheduling to ensure that all vnodes do not try to concurrently compact during a single window.
- Improved handling of corrupted files.
- A way of identifying the partition in each log to ease the difficulty of tracing activity when multiple stores are run in parallel.


@ -212,9 +212,44 @@ If the same test is run with a leveldb backend but with the pre-sweeper fold mec
## Riak Cluster Test - Phase 3 - Compaction
to be completed ..
During initial development, the question of compacting the value store was set aside from a performance perspective, on the assumption that compaction would occur in some out-of-hours window. Bitcask is configurable in this way, but also manages to do continuous compaction without major performance issues.
For this phase, a new compaction feature was added to allow for "continuous" compaction of the value store (Journal). This means that each vnode will schedule approximately N compaction attempts through the day, rather than waiting for a compaction window to occur.
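
The idea of spreading roughly N attempts across the day can be sketched as follows. This is a minimal illustration in Python, not leveled's actual scheduler: the function name `schedule_compactions`, the equal-width slots and the use of the vnode index as a random seed are all assumptions made for the example.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60

def schedule_compactions(runs_per_day, rng):
    """Return one start time (second of day) per equal-width slot.

    Hypothetical helper: splits the day into runs_per_day slots and picks a
    random second inside each, so vnodes with different random streams are
    unlikely to start compacting at the same moment.
    """
    if not 1 <= runs_per_day <= SECONDS_PER_DAY:
        raise ValueError("runs_per_day must be between 1 and 86400")
    times = []
    for i in range(runs_per_day):
        # Integer slot boundaries keep every result inside [0, 86400).
        start = i * SECONDS_PER_DAY // runs_per_day
        end = (i + 1) * SECONDS_PER_DAY // runs_per_day
        times.append(rng.randrange(start, end))
    return times

# Each vnode gets its own random stream (seeded by vnode index, an assumption).
vnode_schedules = {
    vnode: schedule_compactions(10, random.Random(vnode)) for vnode in range(4)
}
```

Picking one time per slot keeps the attempts spread through the day, while the per-vnode randomness makes it unlikely that all vnodes on a node compact at once.
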
This was tested with:
- 8KB value,
- 80 workers,
- no sync on write,
- 5 x i2.2x nodes,
- 12 hour duration,
- 200M keys with a pareto distribution (and hence significant value rotation in the 20%).
With 10 compaction events per day, 155GB per node had been compacted out of the value store by the end of the 12 hour test. In the 12 hours following the test, a similar amount was compacted, to the point where there was rough equivalence in node volumes between the closing state of the leveled test and the closing state of the leveldb test.
As before, the Riak + leveled test had substantially lower tail latency, and achieved higher (and more consistent) throughput. Throughput was more volatile than in the non-compacting tests, but the volatility is still negligible when compared with the leveldb tests.
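
To put the compaction figures in context, the arithmetic can be sketched in Python. The inputs are the numbers quoted in the text; the function name `compaction_summary` is hypothetical, and the sketch assumes that "10 compaction events per day" is a per-vnode setting and that attempts are spread evenly through the day.

```python
def compaction_summary(gb_per_node, test_hours, nodes, runs_per_day):
    """Derive simple rates from the figures quoted in the text."""
    return {
        # Average volume compacted out of the Journal per node, per hour.
        "gb_per_node_hour": gb_per_node / test_hours,
        # Total volume compacted across the cluster during the test.
        "cluster_gb": gb_per_node * nodes,
        # Attempts each vnode would make within the test window
        # (assumes attempts are spread evenly through the day).
        "runs_per_vnode": runs_per_day * test_hours / 24,
    }

summary = compaction_summary(
    gb_per_node=155, test_hours=12, nodes=5, runs_per_day=10
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```
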
Riak + leveled | Riak + leveldb
:-------------------------:|:-------------------------:
![](../test/volume/cluster_journalcompact/output/summary_leveled_5n_80t_i2_nosync_jc.png "LevelEd") | ![](../test/volume/cluster_journalcompact/output/summary_leveldb_5n_80t_i2_nosync.png "LevelDB")
The throughput difference by hour of the test was:
Test hour | Throughput | leveldb comparison
:-------------------------|------------:|------------:
Hour 1 | 20,692.02 | 112.73%
Hour 2 | 16,147.89 | 106.37%
Hour 3 | 14,190.78 | 115.78%
Hour 4 | 12,740.58 | 123.36%
Hour 5 | 11,939.17 | 137.70%
Hour 6 | 11,549.50 | 144.42%
Hour 7 | 10,948.01 | 142.05%
Hour 8 | 10,625.46 | 138.90%
Hour 9 | 10,119.73 | 137.53%
Hour 10 | 9,965.14 | 143.52%
Hour 11 | 10,112.84 | 149.13%
Hour 12 | 10,266.02 | 144.63%
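
The hourly figures in the table above can be used to estimate the corresponding leveldb throughput. The following Python sketch assumes the comparison column is leveled throughput expressed as a percentage of leveldb throughput for the same hour; that reading, and the helper names `implied_leveldb` and `summarise`, are assumptions and are not stated in the source.

```python
# (hour, leveled throughput, comparison %) copied from the table above.
HOURLY = [
    (1, 20692.02, 112.73), (2, 16147.89, 106.37), (3, 14190.78, 115.78),
    (4, 12740.58, 123.36), (5, 11939.17, 137.70), (6, 11549.50, 144.42),
    (7, 10948.01, 142.05), (8, 10625.46, 138.90), (9, 10119.73, 137.53),
    (10, 9965.14, 143.52), (11, 10112.84, 149.13), (12, 10266.02, 144.63),
]

def implied_leveldb(throughput, pct):
    """Estimate leveldb throughput, assuming pct = 100 * leveled / leveldb."""
    return throughput / (pct / 100.0)

def summarise(rows):
    """Return mean leveled throughput, mean implied leveldb throughput,
    and the ratio of the two means as a percentage."""
    leveled_mean = sum(t for _, t, _ in rows) / len(rows)
    leveldb_mean = sum(implied_leveldb(t, p) for _, t, p in rows) / len(rows)
    return leveled_mean, leveldb_mean, 100.0 * leveled_mean / leveldb_mean

leveled_mean, leveldb_mean, overall_pct = summarise(HOURLY)
print(f"leveled mean: {leveled_mean:.2f}")
print(f"implied leveldb mean: {leveldb_mean:.2f}")
print(f"overall comparison: {overall_pct:.2f}%")
```

The implied leveldb values are estimates derived from rounded percentages, so they should be treated as approximate rather than as measured results.
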
Testing during a journal compaction window
## Riak Cluster Test - Phase 4 - 2i

Binary file not shown (98 KiB).

Binary file not shown (81 KiB).