diff --git a/README.md b/README.md
index b7bf510..a160501 100644
--- a/README.md
+++ b/README.md
@@ -47,12 +47,13 @@ The target at inception was to do something interesting, to re-think certain key
 
 The delta in the table below is the comparison in Riak performance between the identical test run with a Leveled backend in comparison to Leveldb.
 
-Test Description | Hardware | Duration |Avg TPS | Delta (Overall) | Delta (Last Hour)
+Test Description | Hardware | Duration |Avg TPS | TPS Delta (Overall) | TPS Delta (Last Hour)
 :---------------------------------|:-------------|:--------:|----------:|-----------------:|-------------------:
 8KB value, 60 workers, sync | 5 x i2.2x | 4 hr | 12,679.91 | + 70.81% | + 63.99%
 8KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,100.19 | + 16.15% | + 35.92%
 8KB value, 50 workers, no_sync | 5 x d2.2x | 4 hr | 10,400.29 | + 8.37% | + 23.51%
-4KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,993.95 | - 10.44% | - 4.48% 
+4KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,993.95 | - 10.44% | - 4.48%
+16KB value, 60 workers, no_sync | 5 x i2.2x | 6 hr | 11,167.44 | + 80.48% | + 113.55%
 
 Tests generally show a 5:1 improvement in tail latency for LevelEd.
 
@@ -75,14 +76,24 @@ As a general rule though, the most interesting thing is the potential to enable
 
 Further volume test scenarios are the immediate priority, in particular volume test scenarios with:
 
-- Alternative object sizes;
-
 - Significant use of secondary indexes;
 - Use of newly available [EC2 hardware](https://aws.amazon.com/about-aws/whats-new/2017/02/now-available-amazon-ec2-i3-instances-next-generation-storage-optimized-high-i-o-instances/) which potentially is a significant change to assumptions about hardware efficiency and cost.
 - Create riak_test tests for new Riak features enabled by Leveled.
+However, a number of other changes are planned in the next month to (my branch of) riak_kv to make better use of leveled:
+
+- Support for rapid rebuild of hashtrees
+
+- Fixes to priority issues
+
+- Experiments with flexible sync on write settings
+
+- A cleaner and easier build of Riak with leveled included, including cuttlefish configuration support
+
+More information can be found in the [future section](docs/FUTURE.md).
+
 ## Feedback
 
 Please create an issue if you have any suggestions. You can ping me @masleeds if you wish
@@ -99,4 +110,17 @@ This will start a new Bookie. It will start and look for existing data files, u
 
 The book_start method should respond once startup is complete. The leveled_bookie module includes the full API for external use of the store.
 
-Read through the [end_to_end test suites](test/end_to_end/) for further guidance.
+It should run anywhere that OTP will run - it has been tested on Ubuntu 14, Mac OS X and Windows 10.
+
+Running in Riak requires one of the branches of riak_kv referenced [here](docs/FUTURE.md). There is a [Riak branch](https://github.com/martinsumner/riak/tree/mas-leveleddb) intended to support the automatic build of this, and configuration via cuttlefish. However, the auto-build fails because other dependencies (e.g. riak_search) bring in an alternative version of riak_kv, and the configuration via cuttlefish is broken for reasons unknown.
+
+Building this from source as part of Riak will therefore require a bit of fiddling around:
+
+- build [riak](https://github.com/martinsumner/riak/tree/mas-leveleddb)
+- cd deps, rm -rf riak_kv
+- git clone -b mas-leveled-putfm --single-branch https://github.com/martinsumner/riak_kv.git
+- cd ..
+- make rel
+- remember to set the storage backend to leveled in riak.conf
+
+To work around the broken cuttlefish configuration, leveled parameters can be set via riak_kv/include/riak_kv_leveled.hrl - although a new make will be required for these changes to take effect.
\ No newline at end of file
diff --git a/docs/FUTURE.md b/docs/FUTURE.md
index 9146f7a..831a27d 100644
--- a/docs/FUTURE.md
+++ b/docs/FUTURE.md
@@ -114,4 +114,6 @@ The other n-1 vnodes must also do a local GET before the vnode PUT (so as not to
 This branch changes the behaviour slightly at the non-coordinating vnodes. These vnodes will now try a HEAD request before the local PUT (not a GET request), and if the HEAD request contains a vclock which is dominated by the updated PUT, it will not attempt to fetch the whole object for the syntactic merge.
 
-This should save two object fetches (where n=3) in most circumstances.
\ No newline at end of file
+This should save two object fetches (where n=3) in most circumstances.
+
+Note: although the branch name refers to the put fsm, the fsm itself is unchanged by this; all of the changes are within vnode_put.
\ No newline at end of file
diff --git a/docs/VOLUME.md b/docs/VOLUME.md
index 77ce178..00f3e94 100644
--- a/docs/VOLUME.md
+++ b/docs/VOLUME.md
@@ -32,6 +32,7 @@ This test has the following specific characteristics
 - 60 concurrent basho_bench workers running at 'max'
 - i2.2xlarge instances
 - allow_mult=false, lww=false
+- sync_on_write = on
 
 Comparison charts for this test:
 
@@ -47,6 +48,7 @@ This test has the following specific characteristics
 - 100 concurrent basho_bench workers running at 'max'
 - i2.2xlarge instances
 - allow_mult=false, lww=false
+- sync_on_write = off
 
 Comparison charts for this test:
 
@@ -60,8 +62,9 @@ This test has the following specific characteristics
 - An 8KB value size (based on crypto:rand_bytes/1 - so cannot be effectively compressed)
 - 50 concurrent basho_bench workers running at 'max'
-- d2.2xlarge instances 
+- d2.2xlarge instances
 - allow_mult=false, lww=false
+- sync_on_write = off
 
 Comparison charts for this test:
 
@@ -74,11 +77,37 @@ This is the stage when the volume of data has begun to exceed the volume support
 
 ### Half-Size Object, SSDs, No Sync-On-Write
 
-to be completed
+This test has the following specific characteristics
+
+- A 4KB value size (based on crypto:rand_bytes/1 - so cannot be effectively compressed)
+- 100 concurrent basho_bench workers running at 'max'
+- i2.2xlarge instances
+- allow_mult=false, lww=false
+- sync_on_write = off
+
+Comparison charts for this test:
+
+Riak + leveled | Riak + eleveldb
+:-------------------------:|:-------------------------:
+![](../test/volume/cluster_four/output/summary_leveled_5n_100t_i2_4KB_nosync.png "LevelEd") | ![](../test/volume/cluster_four/output/summary_leveldb_5n_100t_i2_4KB_nosync.png "LevelDB")
+
 ### Double-Size Object, SSDs, No Sync-On-Write
 
-to be completed
+This test has the following specific characteristics
+
+- A 16KB value size (based on crypto:rand_bytes/1 - so cannot be effectively compressed)
+- 60 concurrent basho_bench workers running at 'max'
+- i2.2xlarge instances
+- allow_mult=false, lww=false
+- sync_on_write = off
+
+Comparison charts for this test:
+
+Riak + leveled | Riak + eleveldb
+:-------------------------:|:-------------------------:
+![](../test/volume/cluster_five/output/summary_leveled_5n_60t_i2_16KB_nosync.png "LevelEd") | ![](../test/volume/cluster_five/output/summary_leveldb_5n_60t_i2_16KB_nosync.png "LevelDB")
+
 ### Lies, damned lies etc
 
@@ -90,11 +119,14 @@ Both leveled and leveldb are optimised for finding non-presence through the use
 
 So it is better to focus on the results at the tail of the tests, as at the tail the results are a more genuine reflection of behaviour against the advertised test parameters.
+
 Test Description | Hardware | Duration |Avg TPS | Delta (Overall) | Delta (Last Hour)
 :---------------------------------|:-------------|:--------:|----------:|-----------------:|-------------------:
 8KB value, 60 workers, sync | 5 x i2.2x | 4 hr | 12,679.91 | + 70.81% | + 63.99%
 8KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,100.19 | + 16.15% | + 35.92%
-8KB value, 50 workers, no_sync | 5 x d2.2x | 6 hr | 10,400.29 | + 8.37% | + 23.51%
+8KB value, 50 workers, no_sync | 5 x d2.2x | 4 hr | 10,400.29 | + 8.37% | + 23.51%
+4KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,993.95 | - 10.44% | - 4.48%
+16KB value, 60 workers, no_sync | 5 x i2.2x | 6 hr | 11,167.44 | + 80.48% | + 113.55%
 
 Leveled, like bitcask, will defer compaction work until a designated compaction window, and these tests were run outside of that compaction window. So although the throughput of leveldb is lower, it has no deferred work at the end of the test. Future testing work is scheduled to examine leveled throughput during a compaction window.
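As a sanity check on how to read the deltas in the table above, one can recover the implied leveldb baseline from a row. The formula here — delta = (leveled TPS − leveldb TPS) / leveldb TPS — is an assumption, since the docs only say the delta compares leveled to leveldb:

```shell
# Recover the implied leveldb baseline TPS from a table row, assuming
# delta = (leveled - leveldb) / leveldb. The exact formula is not stated
# in the docs, so treat this as an illustration of reading the table.
leveled_tps=12679.91   # "8KB value, 60 workers, sync" row
delta_pct=70.81        # overall TPS delta for that row
baseline=$(awk -v t="$leveled_tps" -v d="$delta_pct" \
    'BEGIN { printf "%.0f", t / (1 + d / 100) }')
echo "implied leveldb baseline: ~$baseline TPS"
```

Under that assumption, a +70.81% overall delta on roughly 12,680 TPS implies leveldb averaged around 7,400 TPS on the same test.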
diff --git a/test/volume/cluster_five/output/summary_leveldb_5n_60t_i2_16KB_nosync.png b/test/volume/cluster_five/output/summary_leveldb_5n_60t_i2_16KB_nosync.png
new file mode 100644
index 0000000..b411e98
Binary files /dev/null and b/test/volume/cluster_five/output/summary_leveldb_5n_60t_i2_16KB_nosync.png differ
diff --git a/test/volume/cluster_five/output/summary_leveled_5n_60t_i2_16KB_nosync.png b/test/volume/cluster_five/output/summary_leveled_5n_60t_i2_16KB_nosync.png
new file mode 100644
index 0000000..ca6bdcb
Binary files /dev/null and b/test/volume/cluster_five/output/summary_leveled_5n_60t_i2_16KB_nosync.png differ
diff --git a/test/volume/cluster_four/output/summary_leveldb_5n_100t_i2_4KB_nosync.png b/test/volume/cluster_four/output/summary_leveldb_5n_100t_i2_4KB_nosync.png
new file mode 100644
index 0000000..ef83156
Binary files /dev/null and b/test/volume/cluster_four/output/summary_leveldb_5n_100t_i2_4KB_nosync.png differ
diff --git a/test/volume/cluster_four/output/summary_leveled_5n_100t_i2_4KB_nosync.png b/test/volume/cluster_four/output/summary_leveled_5n_100t_i2_4KB_nosync.png
new file mode 100644
index 0000000..3631d8d
Binary files /dev/null and b/test/volume/cluster_four/output/summary_leveled_5n_100t_i2_4KB_nosync.png differ
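The manual build steps listed in the README changes above can be collected into a single script. This is a sketch only: the repositories and branch names are taken verbatim from the README, the `deps/` layout and `make deps` target are assumptions about the rebar2-era Riak build, and — as the README itself warns — the build may still need hand-fixing. The script is written to a file and syntax-checked rather than executed here, since it needs network access and Erlang/OTP to actually run:

```shell
# Write the README's build recipe to a script and syntax-check it.
# Untested end-to-end; branch names come from the README, the deps/
# layout and the "make deps" target are assumptions.
cat > build_riak_leveled.sh <<'EOF'
#!/bin/sh
set -e
git clone -b mas-leveleddb --single-branch https://github.com/martinsumner/riak.git
cd riak
make deps                  # fetch dependencies into deps/ (target name assumed)
cd deps
rm -rf riak_kv             # drop the riak_kv pulled in by other dependencies
git clone -b mas-leveled-putfm --single-branch https://github.com/martinsumner/riak_kv.git
cd ..
make rel                   # build the release
# finally, set the storage backend in the generated riak.conf, e.g.:
#   storage_backend = leveled
EOF
sh -n build_riak_leveled.sh && echo "build script parses"
```

`sh -n` only checks the script's syntax; running it for real is left to an environment with the required toolchain.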