Mas i335 otp24 (#336)

* Address OTP 24 warnings, ct and eunit paths
* Reorg to add OTP 24 support
* Update VOLUME.md
* Correct broken refs
* Update README.md
* CI on all main branches

Co-authored-by: Ulf Wiger <ulf@wiger.net>
.github/workflows/erlang.yml

@@ -2,9 +2,15 @@ name: Erlang CI
 on:
   push:
-    branches: [ develop-3.0 ]
+    branches:
+      - develop-3.1
+      - develop-3.0
+      - develop-2.9
   pull_request:
-    branches: [ develop-3.0 ]
+    branches:
+      - develop-3.1
+      - develop-3.0
+      - develop-2.9
 
 jobs:
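Applied to the workflow, the trigger section of `erlang.yml` would read as below. This is a sketch assuming the rest of the file (the `jobs:` section) is unchanged by this hunk:

```yaml
name: Erlang CI

on:
  push:
    branches:
      - develop-3.1
      - develop-3.0
      - develop-2.9
  pull_request:
    branches:
      - develop-3.1
      - develop-3.0
      - develop-2.9
```

This matches the three release branches described in the README changes below, so CI runs on every maintained line rather than only `develop-3.0`.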
README.md

@@ -44,48 +44,13 @@ For more details on the store:
 
 - There is also a ["Why"](docs/WHY.md) section looking at lower level design choices and the rationale that supports them.
 
-## Is this interesting?
-
-Making a positive contribution to this space is hard - given the superior brainpower and experience of those that have contributed to the KV store problem space in general, and the Riak backend space in particular.
-
-The target at inception was to do something interesting, to re-think certain key assumptions and trade-offs, and prove through working software the potential for improvements to be realised.
-
-[Initial volume tests](docs/VOLUME.md) indicate that it is at least interesting, with improvements in throughput for multiple configurations, and with this improvement becoming more marked as the test progresses (and the base data volume becomes more realistic).
-
-The delta in the table below is the comparison in Riak throughput between identical test runs with a leveled backend and a leveldb backend. The realism of the tests increases as the test progresses - so focus is given to the throughput delta in the last hour of the test.
-
-Test Description | Hardware | Duration | Avg TPS | TPS Delta (Overall) | TPS Delta (Last Hour)
-:---------------------------------|:-------------|:--------:|----------:|----------------:|-------------------:
-8KB value, 60 workers, sync | 5 x i2.2x | 4 hr | 12,679.91 | <b>+ 70.81%</b> | <b>+ 63.99%</b>
-8KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,100.19 | <b>+ 16.15%</b> | <b>+ 35.92%</b>
-8KB value, 50 workers, no_sync | 5 x d2.2x | 4 hr | 10,400.29 | <b>+ 8.37%</b> | <b>+ 23.51%</b>
-4KB value, 100 workers, no_sync | 5 x i2.2x | 6 hr | 14,993.95 | - 10.44% | - 4.48%
-16KB value, 60 workers, no_sync | 5 x i2.2x | 6 hr | 11,167.44 | <b>+ 80.48%</b> | <b>+ 113.55%</b>
-8KB value, 80 workers, no_sync, 2i queries | 5 x i2.2x | 6 hr | 9,855.96 | <b>+ 4.48%</b> | <b>+ 22.36%</b>
-
-Tests generally show a 5:1 improvement in tail latency for leveled.
-
-All tests have in common:
-
-- Target Key volume - 200M with pareto distribution of load
-- 5 GETs per 1 update
-- RAID 10 (software) drives
-- allow_mult=false, lww=false
-- modified riak optimised for leveled used in leveled tests
-
-The throughput in leveled is generally CPU-bound, whereas in comparative tests for leveldb the throughput was disk-bound. This potentially makes capacity planning simpler, and opens up the possibility of scaling out to equivalent throughput at much lower cost (as CPU is relatively low cost when compared to disk space at high I/O) - [offering better alignment between resource constraints and the cost of resource](docs/INTRO.md).
-
-More information can be found in the [volume testing section](docs/VOLUME.md).
-
-As a general rule though, the most interesting thing is the potential to enable [new features](docs/FUTURE.md). The tagging of different object types, with an ability to set different rules for both compaction and metadata creation by tag, is a potential enabler for further change. Further, having a separate key/metadata store which can be scanned without breaking the page cache or working against mitigation for write amplifications, is also potentially an enabler to offer features to both the developer and the operator.
-
 ## Feedback
 
 Please create an issue if you have any suggestions. You can ping me <b>@masleeds</b> if you wish
 
 ## Running Leveled
 
-Unit and current tests in leveled should run with rebar3. Leveled has been tested in OTP18, but it can be started with OTP16 to support Riak (although tests will not work as expected).
+Unit and current tests in leveled should run with rebar3.
 
 A new database can be started by running

@@ -99,13 +64,18 @@ The book_start method should respond once startup is complete. The [leveled_boo
 
 Running in Riak requires Riak 2.9 or beyond, which is available from January 2019.
 
+There are three main branches:
+
+[`develop-3.1` - default](https://github.com/martinsumner/leveled/tree/develop-3.1): Target for the Riak 3.1 release with support for OTP 22 and OTP 24;
+
+[`develop-3.0`](https://github.com/martinsumner/leveled/tree/develop-3.0): Used in the Riak 3.0 release with support for OTP 20 and OTP 22;
+
+[`develop-2.9`](https://github.com/martinsumner/leveled/tree/develop-2.9): Used in the Riak 2.9 release with support for OTP R16 through to OTP 20.
+
 ### Contributing
 
-In order to contribute to leveled, fork the repository, make a branch
-for your changes, and open a pull request. The acceptance criteria for
-updating leveled is that it passes rebar3 dialyzer, xref, eunit, and
-ct with 100% coverage.
+In order to contribute to leveled, fork the repository, make a branch for your changes, and open a pull request. The acceptance criteria for updating leveled is that it passes rebar3 dialyzer, xref, eunit, and ct with 100% coverage.
 
 To have rebar3 execute the full set of tests, run:
 
-`rebar3 as test do cover --reset, eunit --cover, ct --cover, cover --verbose`
+`rebar3 as test do xref, dialyzer, cover --reset, eunit --cover, ct --cover, cover --verbose`
docs/VOLUME.md

@@ -2,7 +2,7 @@
 
 ## Parallel Node Testing
 
-Initial volume tests have been [based on standard basho_bench eleveldb test](../test/volume/single_node/examples) to run multiple stores in parallel on the same node and subject them to concurrent pressure.
+Initial volume tests have been [based on standard basho_bench eleveldb test](volume/single_node/examples) to run multiple stores in parallel on the same node and subject them to concurrent pressure.
 
 This showed a [relative positive performance for leveled](VOLUME_PRERIAK.md) for both population and load. This also showed that although the leveled throughput was relatively stable, it was still subject to fluctuations related to CPU constraints - especially as compaction of the ledger was a CPU intensive activity. Prior to moving on to full Riak testing, a number of changes were then made to leveled to reduce the CPU load during these merge events.

@@ -38,7 +38,7 @@ Comparison charts for this test:
 
 Riak + leveled | Riak + eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 ### Mid-Size Object, SSDs, No Sync-On-Write

@@ -54,7 +54,7 @@ Comparison charts for this test:
 
 Riak + leveled | Riak + eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 ### Mid-Size Object, HDDs, No Sync-On-Write

@@ -70,7 +70,7 @@ Comparison charts for this test:
 
 Riak + leveled | Riak + eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 Note that there is a clear inflexion point when throughput starts to drop sharply at about the hour mark into the test.
 This is the stage when the volume of data has begun to exceed the volume supportable in cache, and so disk activity begins to be required for GET operations with increasing frequency.

@@ -89,7 +89,7 @@ Comparison charts for this test:
 
 Riak + leveled | Riak + eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 
 ### Double-Size Object, SSDs, No Sync-On-Write

@@ -106,14 +106,14 @@ Comparison charts for this test:
 
 Riak + leveled | Riak + eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 
 ### Lies, damned lies etc
 
 The first thing to note about the test is the impact of the pareto distribution, and of starting from an empty store, on what is actually being tested. At the start of the test there is a 0% chance of a GET request actually finding an object. Normally, it will be 3 hours into the test before a GET request will have a 50% chance of finding an object.
 
-
+
 
 Both leveled and leveldb are optimised for finding non-presence through the use of bloom filters, so the comparison is not unduly influenced by this. However, the workload at the end of the test is both more realistic (in that objects are found), and harder if the previous throughput had been greater (in that more objects are found).

@@ -152,7 +152,7 @@ These tests have been completed using the following static characteristics which
 - 5 x i2.2x nodes,
 - 6 hour duration.
 
-This is [a test used in Phase 1](https://github.com/martinsumner/leveled/blob/master/docs/VOLUME.md#mid-size-object-ssds-no-sync-on-write). Note that since Phase 1 was completed a number of performance improvements have been made in leveled, so that the starting gap between Riak/leveled and Riak/leveldb has widened.
+This is [a test used in Phase 1](VOLUME.md#mid-size-object-ssds-no-sync-on-write). Note that since Phase 1 was completed a number of performance improvements have been made in leveled, so that the starting gap between Riak/leveled and Riak/leveldb has widened.
 
 The tests have been run using the new riak_kv_sweeper facility within develop. This feature is an alternative approach to controlling and scheduling rebuilds, allowing for other work to be scheduled into the same fold. As the test is focused on hashtree rebuilds, the test was run with:

@@ -173,7 +173,7 @@ The comparison between leveled and leveldb shows a marked difference in throughp
 
 Riak + leveled | Riak + leveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 The differences between the two tests are:

@@ -231,7 +231,7 @@ As before, the Riak + leveled test had substantially lower tail latency, and ach
 
 Riak + leveled | Riak + leveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 The throughput difference by hour of the test was:

@@ -271,11 +271,9 @@ The secondary index test was built on a test which sent
 
 The query load is relatively light compared to the GET/PUT load, in line with Basho recommendations (declining from 350 queries per second to 120 queries per second through the test). The queries
 return o(1000) results maximum towards the tail of the test and o(1) results at the start of the test.
 
-Further details on the implementation of the secondary indexes for volume tests can be found in the [driver file](https://github.com/martinsumner/basho_bench/blob/mas-nhsload/src/basho_bench_driver_riakc_pb.erl) for the test.
-
 Riak + leveled | Riak + leveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 The results are similar to previous tests. Although the test is on infrastructure with optimised disk throughput (and with no flushing to disk on write from Riak to minimise direct pressure from Riak), when running the tests with leveldb, disk busyness rapidly becomes a constraining factor - and the reaction to that is volatility in throughput. Riak combined with leveldb is capable in short bursts of greater throughput than Riak + leveled; however, when throttled within the cluster by a node or nodes with busy disks, the reaction is extreme.

@@ -307,7 +305,7 @@ Here is a side-by-side on a standard Phase 1 test on i2, without sync, and with
 
 Riak + leveled | Riak + bitcask
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 In the first hour of the test, bitcask throughput is <b>39.13%</b> greater than leveled. Over the whole test, the bitcask-backed cluster achieves <b>16.48%</b> more throughput than leveled, but in the last hour this advantage is just <b>0.34%</b>.
docs/VOLUME_PRERIAK.md

@@ -2,17 +2,17 @@
 
 ## Parallel Node Testing - Non-Riak
 
-Initial volume tests have been [based on standard basho_bench eleveldb test](../test/volume/single_node/examples) to run multiple stores in parallel on the same node and subject them to concurrent pressure.
+Initial volume tests have been [based on standard basho_bench eleveldb test](volume/single_node/examples) to run multiple stores in parallel on the same node and subject them to concurrent pressure.
 
 This showed a relative positive performance for leveled for both population and load.
 
 Populate leveled | Populate eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 Load leveled | Load eleveldb
 :-------------------------:|:-------------------------:
- | 
+ | 
 
 This test was a positive comparison for LevelEd, but also showed that although the LevelEd throughput was relatively stable it was still subject to fluctuations related to CPU constraints. Prior to moving on to full Riak testing, a number of changes were then made to LevelEd to reduce the CPU load, in particular during merge events.
rebar.config

@@ -9,14 +9,24 @@
 
 {xref_checks, [undefined_function_calls,undefined_functions]}.
 
+{cover_excl_mods,
+  [testutil,
+    appdefined_SUITE, basic_SUITE, iterator_SUITE,
+    perf_SUITE, recovery_SUITE, riak_SUITE, tictac_SUITE]}.
+
 {eunit_opts, [verbose]}.
 
 {profiles,
  [{eqc, [{deps, [meck, fqc]},
          {erl_opts, [debug_info, {parse_transform, eqc_cover}]},
-         {extra_src_dirs, ["test"]}]}
+         {extra_src_dirs, ["test"]}]},
+  {test, [
+    {eunit_compile_opts, [{src_dirs, ["src", "test/end_to_end"]}]}
+  ]}
 ]}.
 
 {deps, [
   {lz4, ".*", {git, "https://github.com/martinsumner/erlang-lz4", {tag, "0.2.5"}}}
 ]}.
 
+{ct_opts, [{dir, ["test/end_to_end"]}]}.
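Taken together, the additions above move the common_test suites out of the default search path and into `test/end_to_end`, for both eunit compilation and ct discovery, while excluding the suites and the shared `testutil` module from coverage figures. A sketch of just the test-related terms (the `eqc` profile and other settings are elided and assumed unchanged):

```erlang
%% rebar.config fragments affected by this PR (sketch, not the full file)

%% CT suites and the shared test helper should not count against coverage
{cover_excl_mods,
  [testutil,
    appdefined_SUITE, basic_SUITE, iterator_SUITE,
    perf_SUITE, recovery_SUITE, riak_SUITE, tictac_SUITE]}.

%% In the test profile, compile the end-to-end tests alongside src for eunit
{profiles,
  [{test, [
    {eunit_compile_opts, [{src_dirs, ["src", "test/end_to_end"]}]}
  ]}]}.

%% Point common_test at the same directory
{ct_opts, [{dir, ["test/end_to_end"]}]}.
```

This is what the README's "ct and eunit paths" note and the combined `rebar3 as test do ...` command rely on.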
rebar3 (binary file changed)
@@ -2835,9 +2835,9 @@ foldobjects_vs_hashtree_testto() ->
         fun(B, K, ProxyV, Acc) ->
             {proxy_object,
                 MD,
-                _Size,
+                _Size1,
                 _Fetcher} = binary_to_term(ProxyV),
-            {Hash, _Size, _UserDefinedMD} = MD,
+            {Hash, _Size0, _UserDefinedMD} = MD,
             [{B, K, Hash}|Acc]
         end,
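A recurring change in the Erlang hunks in this PR is renaming reused underscore-prefixed variables. In Erlang an underscore prefix only suppresses the unused-variable warning; the variable is still bound, so a second match against `_Size` is a hidden equality assertion. OTP 24 began warning when such an "ignored" variable is actually used, so the PR renames to distinct names (`_Size1`/`_Size0`) where the values are unrelated, and to plain names (`Size`, `ObjSize`, `KD`, `Bloom`, `HL1`, ...) where the repeated match is intentional. A hypothetical module (not part of leveled) illustrating both cases:

```erlang
-module(underscore_demo).
-export([before_fix/2, after_fix/2]).

%% Before: _Size is bound by the first match and then *used* by the
%% second - OTP 24 warns about the use of an underscore-prefixed variable.
before_fix({tree, SizeA}, {tree, SizeB}) ->
    _Size = SizeA,
    _Size = SizeB,    %% OTP 24: warning, '_Size' is used here
    ok.

%% After: the plain name makes the equality assertion explicit,
%% mirroring the find_dirtyleaves/2 change further down this diff.
after_fix({tree, SizeA}, {tree, SizeB}) ->
    Size = SizeA,
    Size = SizeB,     %% intentional assertion that both sizes match
    ok.
```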
@@ -1594,7 +1594,7 @@ read_integerpairs(<<Int1:32, Int2:32, Rest/binary>>, Pairs) ->
 %% false - don't check the CRC before returning key & value
 %% loose_presence - confirm that the hash of the key is present
 search_hash_table(_Handle,
-                    {_, _, _TotalSlots, _TotalSlots},
+                    {_, _, TotalSlots, TotalSlots},
                     _Hash, _Key,
                     _QuickCheck, _BinaryMode, Timings) ->
     % We have done the full loop - value must not be present
@@ -1407,7 +1407,7 @@ compact_journal_testto(WRP, ExpectedFiles) ->
     build_dummy_journal(fun test_ledgerkey/1),
     {ok, Ink1} = ink_start(InkOpts),
 
-    {ok, NewSQN1, _ObjSize} = ink_put(Ink1,
+    {ok, NewSQN1, ObjSize} = ink_put(Ink1,
                                         test_ledgerkey("KeyAA"),
                                         "TestValueAA",
                                         {[], infinity}),

@@ -1427,7 +1427,7 @@ compact_journal_testto(WRP, ExpectedFiles) ->
                             {SQN, test_ledgerkey(PK)}
                         end,
                         FunnyLoop),
-    {ok, NewSQN2, _ObjSize} = ink_put(Ink1,
+    {ok, NewSQN2, ObjSize} = ink_put(Ink1,
                                         test_ledgerkey("KeyBB"),
                                         "TestValueBB",
                                         {[], infinity}),
@@ -1036,10 +1036,8 @@ sst_getfilteredslots(Pid, SlotList, SegList, LowLastMod) ->
                     non_neg_integer()) -> list(non_neg_integer()).
 %% @doc
 %% Find a list of positions where there is an element with a matching segment
-%% ID to the expected segments (which cna either be a single segment, a list of
+%% ID to the expected segments (which can either be a single segment, a list of
 %% segments or a set of segments depending on size.
-find_pos(<<>>, _Hash, PosList, _Count) ->
-    PosList;
 find_pos(<<1:1/integer, PotentialHit:15/integer, T/binary>>,
             Checker, PosList, Count) ->
     case member_check(PotentialHit, Checker) of

@@ -1049,7 +1047,12 @@ find_pos(<<1:1/integer, PotentialHit:15/integer, T/binary>>,
         find_pos(T, Checker, PosList, Count + 1)
     end;
 find_pos(<<0:1/integer, NHC:7/integer, T/binary>>, Checker, PosList, Count) ->
-    find_pos(T, Checker, PosList, Count + NHC + 1).
+    find_pos(T, Checker, PosList, Count + NHC + 1);
+find_pos(_BinRem, _Hash, PosList, _Count) ->
+    %% Expect this to be <<>> - i.e. at end of binary, but if there is
+    %% corruption, could be some other value - so return as well in this
+    %% case
+    PosList.
 
 
 -spec member_check(non_neg_integer(),
@@ -2949,16 +2952,16 @@ tombcount_test() ->
     OptsSST =
         #sst_options{press_method=native,
                         log_options=leveled_log:get_opts()},
-    {ok, SST1, _KD, _BB} = sst_newmerge(RP, Filename,
+    {ok, SST1, KD, BB} = sst_newmerge(RP, Filename,
                                         KVL1, KVL2, false, 2,
                                         N, OptsSST, false, false),
     ?assertMatch(not_counted, sst_gettombcount(SST1)),
     ok = sst_close(SST1),
     ok = file:delete(filename:join(RP, Filename ++ ".sst")),
 
-    {ok, SST2, _KD, _BB} = sst_newmerge(RP, Filename,
+    {ok, SST2, KD, BB} = sst_newmerge(RP, Filename,
                                         KVL1, KVL2, false, 2,
                                         N, OptsSST, false, true),
 
     ?assertMatch(ExpectedCount, sst_gettombcount(SST2)),
     ok = sst_close(SST2),
@@ -3079,9 +3082,9 @@ indexed_list_mixedkeys2_test() ->
 indexed_list_allindexkeys_test() ->
     Keys = lists:sublist(lists:ukeysort(1, generate_indexkeys(150)),
                             ?LOOK_SLOTSIZE),
-    {{HeaderT, FullBinT, _HL, _LK}, no_timing} =
+    {{HeaderT, FullBinT, HL, LK}, no_timing} =
         generate_binary_slot(lookup, Keys, native, true, no_timing),
-    {{HeaderF, FullBinF, _HL, _LK}, no_timing} =
+    {{HeaderF, FullBinF, HL, LK}, no_timing} =
         generate_binary_slot(lookup, Keys, native, false, no_timing),
     EmptySlotSize = ?LOOK_SLOTSIZE - 1,
     LMD = ?FLIPPER32,
@@ -3172,6 +3175,9 @@ indexed_list_allindexkeys_trimmed_test() ->
     ?assertMatch(R3, O3).
 
 
+findposfrag_test() ->
+    ?assertMatch([], find_pos(<<128:8/integer>>, 1, [], 0)).
+
 indexed_list_mixedkeys_bitflip_test() ->
     KVL0 = lists:ukeysort(1, generate_randomkeys(1, 50, 1, 4)),
     KVL1 = lists:sublist(KVL0, 33),
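The new `findposfrag_test` exercises the defensive clause added to `find_pos` above: `<<128>>` opens with a 1 bit, so the first clause demands 16 bits but only 8 remain, and the second clause requires a leading 0 bit. With the old head clause matching only `<<>>`, this fragment would have raised a `function_clause` error; the catch-all now returns the accumulator instead. A sketch of the expected shell interaction (assuming `find_pos/4` is reachable from the shell, e.g. exported for test):

```erlang
%% 128 = 2#10000000 - a 1 bit followed by an incomplete 15-bit hit
1> leveled_sst:find_pos(<<128:8/integer>>, 1, [], 0).
[]
```

The expected result `[]` is exactly what the `?assertMatch([], ...)` in the test above encodes.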
@@ -3564,7 +3570,7 @@ simple_persisted_tester(SSTNewFun) ->
     KVList1 = lists:ukeysort(1, KVList0),
     [{FirstKey, _FV}|_Rest] = KVList1,
     {LastKey, _LV} = lists:last(KVList1),
-    {ok, Pid, {FirstKey, LastKey}, _Bloom} =
+    {ok, Pid, {FirstKey, LastKey}, Bloom} =
         SSTNewFun(RP, Filename, Level, KVList1, length(KVList1), native),
 
     B0 = check_binary_references(Pid),
@@ -3632,7 +3638,7 @@ simple_persisted_tester(SSTNewFun) ->
     ?assertMatch(SubKVList1L, length(FetchedList2)),
     ?assertMatch(SubKVList1, FetchedList2),
 
-    {Eight000Key, _v800} = lists:nth(800, KVList1),
+    {Eight000Key, V800} = lists:nth(800, KVList1),
     SubKVListA1 = lists:sublist(KVList1, 10, 791),
     SubKVListA1L = length(SubKVListA1),
     FetchListA2 = sst_getkvrange(Pid, TenthKey, Eight000Key, 2),
@@ -3664,7 +3670,7 @@ simple_persisted_tester(SSTNewFun) ->
                             Eight000Key,
                             4),
     FetchedListB4 = lists:foldl(FoldFun, [], FetchListB4),
-    ?assertMatch([{Eight000Key, _v800}], FetchedListB4),
+    ?assertMatch([{Eight000Key, V800}], FetchedListB4),
 
     B1 = check_binary_references(Pid),
 
@@ -3673,7 +3679,7 @@ simple_persisted_tester(SSTNewFun) ->
     io:format(user, "Reopen SST file~n", []),
     OptsSST = #sst_options{press_method=native,
                             log_options=leveled_log:get_opts()},
-    {ok, OpenP, {FirstKey, LastKey}, _Bloom} =
+    {ok, OpenP, {FirstKey, LastKey}, Bloom} =
         sst_open(RP, Filename ++ ".sst", OptsSST, Level),
 
     B2 = check_binary_references(OpenP),
@@ -244,8 +244,8 @@ alter_segment(Segment, Hash, Tree) ->
 %% Returns a list of segment IDs which hold differences between the state
 %% represented by the two trees.
 find_dirtyleaves(SrcTree, SnkTree) ->
-    _Size = SrcTree#tictactree.size,
-    _Size = SnkTree#tictactree.size,
+    Size = SrcTree#tictactree.size,
+    Size = SnkTree#tictactree.size,

     IdxList = find_dirtysegments(fetch_root(SrcTree), fetch_root(SnkTree)),
     SrcLeaves = fetch_leaves(SrcTree, IdxList),
@@ -103,11 +103,11 @@ magichashperf_test() ->
                 {K, X}
             end,
     KL = lists:map(KeyFun, lists:seq(1, 1000)),
-    {TimeMH, _HL1} = timer:tc(lists, map, [fun(K) -> magic_hash(K) end, KL]),
+    {TimeMH, HL1} = timer:tc(lists, map, [fun(K) -> magic_hash(K) end, KL]),
     io:format(user, "1000 keys magic hashed in ~w microseconds~n", [TimeMH]),
     {TimePH, _Hl2} = timer:tc(lists, map, [fun(K) -> erlang:phash2(K) end, KL]),
     io:format(user, "1000 keys phash2 hashed in ~w microseconds~n", [TimePH]),
-    {TimeMH2, _HL1} = timer:tc(lists, map, [fun(K) -> magic_hash(K) end, KL]),
+    {TimeMH2, HL1} = timer:tc(lists, map, [fun(K) -> magic_hash(K) end, KL]),
     io:format(user, "1000 keys magic hashed in ~w microseconds~n", [TimeMH2]).

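The hunk above rebinds `HL1` across both `timer:tc` runs of `magic_hash`, so Erlang's single-assignment matching now also asserts that the two timed runs hash the keys to identical lists. A rough Python sketch of the same timing-plus-equality check, using `zlib.crc32` only as a stand-in for the Erlang-side `magic_hash`:

```python
import time
import zlib

def time_map(fn, xs):
    """Rough analogue of timer:tc(lists, map, [Fun, KL]):
    returns (elapsed_microseconds, mapped_results)."""
    t0 = time.perf_counter()
    out = [fn(x) for x in xs]
    return int((time.perf_counter() - t0) * 1_000_000), out

keys = [f"key-{i}" for i in range(1, 1001)]
t1, hl1 = time_map(lambda k: zlib.crc32(k.encode()), keys)
t2, hl2 = time_map(lambda k: zlib.crc32(k.encode()), keys)

# The rebound HL1 in the diff amounts to this equality assertion:
assert hl1 == hl2
```

In the Erlang version the second `HL1 = ...` match would crash with `badmatch` if the two runs ever disagreed, which is why dropping the underscore prefix silences the OTP 24 warning without losing information.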
@@ -169,18 +169,7 @@ encode_maybe_binary(Bin) ->
 %% =================================================

 sync_strategy() ->
-    case erlang:system_info(otp_release) of
-        "17" ->
-            sync;
-        "18" ->
-            sync;
-        "19" ->
-            sync;
-        _ ->
-            % running the sync strategy with OTP16 on macbook is
-            % super slow. So revert to no sync
-            none
-    end.
+    none.

 book_riakput(Pid, RiakObject, IndexSpecs) ->
     leveled_bookie:book_put(Pid,
@@ -1,323 +0,0 @@
--module(lookup_test).
-
--export([go_dict/1,
-         go_ets/1,
-         go_gbtree/1,
-         go_arrayofdict/1,
-         go_arrayofgbtree/1,
-         go_arrayofdict_withcache/1,
-         create_blocks/3,
-         size_testblocks/1,
-         test_testblocks/2]).
-
--define(CACHE_SIZE, 512).
-
-hash(Key) ->
-    H = 5381,
-    hash1(H,Key) band 16#FFFFFFFF.
-
-hash1(H,[]) ->H;
-hash1(H,[B|Rest]) ->
-    H1 = H * 33,
-    H2 = H1 bxor B,
-    hash1(H2,Rest).
-
-% Get the least significant 8 bits from the hash.
-hash_to_index(Hash) ->
-    Hash band 255.
-
-
-%%
-%% Timings (microseconds):
-%%
-%% go_dict(200000) : 1569894
-%% go_dict(1000000) : 17191365
-%% go_dict(5000000) : forever
-
-go_dict(N) ->
-    go_dict(dict:new(), N, N).
-
-go_dict(_, 0, _) ->
-    {erlang:memory(), statistics(garbage_collection)};
-go_dict(D, N, M) ->
-    % Lookup a random key - which may not be present
-    LookupKey = lists:concat(["key-", leveled_rand:uniform(M)]),
-    LookupHash = hash(LookupKey),
-    dict:find(LookupHash, D),
-
-    % Add a new key - which may be present so value to be appended
-    Key = lists:concat(["key-", N]),
-    Hash = hash(Key),
-    case dict:find(Hash, D) of
-        error ->
-            go_dict(dict:store(Hash, [N], D), N-1, M);
-        {ok, List} ->
-            go_dict(dict:store(Hash, [N|List], D), N-1, M)
-    end.
-
-
-%%
-%% Timings (microseconds):
-%%
-%% go_ets(200000) : 609119
-%% go_ets(1000000) : 3520757
-%% go_ets(5000000) : 19974562
-
-go_ets(N) ->
-    go_ets(ets:new(ets_test, [private, bag]), N, N).
-
-go_ets(_, 0, _) ->
-    {erlang:memory(), statistics(garbage_collection)};
-go_ets(Ets, N, M) ->
-    % Lookup a random key - which may not be present
-    LookupKey = lists:concat(["key-", leveled_rand:uniform(M)]),
-    LookupHash = hash(LookupKey),
-    ets:lookup(Ets, LookupHash),
-
-    % Add a new key - which may be present so value to be appended
-    Key = lists:concat(["key-", N]),
-    Hash = hash(Key),
-    ets:insert(Ets, {Hash, N}),
-    go_ets(Ets, N - 1, M).
-
-%%
-%% Timings (microseconds):
-%%
-%% go_gbtree(200000) : 1393936
-%% go_gbtree(1000000) : 8430997
-%% go_gbtree(5000000) : 45630810
-
-go_gbtree(N) ->
-    go_gbtree(gb_trees:empty(), N, N).
-
-go_gbtree(_, 0, _) ->
-    {erlang:memory(), statistics(garbage_collection)};
-go_gbtree(Tree, N, M) ->
-    % Lookup a random key - which may not be present
-    LookupKey = lists:concat(["key-", leveled_rand:uniform(M)]),
-    LookupHash = hash(LookupKey),
-    gb_trees:lookup(LookupHash, Tree),
-
-    % Add a new key - which may be present so value to be appended
-    Key = lists:concat(["key-", N]),
-    Hash = hash(Key),
-    case gb_trees:lookup(Hash, Tree) of
-        none ->
-            go_gbtree(gb_trees:insert(Hash, [N], Tree), N - 1, M);
-        {value, List} ->
-            go_gbtree(gb_trees:update(Hash, [N|List], Tree), N - 1, M)
-    end.
-
-
-%%
-%% Timings (microseconds):
-%%
-%% go_arrayofidict(200000) : 1266931
-%% go_arrayofidict(1000000) : 7387219
-%% go_arrayofidict(5000000) : 49511484
-
-go_arrayofdict(N) ->
-    go_arrayofdict(array:new(256, {default, dict:new()}), N, N).
-
-go_arrayofdict(_, 0, _) ->
-    % dict:to_list(array:get(0, Array)),
-    % dict:to_list(array:get(1, Array)),
-    % dict:to_list(array:get(2, Array)),
-    % dict:to_list(array:get(3, Array)),
-    % dict:to_list(array:get(4, Array)),
-    % dict:to_list(array:get(5, Array)),
-    % dict:to_list(array:get(6, Array)),
-    % dict:to_list(array:get(7, Array)),
-    % dict:to_list(array:get(8, Array)),
-    % dict:to_list(array:get(9, Array)),
-    {erlang:memory(), statistics(garbage_collection)};
-go_arrayofdict(Array, N, M) ->
-    % Lookup a random key - which may not be present
-    LookupKey = lists:concat(["key-", leveled_rand:uniform(M)]),
-    LookupHash = hash(LookupKey),
-    LookupIndex = hash_to_index(LookupHash),
-    dict:find(LookupHash, array:get(LookupIndex, Array)),
-
-    % Add a new key - which may be present so value to be appended
-    Key = lists:concat(["key-", N]),
-    Hash = hash(Key),
-    Index = hash_to_index(Hash),
-    D = array:get(Index, Array),
-    case dict:find(Hash, D) of
-        error ->
-            go_arrayofdict(array:set(Index,
-                dict:store(Hash, [N], D), Array), N-1, M);
-        {ok, List} ->
-            go_arrayofdict(array:set(Index,
-                dict:store(Hash, [N|List], D), Array), N-1, M)
-    end.
-
-%%
-%% Timings (microseconds):
-%%
-%% go_arrayofgbtree(200000) : 1176224
-%% go_arrayofgbtree(1000000) : 7480653
-%% go_arrayofgbtree(5000000) : 41266701
-
-go_arrayofgbtree(N) ->
-    go_arrayofgbtree(array:new(256, {default, gb_trees:empty()}), N, N).
-
-go_arrayofgbtree(_, 0, _) ->
-    % gb_trees:to_list(array:get(0, Array)),
-    % gb_trees:to_list(array:get(1, Array)),
-    % gb_trees:to_list(array:get(2, Array)),
-    % gb_trees:to_list(array:get(3, Array)),
-    % gb_trees:to_list(array:get(4, Array)),
-    % gb_trees:to_list(array:get(5, Array)),
-    % gb_trees:to_list(array:get(6, Array)),
-    % gb_trees:to_list(array:get(7, Array)),
-    % gb_trees:to_list(array:get(8, Array)),
-    % gb_trees:to_list(array:get(9, Array)),
-    {erlang:memory(), statistics(garbage_collection)};
-go_arrayofgbtree(Array, N, M) ->
-    % Lookup a random key - which may not be present
-    LookupKey = lists:concat(["key-", leveled_rand:uniform(M)]),
-    LookupHash = hash(LookupKey),
-    LookupIndex = hash_to_index(LookupHash),
-    gb_trees:lookup(LookupHash, array:get(LookupIndex, Array)),
-
-    % Add a new key - which may be present so value to be appended
-    Key = lists:concat(["key-", N]),
-    Hash = hash(Key),
-    Index = hash_to_index(Hash),
-    Tree = array:get(Index, Array),
-    case gb_trees:lookup(Hash, Tree) of
-        none ->
-            go_arrayofgbtree(array:set(Index,
-                gb_trees:insert(Hash, [N], Tree), Array), N - 1, M);
-        {value, List} ->
-            go_arrayofgbtree(array:set(Index,
-                gb_trees:update(Hash, [N|List], Tree), Array), N - 1, M)
-    end.
-
-
-%%
-%% Timings (microseconds):
-%%
-%% go_arrayofdict_withcache(200000) : 1432951
-%% go_arrayofdict_withcache(1000000) : 9140169
-%% go_arrayofdict_withcache(5000000) : 59435511
-
-go_arrayofdict_withcache(N) ->
-    go_arrayofdict_withcache({array:new(256, {default, dict:new()}),
-        array:new(256, {default, dict:new()})}, N, N).
-
-go_arrayofdict_withcache(_, 0, _) ->
-    {erlang:memory(), statistics(garbage_collection)};
-go_arrayofdict_withcache({MArray, CArray}, N, M) ->
-    % Lookup a random key - which may not be present
-    LookupKey = lists:concat(["key-", leveled_rand:uniform(M)]),
-    LookupHash = hash(LookupKey),
-    LookupIndex = hash_to_index(LookupHash),
-    dict:find(LookupHash, array:get(LookupIndex, CArray)),
-    dict:find(LookupHash, array:get(LookupIndex, MArray)),
-
-    % Add a new key - which may be present so value to be appended
-    Key = lists:concat(["key-", N]),
-    Hash = hash(Key),
-    Index = hash_to_index(Hash),
-    Cache = array:get(Index, CArray),
-    case dict:find(Hash, Cache) of
-        error ->
-            UpdCache = dict:store(Hash, [N], Cache);
-        {ok, _} ->
-            UpdCache = dict:append(Hash, N, Cache)
-    end,
-    case dict:size(UpdCache) of
-        ?CACHE_SIZE ->
-            UpdCArray = array:set(Index, dict:new(), CArray),
-            UpdMArray = array:set(Index, dict:merge(fun merge_values/3, UpdCache, array:get(Index, MArray)), MArray),
-            go_arrayofdict_withcache({UpdMArray, UpdCArray}, N - 1, M);
-        _ ->
-            UpdCArray = array:set(Index, UpdCache, CArray),
-            go_arrayofdict_withcache({MArray, UpdCArray}, N - 1, M)
-    end.
-
-
-merge_values(_, Value1, Value2) ->
-    lists:append(Value1, Value2).
-
-
-%% Some functions for testing options compressing term_to_binary
-
-create_block(N, BlockType) ->
-    case BlockType of
-        keylist ->
-            create_block(N, BlockType, []);
-        keygbtree ->
-            create_block(N, BlockType, gb_trees:empty())
-    end.
-
-create_block(0, _, KeyStruct) ->
-    KeyStruct;
-create_block(N, BlockType, KeyStruct) ->
-    Bucket = <<"pdsRecord">>,
-    case N of
-        20 ->
-            Key = lists:concat(["key-20-special"]);
-        _ ->
-            Key = lists:concat(["key-", N, "-", leveled_rand:uniform(1000)])
-    end,
-    SequenceNumber = leveled_rand:uniform(1000000000),
-    Indexes = [{<<"DateOfBirth_int">>, leveled_rand:uniform(10000)}, {<<"index1_bin">>, lists:concat([leveled_rand:uniform(1000), "SomeCommonText"])}, {<<"index2_bin">>, <<"RepetitionRepetitionRepetition">>}],
-    case BlockType of
-        keylist ->
-            Term = {o, Bucket, Key, {Indexes, SequenceNumber}},
-            create_block(N-1, BlockType, [Term|KeyStruct]);
-        keygbtree ->
-            create_block(N-1, BlockType, gb_trees:insert({o, Bucket, Key}, {Indexes, SequenceNumber}, KeyStruct))
-    end.
-
-
-create_blocks(N, Compression, BlockType) ->
-    create_blocks(N, Compression, BlockType, 10000, []).
-
-create_blocks(_, _, _, 0, BlockList) ->
-    BlockList;
-create_blocks(N, Compression, BlockType, TestLoops, BlockList) ->
-    NewBlock = term_to_binary(create_block(N, BlockType), [{compressed, Compression}]),
-    create_blocks(N, Compression, BlockType, TestLoops - 1, [NewBlock|BlockList]).
-
-size_testblocks(BlockList) ->
-    size_testblocks(BlockList,0).
-
-size_testblocks([], Acc) ->
-    Acc;
-size_testblocks([H|T], Acc) ->
-    size_testblocks(T, Acc + byte_size(H)).
-
-test_testblocks([], _) ->
-    true;
-test_testblocks([H|T], BlockType) ->
-    Block = binary_to_term(H),
-    case findkey("key-20-special", Block, BlockType) of
-        true ->
-            test_testblocks(T, BlockType);
-        not_found ->
-            false
-    end.
-
-findkey(_, [], keylist) ->
-    not_found;
-findkey(Key, [H|T], keylist) ->
-    case H of
-        {o, <<"pdsRecord">>, Key, _} ->
-            true;
-        _ ->
-            findkey(Key,T, keylist)
-    end;
-findkey(Key, Tree, keygbtree) ->
-    case gb_trees:lookup({o, <<"pdsRecord">>, Key}, Tree) of
-        none ->
-            not_found;
-        _ ->
-            true
-    end.
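The removed `lookup_test` module benchmarked keyed lookups over a djb2-style hash, with `hash_to_index/1` taking the low 8 bits to select one of 256 partitions. A minimal Python transcription of those two helpers for ASCII keys (masking to 32 bits on each step, which gives the same result as the module's single final `band 16#FFFFFFFF`):

```python
def djb2_hash(key: str) -> int:
    # lookup_test:hash/1 - seed 5381, multiply by 33 and XOR each byte,
    # constrained to 32 bits (per-step masking matches one final mask,
    # since truncation commutes with * and low-byte XOR)
    h = 5381
    for b in key.encode():
        h = ((h * 33) ^ b) & 0xFFFFFFFF
    return h

def hash_to_index(h: int) -> int:
    # lookup_test:hash_to_index/1 - least significant 8 bits
    return h & 255
```

For example, `hash_to_index(djb2_hash("key-42"))` yields a bucket index in 0..255, which is how the `go_arrayofdict` and `go_arrayofgbtree` benchmarks partitioned keys across a 256-slot array.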
@@ -1,51 +0,0 @@
--module(member_test).
-
--export([test_membership/0]).
-
--define(SEGMENTS_TO_CHECK, 32768). % a whole SST file
--define(MEMBERSHIP_LENGTHS, [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]).
-
-segments(Length) ->
-    AllSegs = lists:seq(1, ?SEGMENTS_TO_CHECK),
-    AllSegsBin =
-        lists:foldl(fun(I, Acc) -> <<Acc/binary, (I - 1):16/integer>> end,
-                    <<>>,
-                    AllSegs),
-    StartPos = leveled_rand:uniform(length(AllSegs) - Length),
-    {<<AllSegsBin/binary, AllSegsBin/binary,
-       AllSegsBin/binary, AllSegsBin/binary>>,
-     lists:sublist(AllSegs, StartPos, Length)}.
-
-test_membership(Length) ->
-    {AllSegsBin, TestList} = segments(Length),
-    ExpectedOutput =
-        lists:reverse(TestList ++ TestList ++ TestList ++ TestList),
-
-    SW0 = os:timestamp(),
-    TestListFun = fun(I) -> lists:member(I, TestList) end,
-    true = test_binary(AllSegsBin, [], TestListFun) == ExpectedOutput,
-    ListT = timer:now_diff(os:timestamp(), SW0) / 131072,
-
-    SW1 = os:timestamp(),
-    TestSet = sets:from_list(TestList),
-    TestSetsFun = fun(I) -> sets:is_element(I, TestSet) end,
-    true = test_binary(AllSegsBin, [], TestSetsFun) == ExpectedOutput,
-    SetsT = timer:now_diff(os:timestamp(), SW1) / 131072,
-
-    io:format("Test with segment count ~w ..."
-                ++ " took ~w ms per 1000 checks with list ..."
-                ++ " took ~w ms per 1000 checks with set~n", [Length, ListT, SetsT]).
-
-
-test_binary(<<>>, Acc, _TestFun) ->
-    Acc;
-test_binary(<<0:1/integer, TestSeg:15/integer, Rest/binary>>, Acc, TestFun) ->
-    case TestFun(TestSeg) of
-        true ->
-            test_binary(Rest, [TestSeg|Acc], TestFun);
-        false ->
-            test_binary(Rest, Acc, TestFun)
-    end.
-
-test_membership() ->
-    lists:foreach(fun(I) -> test_membership(I) end, ?MEMBERSHIP_LENGTHS).
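The deleted `member_test` walked a binary of 16-bit segment IDs and compared `lists:member/2` against `sets:is_element/2` for the membership check. A small Python sketch of that walk (accumulating matches head-first, so the output comes back reversed, as in `test_binary/3`):

```python
import struct

def filter_segments(packed: bytes, member) -> list:
    # analogue of member_test:test_binary/3: read big-endian 16-bit
    # segment IDs and prepend each one that passes the membership test
    acc = []
    for (seg,) in struct.iter_unpack(">H", packed):
        if member(seg):
            acc.insert(0, seg)
    return acc

# pack IDs 0..9 twice; list- and set-based membership must agree
packed = struct.pack(">10H", *range(10)) * 2
wanted = {2, 5, 7}
via_list = filter_segments(packed, lambda s: s in [2, 5, 7])
via_set = filter_segments(packed, lambda s: s in wanted)
assert via_list == via_set == [7, 5, 2, 7, 5, 2]
```

The point the benchmark measured: a list membership test scans O(n) per check, a set test is effectively O(1), and both must yield the same filtered output.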
@@ -1,59 +0,0 @@
-%% Test performance and accuracy of rice-encoded bloom filters
-%%
-%% Calling check_negative(2048, 1000000) should return about 122 false
-%% positives in around 11 seconds, with a size below 4KB
-%%
-%% The equivalent positive check is check_positive(2048, 488) and this
-%% should take around 6 seconds.
-%%
-%% So a blooom with 2048 members should support o(100K) checks per second
-%% on a modern CPU, whilst requiring 2 bytes per member.
-
--module(rice_test).
-
--export([check_positive/2, check_negative/2, calc_hash/2]).
-
-
-check_positive(KeyCount, LoopCount) ->
-    KeyList = produce_keylist(KeyCount),
-    Bloom = leveled_rice:create_bloom(KeyList),
-    check_positive(KeyList, Bloom, LoopCount).
-
-check_positive(_, Bloom, 0) ->
-    {ok, byte_size(Bloom)};
-check_positive(KeyList, Bloom, LoopCount) ->
-    true = leveled_rice:check_keys(KeyList, Bloom),
-    check_positive(KeyList, Bloom, LoopCount - 1).
-
-
-produce_keylist(KeyCount) ->
-    KeyPrefix = lists:concat(["PositiveKey-", leveled_rand:uniform(KeyCount)]),
-    produce_keylist(KeyCount, [], KeyPrefix).
-
-produce_keylist(0, KeyList, _) ->
-    KeyList;
-produce_keylist(KeyCount, KeyList, KeyPrefix) ->
-    Key = lists:concat([KeyPrefix, KeyCount]),
-    produce_keylist(KeyCount - 1, [Key|KeyList], KeyPrefix).
-
-
-check_negative(KeyCount, CheckCount) ->
-    KeyList = produce_keylist(KeyCount),
-    Bloom = leveled_rice:create_bloom(KeyList),
-    check_negative(Bloom, CheckCount, 0).
-
-check_negative(Bloom, 0, FalsePos) ->
-    {byte_size(Bloom), FalsePos};
-check_negative(Bloom, CheckCount, FalsePos) ->
-    Key = lists:concat(["NegativeKey-", CheckCount, leveled_rand:uniform(CheckCount)]),
-    case leveled_rice:check_key(Key, Bloom) of
-        true -> check_negative(Bloom, CheckCount - 1, FalsePos + 1);
-        false -> check_negative(Bloom, CheckCount - 1, FalsePos)
-    end.
-
-calc_hash(_, 0) ->
-    ok;
-calc_hash(Key, Count) ->
-    erlang:phash2(lists:concat([Key, Count, "sometxt"])),
-    calc_hash(Key, Count -1).
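The numbers in the deleted module's header comment can be sanity-checked: roughly 122 false positives in 1,000,000 negative checks is a rate of about 1.2e-4 (close to 2^-13), at under 16 bits per member for 2,048 keys. A quick arithmetic sketch, with all figures taken from that comment:

```python
# false-positive rate and per-member cost from the rice_test header comment
false_positives = 122
checks = 1_000_000
fp_rate = false_positives / checks          # about 1.22e-4, close to 2 ** -13

max_bits = 4 * 1024 * 8                     # "size below 4KB"
members = 2048
bits_per_member = max_bits / members        # 16 bits, i.e. "2 bytes per member"

assert bits_per_member == 16.0
assert abs(fp_rate - 2 ** -13) < 1e-5
```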