Update VOLUME.md

Minor edits for clarity
Martin Sumner 2017-04-24 11:56:11 +01:00 committed by GitHub
parent b0c91172f4
commit e4c6dc7119


@@ -152,7 +152,7 @@ These tests have been completed using the following static characteristics which
- 5 x i2.2x nodes,
- 6 hour duration.
-This is the test used in Phase 1. Note that since Phase 1 was completed a number of performance improvements have been made in leveled, and so the starting gap between Riak/leveled and Riak/leveldb has widened.
+This is [a test used in Phase 1](https://github.com/martinsumner/leveled/blob/master/docs/VOLUME.md#mid-size-object-ssds-no-sync-on-write). Note that since Phase 1 was completed a number of performance improvements have been made in leveled, and so the starting gap between Riak/leveled and Riak/leveldb has widened.
The tests have been run using the new riak_kv_sweeper facility within the develop branch. This feature is an alternative approach to controlling and scheduling rebuilds, allowing other work to be scheduled into the same fold. As the test is focused on hashtree rebuilds, it was run with:
@@ -163,9 +163,9 @@ The 3-hour rebuild timer is not a recommended configuration, it is an artificial
In the current Riak develop branch all sweeps use the Mod:fold_objects/4 function in the backend behaviour. In the testing of Riak/leveled this was changed to allow use of the new Mod:fold_heads/4 function available in the leveled backend (which can be used if the backend supports the fold_heads capability).
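To make the distinction concrete, here is a minimal sketch of how a sweep might prefer the head fold when the backend advertises it. It assumes the standard riak_kv backend callbacks (capabilities/1, fold_objects/4 and the new fold_heads/4); the selection logic is illustrative only, not the actual riak_kv_sweeper code:

```erlang
%% Illustrative sketch: prefer folding over object heads when the
%% backend advertises the fold_heads capability (assumed selection
%% logic, not the actual riak_kv_sweeper implementation).
sweep_fold(Mod, FoldFun, Acc0, Opts, ModState) ->
    {ok, Caps} = Mod:capabilities(ModState),
    case lists:member(fold_heads, Caps) of
        true ->
            %% Head fold: reads only object metadata (including the
            %% vector clock) from the Ledger, not the full value.
            Mod:fold_heads(FoldFun, Acc0, Opts, ModState);
        false ->
            %% Object fold: reads and folds over whole objects.
            Mod:fold_objects(FoldFun, Acc0, Opts, ModState)
    end.
```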
-In clusters which have fully migrated to Riak 2.2, the hashtrees are built from a hash of the vector clock, not the object - handling the issue of consistent hashing without canonicalisation of riak objects. This means that hashtrees can be rebuilt without knowledge of the object itself. However, the purpose of rebuilding the hashtree is to ensure that the hashtree represents the data that is still present in the store, as opposed to the assumed state of the store based on the history of changes. Rebuilding hashtrees is part of the defence against accidental deletion (e.g. through user error), and data corruption within the store where no read-repair is otherwise triggered. So although leveled can make hashtree rebuilds faster by only folding over the heads, this only answers part of the problem. A rebuild based on heads only proves deletion/corruption has not occurred in the Ledger, but doesn't rule out the possibility that deletion/corruption has occurred in the Journal.
+In clusters which have fully migrated to Riak 2.2, the hashtrees are built from a hash of the vector clock, not the object - handling the issue of consistent hashing without canonicalisation of riak objects. This means that hashtrees can be rebuilt without knowledge of the object itself. However, the purpose of rebuilding the hashtree is to ensure that the hashtree represents the data that is still present in the store, as opposed to the assumed state of the store based on the history of changes. Rebuilding hashtrees is part of the defence against accidental deletion (e.g. through user error), and data corruption within the store where no read-repair is otherwise triggered. So although leveled can make hashtree rebuilds faster by only folding over the heads, this only answers part of the problem: a rebuild based on heads only proves deletion/corruption has not occurred in the Ledger, but doesn't rule out the possibility that deletion/corruption has occurred in the Journal.
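A minimal sketch of the hash-of-vclock approach described above; sorting the clock before hashing is an assumed canonicalisation step for illustration, and riak_kv's actual scheme may differ in detail:

```erlang
%% Hedged sketch: derive the AAE hashtree entry from the vector clock
%% alone, so a rebuild needs only the object head. Sorting is an
%% assumed canonicalisation so logically equal clocks hash identically.
vclock_hash(VClock) ->
    Canonical = lists:sort(VClock),
    crypto:hash(sha, term_to_binary(Canonical)).
```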
-The fold_heads/4 implementation in leveled partially answers the challenge of Journal deletion or corruption by checking for presence in the Journal as part of the fold. Presence checking means that the object sequence number is in the Journal manifest, and the hash of the Key & sequence number is in the lookup tree for that Journal. It is expected that corruption of blocks within journal files will be handled as part of the compaction work to be tested in Phase 3.
+The fold_heads/4 implementation in leveled partially answers the challenge of Journal deletion or corruption by checking for presence in the Journal as part of the fold. Presence checking means that the object sequence number is in the Journal manifest, and the hash of the Key & sequence number is in the lookup tree for that Journal. It is expected that corruption of blocks within journal files will be handled as part of the compaction work to be tested in Phase 3. The branch tested here falls short of CRC-checking the object value itself as stored on disk, a check which would happen naturally as part of the fold within leveldb.
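A hedged sketch of that presence check follows; the manifest and per-file lookup tree representations are illustrative assumptions, not leveled's actual internal structures:

```erlang
%% Illustrative presence check: confirm the object's sequence number is
%% covered by a journal file in the manifest, then probe that file's
%% lookup tree for the hash of {Key, SQN}. The data structures are
%% assumed for the sketch (Manifest :: [{StartSQN, EndSQN, LookupTree}]).
is_present(Key, SQN, Manifest) ->
    case find_journal_file(SQN, Manifest) of
        {ok, LookupTree} ->
            Hash = erlang:phash2({Key, SQN}),
            gb_trees:is_defined(Hash, LookupTree);
        not_found ->
            false
    end.

%% Locate the journal file whose sequence-number range covers SQN.
find_journal_file(SQN, [{Start, End, Tree} | _]) when SQN >= Start, SQN =< End ->
    {ok, Tree};
find_journal_file(SQN, [_ | Rest]) ->
    find_journal_file(SQN, Rest);
find_journal_file(_SQN, []) ->
    not_found.
```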
### Leveled AAE rebuild with journal check