diff --git a/docs/ANTI_ENTROPY.md b/docs/ANTI_ENTROPY.md index e7a89d7..a2f165e 100644 --- a/docs/ANTI_ENTROPY.md +++ b/docs/ANTI_ENTROPY.md @@ -330,6 +330,32 @@ These rates change with object size with leveldb, but not with leveled. so when The intention is though, not to use list_keys operations in AAE - but instead to dynamically produce TicTac Trees either by folding objects (or heads with leveled) or folding over specialist indexes. More relevant tests will follow in the next phase. +### Some Implementation Considerations + +One thing that deserves debate with regards to multi-data-centre replication, is should the existing AAE process just be abandoned. There have been three considerations that have driven this implementation towards considering abandonment: + +- The bad history of unexplained problems. Issues with AAE commonly been associated with [hashing of unsorted objects](https://github.com/basho/riak_kv/issues/1189), but issues continue to be seen, especially after node joins and leaves even when running versions that have this resolved. It is feared that the knowledge of the existence of this issue has led to other AAE issues not being investigated, as it was more efficient to simply assume it was issue 1189. + +- The problems of upgrading the hashtree. A software change which alters the form of the existing hashtree is not going to be easy to implement, and includes a non-functional risk associated with handling two hashtrees in parallel for a transition period. + +- A general sense of complexity with regards to existing AAE, especially around locking and rebuilding. Also a general fear of the complexity and long-term supportability of eleveldb: although the historical stability of the project indicates it wis well written, and each branch is documented to a high standard, it requires a significant context shift for those used to working with Erlang. + +With a leveled backend, and the efficient support for dynamically building a TicTac tree without a separate AAE database - the AAE database appears now redundant and its disadvantages means an legacy-AAE-free approach seems optimal. However, there are problems with trying the same new approach with eleveldb and bitcask backends: + +- the slow fold times highlighted in [phase 1 testing](ANTI_ENTROPY.md#phase-1---initial-test-of-folds-with-core-node_worker_pool) when using leveldb backends. + +- the difficulty of managing bitcask snapshots, as they appear to require the database to be re-opened by another process - and so may require all keys to be held in memory twice during the life of the snapshot). Presumably bitcask folds will also be similarly slow. + +It should also be noted that: + +- there is no advantage of removing eleveldb as a dependency, if it is still in use as a backend. + +- there may be a history of false accusations of AAE problems, as the AAE throttle is applied in Riak even when AAE is not disabled - so the correlation of AAE throttling announcements in logs and short-term Riak performance issues may have created a false impression of this stability being AAE related. the riak_kv_entropy_manager applies the throttle whenever there are node connectivity issues, regardless of AAE activity. + +- if a consistent hashing algorithm is forced (between AAE merkle trees and TicTac trees), it should be possible to scan a legacy AAE store and produce a mergeable TicTac tree. There may be a simple transition from Merkle trees as-is to TicTac trees to-be, which will allow for mixed ring-size multi-data-centre comparison (i.e. by running a coverage fold over the AAE trees). + +So moving forward to phase 2, consideration is being given to having a ``native_aaetree`` capability, that would be supported by the Leveled backend. If this capability exists, then when handling a coverage fold for an AAE report, the riak_kv_vnode would request a ``{queue, Folder}`` directly from the backend to be sent to the core_node_worker_pool. If such a capability doesn't exist, the ``{queue, Folder}`` would instead be requested from the hashtree PID for that vnode, not the actual backend. + ### Phase 2 tbc