diff --git a/docs/FUTURE.md b/docs/FUTURE.md
index 6d88bad..ac58bde 100644
--- a/docs/FUTURE.md
+++ b/docs/FUTURE.md
@@ -93,19 +93,83 @@ The feature will not at present work safely with legacy vclocks. This branch gen
 In tests, the benefit of this may not be that significant - as the primary resource saved is disk/network, and so if these are not the resources under pressure, the gain may not be significant. In tests bound by CPU not disks, only a 10% improvement has so far been measured with this feature.
 
-#### 1 GET n-1 HEAD
+#### 'n HEADs 1 GET' or '1 GET n-1 HEADs'
 
-A potential alternative to perfoming n HEADS and then 1 GET, would be for the FSM to make 1 GET request and in parallel make n-1 HEAD requests. This would, in an under-utilised cluster, require less resource and is likely to be lower latency. Perhaps the 1 GET request vnode could be selected in a similar style to the PUT FSM put coordinator by starting the FSM on a node in the preflist and using the local vnode as the GET vnode, and the remote vnodes would be chosen as the HEAD request nodes for clock comparison.
+A potential alternative to performing n HEADs and then 1 GET would be for the FSM to make 1 GET request and in parallel make n-1 HEAD requests. This would, in an under-utilised cluster, require less resource and is likely to have lower latency. Perhaps the GET vnode could be selected in a similar style to the PUT FSM's put coordinator: start the FSM on a node in the preflist and use the local vnode as the GET vnode, with the remote vnodes chosen as the HEAD request vnodes for clock comparison.
 
-The primary reason why this approach has not been chosen, and the n HEADS followed by 1 GET mode has been preferred, is to do with variable vnode mailbox lengths.
+The primary reason why this approach has not been chosen, and the n HEADs followed by 1 GET mode has been preferred, is the potential for variable vnode mailbox lengths.
 
-When a cluster is under heavy pressure, especially when a cluster has been expanded so that there are a low number of vnodes per node, vnode mailbox sizes can vary and some vnodes may go into overload status (as the mailbox is not unbounded). At this stage the client must back-off, and the cluster must run at the pace of the slowest vnode. Ideally, if a vnode is slow, it should be given less work. This is a positive advantage of the n HEAD requests followed by 1 GET, the first responder is the one elected to perform the GET, and the slower responders miss out on that workload. This means that naturally slower vnodes (such as those with longer mailbox queues), are given less work by avoiding the expensive GET requests, and the overload scenario is likely to be avoided.
+When a cluster is under heavy pressure, especially when a cluster has been expanded so that there is a low number of vnodes per node, vnode mailbox sizes can vary and some vnodes may go into overload status (as the mailbox is bounded by the overload size). At the overload response stage the client must back off, and the cluster must run at the pace of the slowest vnode. Ideally, if a vnode is slow, it should be given less work. This is a positive advantage of the n HEAD requests followed by 1 GET: the first responder is the one elected to perform the GET, and the slower responders miss out on that workload. This means that naturally slower vnodes (such as those with longer mailbox queues) are given less work by avoiding the expensive GET requests, and the overload scenario is likely to be avoided.
 
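+To make the first-responder election concrete, below is a minimal, hypothetical Erlang sketch - not the riak_kv GET FSM itself - in which vnodes are mocked as plain processes and mailbox pressure is simulated with a sleep; all module and function names are illustrative only.
+
+```erlang
+-module(head_then_get).
+-export([fetch/1, demo/0]).
+
+%% The first vnode to answer the HEAD is elected to serve the
+%% follow-up GET, so slower vnodes miss that workload; the remaining
+%% HEAD replies would be used for clock comparison.
+
+fetch(VnodeDelays) ->
+    Coordinator = self(),
+    %% Issue a HEAD request to every vnode in the (mock) preflist.
+    _Vnodes =
+        [spawn(fun() -> mock_vnode(Coordinator, Delay) end)
+            || Delay <- VnodeDelays],
+    %% Elect the first responder as the GET vnode.
+    receive
+        {head_reply, FirstVnode, _Clock} ->
+            FirstVnode ! {get_req, Coordinator},
+            receive
+                {get_reply, Object} -> {ok, Object}
+            after 2000 -> {error, timeout}
+            end
+    end.
+
+mock_vnode(Coordinator, Delay) ->
+    timer:sleep(Delay), % simulated mailbox queue delay
+    Coordinator ! {head_reply, self(), mock_vclock},
+    receive
+        {get_req, From} -> From ! {get_reply, <<"value">>}
+    after 5000 -> ok
+    end.
+
+demo() ->
+    %% The vnode with the 10ms queue wins the election; the vnode
+    %% with the 500ms queue avoids the expensive GET.
+    fetch([500, 10, 120]).
+```
+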
-The issue with performing 1 GET and n-1 HEAD requests, what if the slow vnode (say one with a mailbox 2,000 long) is selected for the GET request (assuming the GET vnode now has to be selected at random, as there has been no test HEAD message to calibrate which vnode to use). Once the n-1 HEAD requests have completed, how long should the FSM wait for the response to the GET request, especially as the GET request may be delayed naturally due to the value being large rather than due to the vnode being slow. If the FSM times out aggresively, then larger object requests are more likely to be made more than once. If the timeout is loose, then there will be many delays caused by one slow vnode, even when that vnode is not in the overload state. With the n HEAD 1 GET approach we have evidence that the vnode chosen for the GET is active and fast, as it is the first responder, and so the FSM can wait until the FSM timeout (at risk of failing a request when the fialure occurrs between the HEAD and the GET request). The 1 GET and n-1 HEAD requests doesn't avoid the slow vnode problem, and required seeming unresolvable reasoning about timeouts if the chosen GET node does not respond quickly.
+The issue with performing 1 GET and n-1 HEAD requests is: what if a slow vnode (say one with a mailbox 2,000 messages long) is selected for the GET request (the GET vnode now has to be selected at random, as there has been no test HEAD message to calibrate which vnode to use)? The request may then last the duration of that slow response. Once the n-1 HEAD requests have completed, how long should the FSM wait for the response to the GET request, especially as the GET request may be delayed naturally due to the value being large rather than due to the vnode being slow? If the FSM times out aggressively, then larger object requests are more likely to be made more than once - the most expensive requests don't gain the benefit of the optimisation. If the timeout is loose, then there will be many delays caused by one slow vnode, even when that vnode is not in the overload state. With the n HEADs 1 GET approach the FSM has evidence that the vnode chosen for the GET is active and fast, as it is the first responder, and so the FSM can wait until the FSM timeout (at the risk of failing a request when the failure occurs between the HEAD and the GET request). The 1 GET and n-1 HEADs approach doesn't avoid the slow vnode problem, and requires seemingly unresolvable reasoning about timeouts if the chosen GET vnode does not respond quickly.
 
-During transition, there will be a state where some vnodes support HEAD requests, and others still use eleveldb and so do not. In this case with n HEAD requests, some vnodes will respond with the object, as the vnode will revert back to a GET request if the "head" capability is not present in the backend. If one of the quorum HEAD responses is an object, and there is no sibling detected, then the GET request will be avoided.
-
-One potential optimisation for leveled where some testing has been performed, was caching positive responses to HEAD requests for a short period in the sst files. There are no currency issues with this cache as it is at an immutable file level. This means that the second GET request can take advantage of that caching at a SST level, and should be slightly faster. However, in volume tests this did not appear to make a noticeable difference - perhaps due the relative cost of all the failed checks to the recent request cache.
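+The dilemma can be seen in another hypothetical sketch (again illustrative Erlang, not the actual FSM): the GET vnode must be picked blind, and whatever value is chosen for GetTimeout below, one of the two failure modes above applies.
+
+```erlang
+-module(one_get_sketch).
+-export([fetch/2]).
+
+%% The rejected '1 GET and n-1 HEADs' flow: the GET vnode is picked
+%% at random, as no HEAD response exists to calibrate the choice.
+fetch(Preflist, GetTimeout) ->
+    GetVnode = lists:nth(rand:uniform(length(Preflist)), Preflist),
+    HeadVnodes = Preflist -- [GetVnode],
+    GetVnode ! {get_req, self()},
+    _ = [V ! {head_req, self()} || V <- HeadVnodes],
+    %% ... the n-1 HEAD replies would be collected here for clock
+    %% comparison ...
+    receive
+        {get_reply, Object} ->
+            {ok, Object}
+    after GetTimeout ->
+        %% An aggressive GetTimeout re-requests large (expensive)
+        %% objects; a loose one lets a single slow vnode delay
+        %% many requests. Nothing tells the FSM which case applies.
+        {error, timeout}
+    end.
+```
+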
+One potential optimisation for leveled, where some testing has been performed, was to cache positive responses to HEAD requests for a short period in the SST files. There are no currency issues with this cache as it is at an immutable file level. This means that the second GET request can take advantage of that caching at an SST level, and should be slightly faster. However, in volume tests this did not appear to make a noticeable difference - perhaps due to the relative cost of all the failed checks against the recent request cache.
 
 ### PUT -> Using HEAD