#### 'n HEADs 1 GET' or '1 GET n-1 HEADs'
A potential alternative to performing n HEADs and then 1 GET would be for the FSM to make 1 GET request and, in parallel, n-1 HEAD requests. In an under-utilised cluster this would require less resource, and is likely to return a response with lower average latency. The GET vnode could be selected in a similar style to the PUT FSM's put coordinator: start the FSM on a node in the preflist and use the local vnode as the GET vnode, with the remote vnodes chosen as the HEAD request vnodes for clock comparison.
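As a rough sketch of that selection - with hypothetical function names and tuple shapes, not the actual riak_kv code - the preflist might be partitioned so that the local vnode takes the GET and the remainder take the HEADs:

```erlang
%% Illustrative sketch only: hypothetical names, not the riak_kv API.
-module(getfsm_split).
-export([split_preflist/2]).

%% Preflist entries are assumed to be {Partition, Node} pairs; LocalNode
%% is the node on which the coordinating FSM has been started.
split_preflist(Preflist, LocalNode) ->
    case lists:partition(fun({_P, Node}) -> Node == LocalNode end,
                         Preflist) of
        {[GetVnode | MoreLocal], Remote} ->
            %% Local vnode serves the GET; everything else gets a HEAD.
            {GetVnode, MoreLocal ++ Remote};
        {[], [GetVnode | Remote]} ->
            %% No local vnode in the preflist - fall back to its head.
            {GetVnode, Remote}
    end.
```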
The primary reason why this approach has not been chosen, and the n HEADs followed by 1 GET mode has been preferred, is the potential for variable vnode mailbox lengths.
When a cluster is under heavy pressure, especially when a cluster has been expanded so that there are a low number of vnodes per node, vnode mailbox sizes can vary and some vnodes may go into overload status (as the mailbox is bounded by the overload size). At the overload response stage the client must back off, and the cluster must run at the pace of the slowest vnode. Ideally, if a vnode is slow, it should be given less work. This is a positive advantage of n HEAD requests followed by 1 GET: the first responder is the one elected to perform the GET, and the slower responders miss out on that workload. This means that naturally slower vnodes (such as those with longer mailbox queues) are given less work by avoiding the expensive GET requests, and the overload scenario is likely to be avoided. The approach is designed to work better when one or more vnodes are running slower - perhaps due to the presence of a background process or a hardware failure.
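A minimal sketch of this election, using plain Erlang processes and a hypothetical message protocol in place of the actual riak_kv GET FSM:

```erlang
%% Illustrative sketch only: plain processes stand in for vnodes, and
%% the message shapes ({head, ...}, {get, ...}) are hypothetical.
-module(head_then_get).
-export([fetch/3]).

%% Send a HEAD to every vnode in the preflist; the first vnode to reply
%% is elected to serve the follow-up GET, so slower vnodes (for example
%% those with long mailbox queues) are spared the expensive GET work.
fetch(VnodePids, Key, TimeoutMs) ->
    Ref = make_ref(),
    Caller = self(),
    [Pid ! {head, Ref, Caller, Key} || Pid <- VnodePids],
    receive
        {head_reply, Ref, FirstPid, _VClock} ->
            %% The fast HEAD reply is evidence this vnode is active and
            %% not overloaded - it wins the GET. Later head_reply
            %% messages remain in the mailbox for clock comparison.
            FirstPid ! {get, Ref, Caller, Key},
            receive
                {get_reply, Ref, Object} -> {ok, Object}
            after TimeoutMs -> {error, get_timeout}
            end
    after TimeoutMs ->
        {error, head_timeout}
    end.
```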
The issue with performing 1 GET and n-1 HEAD requests is what happens if a slow vnode (say one with a mailbox 2,000 messages long) is selected for the GET request - and the GET vnode must now be selected blind, as there has been no test HEAD message to calibrate which vnode to use - in which case the request may last as long as it takes that vnode to work through its mailbox. Once the n-1 HEAD requests have completed, how long should the FSM wait for the response to the GET request, especially as the GET request may be delayed naturally due to the value being large rather than due to the vnode being slow or down? If the FSM times out aggressively, then larger object requests are more likely to be made more than once - the most expensive requests don't gain the benefit of the optimisation. If the timeout is loose, then there will be many delays caused by one slow vnode, even when that vnode is not in the overload state. With the n HEAD 1 GET approach the FSM has evidence that the vnode chosen for the GET is active and fast, as it is the first responder, and so the FSM can wait until the FSM timeout (at the risk of failing a request when the failure occurs between the HEAD and the GET request). The 1 GET and n-1 HEAD approach doesn't avoid the slow vnode problem, and requires difficult reasoning about timeouts if the chosen GET vnode does not respond quickly.
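To make the contrast concrete: because the elected vnode has already answered a HEAD, the FSM can simply wait out whatever remains of its overall deadline, with no per-request guesswork. A minimal sketch, assuming a hypothetical get_reply message and a millisecond FSM timeout:

```erlang
%% Illustrative sketch only. The GET vnode proved itself responsive via
%% its HEAD reply, so the FSM waits the whole of its remaining overall
%% deadline rather than guessing a timeout that must somehow
%% distinguish a large object from a slow vnode.
-module(getfsm_wait).
-export([await_get/3]).

await_get(Ref, StartMs, FsmTimeoutMs) ->
    ElapsedMs = erlang:monotonic_time(millisecond) - StartMs,
    Remaining = max(0, FsmTimeoutMs - ElapsedMs),
    receive
        {get_reply, Ref, Object} -> {ok, Object}
    after Remaining ->
        %% Covers the rare failure between the HEAD and the GET.
        {error, timeout}
    end.
```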
One potential optimisation for leveled, where some testing has been performed, was caching positive responses to HEAD requests for a short period in the SST files. There are no currency issues with this cache, as it sits at the level of an immutable file. This means that the follow-up GET request can take advantage of that caching at the SST level, and should be slightly faster. However, in volume tests this did not appear to make a noticeable difference - perhaps due to the relative cost of all the failed checks against the recent request cache.
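A minimal sketch of such a cache - illustrative only, not leveled's actual SST implementation - exploiting the fact that entries against an immutable file can expire on age alone, with no invalidation needed:

```erlang
%% Illustrative sketch only: not leveled's SST code. Because the SST
%% file is immutable, a cached positive HEAD result can never go stale;
%% a short TTL is needed only to bound the cache's memory use.
-module(sst_head_cache).
-export([new/0, add/3, lookup/3]).

new() -> #{}.

%% Record the metadata found for Key, stamped with the current time.
add(Key, Metadata, Cache) ->
    Cache#{Key => {erlang:monotonic_time(millisecond), Metadata}}.

%% Return cached metadata only if it is younger than TtlMs.
lookup(Key, TtlMs, Cache) ->
    Now = erlang:monotonic_time(millisecond),
    case maps:find(Key, Cache) of
        {ok, {Stamp, Metadata}} when Now - Stamp =< TtlMs ->
            {ok, Metadata};
        _ ->
            not_found
    end.
```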