Can we take the most basic protocol that satisfies the most basic liveness property (obstruction freedom) and "upgrade" it to bounded wait freedom?
Contention manager: an object that enables processes to progress by providing contention-free periods in which they can complete their invocations. It provides 2 operations:
- need_help(i): invoked by $p_i$ when it discovers that there is contention
- stop_help(i): invoked by $p_i$ when it terminates its current invocation
Enriched implementation: when a process realizes that there is contention, it invokes need_help; when it completes its current operation, it invokes stop_help.
[!question] Why is it different from lock/unlock? Because it tolerates failures, which can also occur during the contention-free period.
PROBLEM: to distinguish a failure from a long delay, we need objects called failure detectors (FDs), which provide processes with information about the failed processes of the system. Depending on the type/quality of this information, several FDs can be defined.
Eventually restricted leadership: given a non-empty set of process IDs $X$, the failure detector $\Omega_X$ provides each process with a local variable ev_leader(X) such that:
- (Validity) ev_leader(X) always contains a process ID
- (Eventual leadership) eventually, the variables ev_leader(X) of all non-crashed processes of $X$ forever contain the same process ID, which is the ID of one of them
REMARK: the moment in which all variables contain the same leader is unknown.
From obstruction-freedom to non-blocking
NEED_HELP[1..n] : SWMR atomic R/W boolean registers init at false
need_help(i) :=
NEED_HELP[i] <- true
repeat
X <- {j : NEED_HELP[j]}
until ev_leader(X) = i # loop until p_i itself is the leader (the same holds for every process)
stop_help(i) :=
NEED_HELP[i] <- false
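A minimal Python sketch of the contention manager above, assuming CPython threads stand in for processes and list slots stand in for the SWMR registers NEED_HELP[1..n] (indexed from 0 here). $\Omega_X$ itself is not implemented: `ev_leader` is a hypothetical oracle passed in by the caller, and the demo uses `min`, an idealized $\Omega_X$ that is already stable and sees no crashes. `ContentionManager` and `run_demo` are illustrative names.

```python
import threading
import time
from typing import Callable, FrozenSet, List, Tuple


class ContentionManager:
    """Contention manager driven by an Omega_X-like oracle (illustrative sketch)."""

    def __init__(self, n: int, ev_leader: Callable[[FrozenSet[int]], int]) -> None:
        # NEED_HELP[0..n-1]: slot i is written only by process i (single writer).
        self.need_help_flags: List[bool] = [False] * n
        # Hypothetical stand-in for the failure detector Omega_X.
        self.ev_leader = ev_leader

    def need_help(self, i: int) -> None:
        self.need_help_flags[i] = True
        while True:
            # X = {j : NEED_HELP[j]}; never empty, since p_i has set its own flag.
            x = frozenset(j for j, flag in enumerate(self.need_help_flags) if flag)
            if self.ev_leader(x) == i:
                return  # p_i is the leader of X: it may proceed
            time.sleep(0)  # yield the CPU while waiting

    def stop_help(self, i: int) -> None:
        self.need_help_flags[i] = False


def run_demo(n: int) -> Tuple[List[int], List[bool]]:
    """Each of n threads asks for help once, 'completes' its operation, then stops."""
    cm = ContentionManager(n, ev_leader=min)  # idealized oracle: stable, no crashes
    done: List[int] = []

    def worker(i: int) -> None:
        cm.need_help(i)  # contention detected
        done.append(i)   # placeholder for finishing the obstruction-free operation
        cm.stop_help(i)  # current invocation terminated

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done, cm.need_help_flags


done, flags = run_demo(4)
print(sorted(done), flags)  # [0, 1, 2, 3] [False, False, False, False]
```

Unlike a lock, this gives no mutual exclusion: a process may find itself leader of a smaller $X$ before a process with a lower ID raises its flag. The guarantee is the one in the theorem: once $X$ and the oracle stabilize, only the common leader keeps running.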
Theorem: the contention manager above transforms an obstruction-free implementation into a non-blocking enriched implementation.
Proof:
By contradiction, assume there exist a time $\tau$ and a non-empty set of operations, invoked concurrently, that never terminate. Let $Q$ be the set of processes that performed these invocations.
- by enrichment (definition at the beginning), eventually NEED_HELP[i] = true forever, $\forall i \in Q$
- since crashes are fail-stop, eventually NEED_HELP[j] is no longer modified, $\forall j \notin Q$

Hence $\exists \tau' \geq \tau$ after which all processes in $Q$ compute the same $X$.
Observation: $Q \subseteq X$ (the inclusion may be strict: it is possible that $p_j$ sets NEED_HELP[j] and then fails).
By definition of $\Omega_X$, $\exists \tau'' \geq \tau'$ such that all processes in $Q$ have the same ev_leader(X):
- the leader belongs to $Q$, since
  - it is involved in the contention
  - it cannot have crashed (by definition of ev_leader)
- it is the only process allowed to proceed
- since it runs in isolation, it eventually terminates (by obstruction freedom), which contradicts the assumption
On implementing $\Omega$
It can be proved that there is no wait-free implementation of $\Omega$ in an asynchronous system with atomic R/W registers and any number of crashes:
- crashes are indistinguishable from long delays
- hence, timing constraints are needed
- there exist a time $\tau_1$, a time interval $\nabla$ and a correct process $p_L$ such that, after $\tau_1$, any two consecutive writes by $p_L$ to a specific SWMR atomic R/W register are at most $\nabla$ time units apart (in practice, some process that never fails updates a register at least every $\nabla$ time units, which lets the others be sure that it has not crashed but is still making progress)
- let $t$ be an upper bound on the number of processes that may fail and $f$ the actual number of failed processes (hence $0 \leq f \leq t \leq n-1$, with $f$ unknown and $t$ known in advance). Then, there are at least $t-f$ correct processes different from $p_L$ with a timer such that there exists a time $\tau_2$ for which, for each time interval $\delta$, if their timer is set to $\delta$ after $\tau_2$, it expires no earlier than $\delta$ time units later (that is, the timer surely does not expire before $\delta$, which prevents a process from being wrongly considered as failed. Why not exactly at $\delta$? Because the system is asynchronous and there is no global clock.)
[!warning] Remark
$\tau_1$, $\tau_2$, $\nabla$ and $p_L$ are all unknown :/
IDEA:
- PROGRESS[1..n] is an array of SWMR atomic registers used by processes to signal that they are alive
- $p_i$ suspects $p_j$ if $p_i$ does not see any progress of $p_j$ after a proper time interval (to be guessed) set in its timer
- the leader is the least suspected process, or the one with the smallest (or biggest) ID among the least suspected ones (if there is more than one)
- the leader changes over time, but not forever
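The idea can be sketched as follows, under a few assumptions: `on_timer_expiry` and `choose_leader` are hypothetical names, `last_seen` is an assumed local copy of the PROGRESS values that $p_i$ read at its previous timer expiry, and each candidate's suspicion level is simply taken as input (for instance, the Sk values defined next). Ties are broken by the smallest ID.

```python
from typing import Iterable, List, Sequence


def on_timer_expiry(i: int, progress: Sequence[int], last_seen: List[int],
                    suspect: List[List[int]]) -> None:
    """p_i's timer expired: suspect every p_j whose PROGRESS[j] did not change."""
    for j, value in enumerate(progress):
        if j == i:
            continue  # assumption: p_i does not monitor itself
        if value == last_seen[j]:
            suspect[i][j] += 1  # no progress seen from p_j: one more suspicion
        last_seen[j] = value    # remember the value read for the next expiry


def choose_leader(levels: Sequence[int], candidates: Iterable[int]) -> int:
    """Least suspected candidate; ties broken by the smallest ID."""
    return min(candidates, key=lambda k: (levels[k], k))


# p_0 last read PROGRESS = [5, 2, 7] and now reads [5, 3, 7]:
# p_1 progressed, p_2 did not.
suspect = [[0] * 3 for _ in range(3)]
last_seen = [5, 2, 7]
on_timer_expiry(0, [5, 3, 7], last_seen, suspect)
print(suspect[0])                                      # [0, 0, 1]
print(choose_leader([2, 0, 0], candidates=[0, 1, 2]))  # 1
```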
Guessing the time duration for suspecting a process:
- SUSPECT[i,j] = number of times $p_i$ has suspected $p_j$
- for every $k$, take the $t+1$ smallest values in SUSPECT[1..n, k]
- sum them to obtain $S_k$
- the interval to use in the timers is the minimum $S_k$
  - it can be proved that this eventually becomes $\geq \nabla$
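The timeout computation in the steps above can be sketched as follows, assuming SUSPECT is stored as an n-by-n list of counters (0-indexed) and `t` is the known bound on failures. The function names are hypothetical; only the arithmetic comes from the steps above.

```python
from typing import List, Sequence


def s_values(suspect: Sequence[Sequence[int]], t: int) -> List[int]:
    """S_k = sum of the t+1 smallest values in column k of SUSPECT."""
    n = len(suspect)
    result = []
    for k in range(n):
        column = sorted(suspect[i][k] for i in range(n))
        result.append(sum(column[: t + 1]))  # t <= n-1, so t+1 values exist
    return result


def timer_interval(suspect: Sequence[Sequence[int]], t: int) -> int:
    """Duration used in the timers: the minimum S_k."""
    return min(s_values(suspect, t))


# SUSPECT[i][k] = number of times p_i has suspected p_k (n = 3, diagonal 0).
SUSPECT = [
    [0, 1, 4],
    [2, 0, 5],
    [1, 3, 0],
]
print(s_values(SUSPECT, t=1))        # [1, 1, 4]
print(timer_interval(SUSPECT, t=1))  # 1
```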
From obstruction-freedom to wait-freedom
Eventually perfect: the failure detector ♢P provides each process $p_i$ with a local variable $suspected_i$ such that:
- (Eventual completeness) eventually, $suspected_i$ contains all the indexes of crashed processes, for every correct $p_i$
- (Eventual accuracy) eventually, $suspected_i$ contains only indexes of crashed processes, for every correct $p_i$
Definition: A failure detector FD1 is stronger than a failure detector FD2 if there exists an algorithm that builds FD2 from instances of FD1 and atomic R/W registers.
Proposition: ♢P is stronger than $\Omega_X$
Proof: for all $i$,
- $i \notin X \to ev\_leader_i(X)$ is any ID (and may change over time)
- $i \in X \to ev\_leader_i(X) = \min\left((\Pi \setminus suspected_i) \cap X\right)$

where $\Pi$ denotes the set of all process IDs.
$\Omega_X$ is not stronger than ♢P (so ♢P is strictly stronger).
The formal proof consists of showing that, if $\Omega$ were stronger than ♢P, then consensus would be possible in an asynchronous system with crashes and atomic R/W registers.
From obstruction-freedom to wait-freedom
We assume a weak timestamp generator, i.e., a function such that, if it returns a positive value t to some process, only a finite number of invocations can obtain a timestamp smaller than or equal to t.
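To make the property concrete, here is a minimal sketch of one generator that satisfies it, assuming a lock-protected shared counter is acceptable for illustration: every invocation gets a distinct, strictly increasing value, so at most t invocations can ever obtain a timestamp smaller than or equal to t. It is stronger than required and is not a construction from atomic R/W registers; the class name is hypothetical.

```python
import threading


class CounterTimestampGenerator:
    """Hypothetical generator returning 1, 2, 3, ...: each value is handed out once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def get_timestamp(self) -> int:
        with self._lock:  # serialize increments so no value is returned twice
            self._last += 1
            return self._last


gen = CounterTimestampGenerator()
print([gen.get_timestamp() for _ in range(3)])  # [1, 2, 3]
```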
TS[1..n] : SWMR atomic R/W registers init at 0
...