Can we take a protocol that satisfies only the weakest liveness property (obstruction-freedom) and "upgrade" it to bounded wait-freedom?

**Contention manager:** an object that lets processes make progress by providing contention-free periods in which they can complete their invocations. It provides two operations:
- `need_help(i)`: invoked by $p_i$ when it discovers that there is contention
- `stop_help(i)`: invoked by $p_i$ when it terminates its current invocation

**Enriched implementation:** when a process realizes that there is contention, it invokes `need_help`; when it completes its current operation, it invokes `stop_help`.

>[!question] Why is it different from lock/unlock?
>Because here processes may fail, and failures can also happen during a contention-free period. With a lock, a process that crashes while holding it would block all the others forever.

**PROBLEM:** to distinguish a failure from a long delay, we need objects called ***failure detectors*** (FDs), which give processes information about the failed processes of the system. Depending on the type/quality of this information, several FDs can be defined.

**Eventually restricted leadership:** given a non-empty set of process IDs $X$, the failure detector $\Omega_{X}$ provides each process with a local variable `ev_leader(X)` such that:
1. *(Validity)* `ev_leader(X)` always contains a process ID
2. *(Eventual leadership)* eventually, the `ev_leader(X)` variables of all non-crashed processes of $X$ contain forever the same process ID, which is one of them

REMARK: the moment from which all variables contain the same leader is unknown.

### From obstruction-freedom to non-blocking
```
NEED_HELP[1..n] : SWMR atomic R/W boolean registers init at false

need_help(i) :=
    NEED_HELP[i] <- true
    repeat
        X <- {j : NEED_HELP[j]}
    until ev_leader(X) = i    # loop until p_i is the leader elected for X

stop_help(i) :=
    NEED_HELP[i] <- false
```

**Theorem:** the contention manager just seen transforms an obstruction-free implementation into a non-blocking enriched implementation.

*Proof:* By contradiction, suppose $\exists \tau$ after which there are some ($> 0$) concurrently invoked operations that never terminate. Let $Q$ be the set of processes that performed these invocations.
- by enrichment, eventually `NEED_HELP[i] = true` forever ($\forall i \in Q$)
- since crashes are fail-stop, eventually `NEED_HELP[j]` is no longer modified ($\forall j \notin Q$)
- hence $\exists \tau' \geq \tau$ after which all processes in $Q$ compute the same $X$

**Observation:** $Q \subseteq X$, and the inclusion may be strict (it is possible that $p_j$ sets `NEED_HELP[j]` and then fails).

By definition of $\Omega_{X}$, $\exists \tau'' \geq \tau'$ s.t. all processes in $Q$ have the same `ev_leader(X)`:
- the leader belongs to $Q$, since it cannot have failed
- this is the only process allowed to proceed
- since it runs in isolation, it eventually terminates (by obstruction-freedom), which contradicts the assumption that no operation of $Q$ terminates

#### On implementing $\Omega$
It can be proved that there exists no wait-free implementation of $\Omega$ in an asynchronous system with atomic R/W registers and any number of crashes:
- crashes are indistinguishable from long delays
- timing constraints are therefore needed:

1. $\exists$ a time $\tau_{1}$, a time interval $\nabla$ and a correct process $p_{L}$ s.t., after $\tau_{1}$, every two consecutive writes by $p_{L}$ to a specific SWMR atomic R/W register are at most $\nabla$ time units apart
2. let $t$ be an upper bound on the number of processes that may fail and $f$ the actual number of failed processes (hence $0 \leq f \leq t \leq n-1$, with $f$ unknown and $t$ known in advance). Then there are at least $t-f$ correct processes different from $p_L$, each with a timer s.t. $\exists$ a time $\tau_{2}$ s.t., $\forall$ time interval $\delta$, if the timer is set to $\delta$ after $\tau_{2}$, it expires no earlier than $\delta$ time units later.

REMARK: $\tau_{1}, \tau_{2}, \nabla$ and $p_L$ are all unknown.
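As a concrete companion to the `need_help`/`stop_help` pseudocode of the contention manager, here is a minimal Python sketch. It is only an illustration under loud assumptions: the `ContentionManager` class, the injected `ev_leader` callable and the `min`-based toy oracle are hypothetical stand-ins (a real $\Omega_X$ cannot be built without timing assumptions such as the ones listed above), and a plain Python list stands in for the SWMR atomic registers.

```python
import threading
import time


class ContentionManager:
    """Sketch of need_help/stop_help; ev_leader is an assumed oracle."""

    def __init__(self, n, ev_leader):
        # NEED_HELP[i] is written only by process i (SWMR), read by everyone.
        self.need_help_flags = [False] * n
        # Hypothetical stand-in for Omega_X: callable (i, X) -> process ID.
        self.ev_leader = ev_leader

    def need_help(self, i):
        self.need_help_flags[i] = True
        while True:
            # X = set of processes currently asking for help (always contains i).
            x = frozenset(j for j, f in enumerate(self.need_help_flags) if f)
            if self.ev_leader(i, x) == i:
                return  # p_i is the leader of X: it may proceed
            time.sleep(0.001)  # yield instead of spinning hot

    def stop_help(self, i):
        self.need_help_flags[i] = False


# Toy oracle (assumption, crash-free runs only): the leader is the smallest ID in X.
cm = ContentionManager(3, lambda i, x: min(x))

cm.need_help(0)  # p_0 is alone in X, so it returns at once
waiter = threading.Thread(target=cm.need_help, args=(1,), daemon=True)
waiter.start()
waiter.join(timeout=0.2)
blocked_while_p0_leads = waiter.is_alive()  # p_1 keeps looping: the leader is p_0

cm.stop_help(0)  # p_0 finishes; X becomes {1}
waiter.join(timeout=2.0)
released_after_stop = not waiter.is_alive()
```

With the toy oracle, $p_1$ stays inside `need_help` until $p_0$ calls `stop_help`. Unlike a lock, nothing here enforces mutual exclusion: two processes that read different sets `X` may both be let through, which is acceptable because the manager only has to provide a contention-free period eventually.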
IDEA:
- `PROGRESS[1..n]` is an array of SWMR atomic registers used by processes to signal that they are alive
- $p_i$ suspects $p_j$ if $p_i$ does not see any progress of $p_j$ after a proper time interval (to be guessed) set in its timer
- the leader is the least suspected process, or the one with the smallest/biggest ID among the least suspected ones (if there is more than one)
- this changes over time, but not forever

Guessing the time duration for suspecting a process:
- `SUSPECT[i,j]` = number of times $p_i$ has suspected $p_j$
- for all $k$, take the $t+1$ minimum values in `SUSPECT[1..n, k]`
- sum them, to obtain $S_k$
- the interval to use in the timers is the minimum $S_k$
- it can be proved that this eventually becomes $\geq \nabla$

### From obstruction-freedom to wait-freedom
**Eventually perfect:** the failure detector ♢P provides each process $p_i$ with a local variable $suspected_i$ such that:
1. *(Eventual completeness)* eventually, $suspected_{i}$ contains all the indexes of crashed processes, for all correct $p_i$
2. *(Eventual accuracy)* eventually, $suspected_{i}$ contains only indexes of crashed processes, for all correct $p_{i}$

**Definition:** A failure detector FD1 is **stronger** than a failure detector FD2 if there exists an algorithm that builds FD2 from instances of FD1 and atomic R/W registers.

**Proposition:** ♢P is stronger than $\Omega_{X}$

*Proof:* For all $i$:
- $i \notin X \to$ `ev_leader_i(X)` is any ID (and may change over time)
- $i \in X \to$ `ev_leader_i(X)` $= \min\left((\Pi \setminus suspected_{i}) \cap X\right)$

where $\Pi$ denotes the set of all process IDs. Once ♢P has stabilized, $suspected_i$ is exactly the set of crashed processes for every correct $p_i$, so all correct processes of $X$ compute the same minimum, which is the ID of a correct process of $X$.

$\Omega_{X}$ is not stronger than ♢P (so ♢P is strictly stronger). The formal proof consists in showing that, if $\Omega$ were stronger than ♢P, then consensus would be possible in an asynchronous system with crashes and atomic R/W registers.

#### From obstruction-freedom to wait-freedom
We assume a weak timestamp generator, i.e. a function such that, if it returns a positive value $t$ to some process, only a finite number of invocations can obtain a timestamp smaller than or equal to $t$.
```
TS[1..n] : SWMR atomic R/W registers init at 0
...
```
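The construction of $\Omega_X$ from ♢P used in the proposition above is small enough to write out. The sketch below is illustrative only: the function name `ev_leader`, its parameters, the choice of returning `i` when $i \notin X$, and the fallback to `i` when every process of `X` is (possibly wrongly) suspected are assumptions added here so that *Validity* holds before ♢P stabilizes. They are not part of the original proof.

```python
def ev_leader(i, x, suspected_i, all_ids):
    """Leader output of process i for the set x, built from its suspected set.

    i           -- ID of the calling process
    x           -- the set X of process IDs the leadership is restricted to
    suspected_i -- current output of the (assumed) eventually perfect detector at p_i
    all_ids     -- the set Pi of all process IDs
    """
    if i not in x:
        # Any ID is allowed here; returning i itself is an arbitrary choice.
        return i
    # min((Pi \ suspected_i) ∩ X), as in the proof.
    candidates = (set(all_ids) - set(suspected_i)) & set(x)
    # Assumed fallback: before stabilization the candidate set may be empty.
    return min(candidates) if candidates else i


all_ids = {1, 2, 3, 4, 5}
x = {2, 3, 5}

# After stabilization: p_2 has crashed and every correct process suspects exactly {2}.
leader_at_3 = ev_leader(3, x, {2}, all_ids)
leader_at_5 = ev_leader(5, x, {2}, all_ids)

# A process outside X may output anything; this sketch returns its own ID.
leader_at_1 = ev_leader(1, x, {2}, all_ids)

# Before stabilization p_5 might wrongly suspect all of X; the fallback keeps validity.
leader_unstable = ev_leader(5, x, {2, 3, 5}, all_ids)
```

In this run both correct members of $X$ agree on $p_3$, the smallest non-suspected ID of $X$, which is the agreement that *Eventual leadership* requires.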