master-degree-notes/Concurrent Systems/notes/8 - Enhancing Liveness Properties.md

142 lines
No EOL
8.3 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Can we take the most basic protocol that satisfies the most basic liveness property (obstruction freedom) and "upgrade" it to bounded wait freedom?
**Contention manager:** is an object that allows progress of processes by providing contention-free periods for completing their invocations. It provides 2 operations:
- `need_help(i)`: invoked by $p_i$ when it discovers that there is contention
- `stop_help(i)`: invoked by $p_{i}$ when it terminates its current invocation
**Enriched implementation:** when a process realizes that there is contention, it invokes need_help; when it completes its current operation, it invokes stop_help.
>[!question] Why is it different from lock/unlock?
>Because this allows failures, and they can also happen in the contention-free period.
**PROBLEM:** to distinguish a failure from a long delay, we need objects called ***failure detectors*** (FDs), that provide processes information on the failed processes of the system. According to the type/quality of the info, several FDs can be defined.
**Eventually restricted leadership:** given a non-empty set of process IDs X, the failure detector $\Omega_{X}$ provides each process a local variable `ev_leader(X)` such that:
1. *(Validity)* `ev_leader(x)` always contains a process ID
2. *(Eventual leadership)* eventually, all `ev_leader(X)` of all non-crashed processes of X for ever contain the same process ID, that is one of them
REMARK: the moment in which all variables contain the same leader is unknown.
### From obstruction-freedom to non-blocking
```
NEED_HELP[1..n] : SWMR atomic R/W boolean registers init at false
need_help(i) :=
NEED_HELP[i] <- true
repeat
X <- {j : NEED_HELP[j]}
until ev_leader(X) = i # loopa finché non è lui stesso il leader (PER TUTTI)
stop_help(i) :=
NEED_HELP[i] <- false
```
**Theorem:** the contention manager just seen transforms an obstr.-free implementation into a non-blocking enriched implementation.
*Proof:*
By contr., $\exists \tau$ s.t. $\exists$ many (> 0) op.'s invoked concurrently that never terminate.
Let Q be the set of proc.'s that performed these invocations.
- by enrichment (def. at the beginning), eventually `NEED_HELP[i]=true` forever ($\forall i\in Q$)
- since crashes are fail-stop, eventually `NEED_HELP[j]` is no longer modified ($\forall j \not \in Q$)
- $\exists \tau' \geq \tau$ where all proc.'s in Q compute the same X
**Observation:** $Q \subseteq X$ (it is possible that $p_j$ sets `NEED_HELP[j]` and then fails)
By definition of $\Omega_{X}, \exists \tau'' \geq t'$ s.t. all proc.'s in Q have the same `ev_leader(X)`
- the leader belongs to Q, since
- it is involved in the contention
- it can't be failed (by definition of `ev_leader`)
- this is the only process allowed to proceed
- because run in isolation, it eventually terminates (because of obstruction freedom)
#### On implementing $\Omega$
It can be proved that there exists no wait-free implementation of $\Omega$ in an asynchronous system with atomic R/W registers and any number of crashes
- crashes are indistinguishable from long delays
- need of timing constraints
1. $\exists$ time $\tau_{1}$, time interval $\nabla$ and correct process $p_{L}$ s.t. after $\tau_{1}$ every two consecutive writes to a specific SWMR atomic R/W by $p_{L}$ are at most $\nabla$ time units apart one from the other
(in pratica esiste un processo che non fallisce che aggiorna un registro almeno ogni $\nabla$, che ci permette di essere sicuri che non sia crashato ma stia facendo le sue cose)
2. let t be an upper bound on the number of possible failing processes and f the real number of process failed (hence, $0\leq f\leq t\leq n-1$, with f unknown and t known in advance).
Then, there are at least $t-f$ correct processes different from $p_L$ with a timer s.t. $\exists$ time $\tau_{2}$ for each time interval $\delta$, if their timer is set to $\delta$ after $\tau_{2}$, it expires at least after $\delta$.
(stiamo dicendo che il timer scade sicuramente dopo $\delta$, il che ci permette di non considerare erroneamente come fallito il processo. Perché non esattamente a $\delta$? Perché è un sistema asincrono e non c'è un clock globale)
>[!warning] Remark
$\tau_{1}, \tau_{2}, \nabla$ and $p_L$ are all unknown :/
IDEA:
- `PROGRESS[1..n]` is an array of SWMR atomic registers used by procs to signal that theyre alive
- $p_{i}$ suspects $p_j$ if pi doesnt see any progress of $p_{j}$ after a proper time interval (to be guessed) set in its timer
- the leader is the least suspected process, or the one with smallest/biggest ID among the least suspected ones (if there are more than one)
- this changes in time, but not forever (can be proved, but it's not covered here)
Guessing the time duration for suspecting a process:
- `SUSPECT[i,j]` = # of times $p_i$ has suspected $p_j$
- For all k, take the t+1 minimum values in `SUSPECT[1..n , k]`
- remember: t is the max number of processes that can fail (it is known)
- so doing this we ensure considering at least a process that doesn't fail
- Sum them, to obtain $S_{k}$ (for each process k)
- The interval to use in the timers is the minimum $S_{k}$
- it can be proved that this eventually becomes ≥ $\nabla$
### From obstruction-freedom to wait-freedom
**Eventually perfect:** failure detector ♢P provides each process $p_i$ a local variable $suspected_i$ such that
1. *(Eventual completeness)* eventually, $suspected_{i}$ contains all the indexes of crashed processes, for all correct $p_i$
2. (*Eventual accuracy*) eventually, $suspected_{i}$ contains only indexes of crashed processes, for all correct $p_{i}$.
**Definition:** A failure detector FD1 is **stronger** than a failure detector FD2 if there exists an algorithm that builds FD2 from instances of FD1 and atomic R/W registers.
**Proposition:** ♢P is stronger than $\Omega_{X}$
*Proof:*
Forall i
- i ∉ X $\to$ `ev_leader_i(X)` is any ID (and may change in time)
- $i \in X \to$ `ev_leader_i(X)` $= min\left(( Π \setminus suspected_{i}) \cap X \right)$ where $Π$ denotes the set of all proc. IDs.
$\Omega_{X}$ is not stronger than ♢P (so, ♢P is strictly stronger)
The formal proof consists in showing that if $\Omega_{X}$ was stronger than ♢P, then consensus would be possible in an asynchronous system with crashes and atomic R/W registers.
(because ♢P can be used to solve consensus)
#### A contention manager for ♢P
We assume a weak timestamp generator, i.e. a function such that, if it returns a positive value t to some process, only a finite number of invocations can obtain a timestamp smaller than or equal to t
```
TS[1..n] : SWMR atomic R/W registers init at 0
need_help(i) :=
TS[i] <- weak_ts()
repeat
competing <- {j : TS[j] != 0 and j ∉ suspected_i}
⟨t,j⟩ <- min{⟨TS[x],x⟩ | x ∈ competing}
until j = i
stop_help(i) :=
TS[i] <- 0
```
##### Theorem:
the contention manager just seen transforms an obstruction-free implementation into a wait-free enriched implementation.
*Proof:*
By contradiction, $\exists$ an invocation of a correct $p_i$ that never terminates.
Let $ti$ be its timestamp.
- choose the minimum of such ⟨ti,i⟩
By constraint of weak_ts(), the set of invocations smaller than ⟨ti,i⟩ (call it I) is finite
- for every invocation ∈ I from a process $p_j$ that crashed during its execution
- $p_i$ will eventually and forever suspect $p_j$ (i.e. j ∈ `suspected_i`)
- eventually j will not be anymore in completing
- $p_{j}$ will not anymore prevent $p_i$ from proceeding
Since ⟨ti,i⟩ is the minimum index of a non-terminating invocation
- all invocations in I of correct processes terminate
- if such processes invoke need_help() again, they obtain greater indexes
- eventually I gets emptied
Since $p_i$ is correct, eventually (for all $p_k$ correct):
- i will not be anymore in $suspected_{i}$ (as $p_i$ will not crash as its correct)
- $⟨ti,i⟩ = min\{⟨TS[x],x⟩ \space|\space x \in competing_{k}\}$ (all pk will select pi as the minimum)
Hence, the invocation with index ⟨ti,i⟩ will eventually have exclusive execution, and because of obstruction freedom it eventually terminates.
#### On implementing ♢P
- every non-failed process has eventually an upper bound on the write delay
- by properly setting timers, eventually crashed processes are distinguished from the non-crashed ones by looking at the suspicions: for the crashed ones, this numbers increases indefinitely; for non-crashed ones, some reset eventually happens.