master-degree-notes/Concurrent Systems/notes/8 - Enhancing Liveness Properties.md

Can we take the most basic protocol that satisfies the most basic liveness property (obstruction freedom) and "upgrade" it to bounded wait freedom?

Contention manager: an object that lets processes make progress by providing contention-free periods in which they can complete their invocations. It provides 2 operations:

  • need_help(i): invoked by p_i when it discovers that there is contention
  • stop_help(i): invoked by p_{i} when it terminates its current invocation

Enriched implementation: when a process realizes that there is contention, it invokes need_help; when it completes its current operation, it invokes stop_help.

[!question] Why is it different from lock/unlock? Because this allows failures, and they can also happen in the contention-free period.

PROBLEM: to distinguish a failure from a long delay, we need objects called failure detectors (FDs), which provide processes with information about the failed processes of the system. Several FDs can be defined, depending on the type/quality of the information they provide.

Eventually restricted leadership: given a non-empty set of process IDs X, the failure detector \Omega_{X} provides each process a local variable ev_leader(X) such that:

  1. (Validity) ev_leader(X) always contains a process ID
  2. (Eventual leadership) eventually, the variables ev_leader(X) of all non-crashed processes of X forever contain the same process ID, which is the ID of one of these non-crashed processes

REMARK: the moment in which all variables contain the same leader is unknown.

From obstruction-freedom to non-blocking

NEED_HELP[1..n] : SWMR atomic R/W boolean registers init at false

need_help(i) :=
	NEED_HELP[i] <- true
	repeat
		X <- {j : NEED_HELP[j]}
	until ev_leader(X) = i # loop until p_i itself is the leader (for everyone)

stop_help(i) :=
	NEED_HELP[i] <- false

Theorem: the contention manager just seen transforms an obstr.-free implementation into a non-blocking enriched implementation.

Proof: By contr., \exists \tau s.t. \exists many (> 0) op.'s invoked concurrently that never terminate. Let Q be the set of proc.'s that performed these invocations.

  • by enrichment (def. at the beginning), eventually NEED_HELP[i]=true forever (\forall i\in Q)
  • since crashes are fail-stop, eventually NEED_HELP[j] is no longer modified (\forall j \not \in Q)
    • \exists \tau' \geq \tau where all proc.'s in Q compute the same X

Observation: Q \subseteq X (it is possible that p_j sets NEED_HELP[j] and then fails)

By definition of \Omega_{X}, \exists \tau'' \geq \tau' s.t. all proc.'s in Q have the same ev_leader(X)

  • the leader belongs to Q, since
    • it is involved in the contention
    • it can't be failed (by definition of ev_leader)
  • this is the only process allowed to proceed
  • because it runs in isolation, it eventually terminates (by obstruction freedom)

On implementing \Omega

It can be proved that there exists no wait-free implementation of \Omega in an asynchronous system with atomic R/W registers and any number of crashes

  • crashes are indistinguishable from long delays
  • need of timing constraints
  1. \exists time \tau_{1}, time interval \nabla and correct process p_{L} s.t. after \tau_{1} every two consecutive writes to a specific SWMR atomic R/W register by p_{L} are at most \nabla time units apart (in practice, there is a process that does not fail and that updates a register at least once every \nabla time units, which lets us be sure that it has not crashed and is still doing its work)

  2. let t be an upper bound on the number of processes that can fail and f the actual number of failed processes (hence, 0\leq f\leq t\leq n-1, with f unknown and t known in advance). Then, there are at least t-f correct processes different from p_L, each with a timer s.t. \exists time \tau_{2} after which, for each time interval \delta, a timer set to \delta expires no earlier than \delta time units later.

REMARK: \tau_{1}, \tau_{2}, \nabla and p_L are all unknown.

IDEA:

  • PROGRESS[1..n] is an array of SWMR atomic registers used by procs to signal that they're alive
  • pi suspects pj if pi doesn't see any progress of pj after a proper time interval (to be guessed) set in its timer
  • the leader is the least suspected process, or the one with smallest/biggest ID among the least suspected ones (if there are more than one)
    • this changes in time, but not forever
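
The leader choice described above can be sketched as follows. The function name `least_suspected_leader` is hypothetical, processes are numbered from 0, and "least suspected" is read here as "smallest total number of suspicions received from all processes", which is an assumption: the notes do not fix the exact measure at this point. Ties are broken by smallest ID.

```python
from typing import List


def least_suspected_leader(suspect: List[List[int]]) -> int:
    """suspect[i][k] = number of times p_i has suspected p_k."""
    n = len(suspect)
    # Assumption: a process's suspicion level is the sum of its column.
    totals = [sum(suspect[i][k] for i in range(n)) for k in range(n)]
    # Compare (total, id) pairs so that ties go to the smallest ID.
    return min(range(n), key=lambda k: (totals[k], k))


# Column totals are [4, 2, 1], so p_2 is the least suspected process.
leader = least_suspected_leader([[0, 2, 1], [3, 0, 0], [1, 0, 0]])
# Both processes have total 1, so the smallest ID wins.
tie_leader = least_suspected_leader([[0, 1], [1, 0]])
```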

Guessing the time duration for suspecting a process:

  • SUSPECT[i,j] = #times pi has suspected pj
  • For all k, take the t+1 minimum values in SUSPECT[1..n , k]
  • Sum them, to obtain Sk
  • The interval to use in the timers is the minimum Sk
    • it can be proved that this eventually becomes ≥ \nabla
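
A small sketch of the computation just described, assuming the SUSPECT matrix is available as a list of lists (a simplification of the shared registers) and that processes are numbered from 0. The function names `suspicion_sums` and `timer_interval` are illustrative only.

```python
from typing import List


def suspicion_sums(suspect: List[List[int]], t: int) -> List[int]:
    """For each k, sum the t+1 smallest values of column k of SUSPECT."""
    n = len(suspect)
    sums = []
    for k in range(n):
        column = sorted(suspect[i][k] for i in range(n))
        sums.append(sum(column[: t + 1]))  # S_k
    return sums


def timer_interval(suspect: List[List[int]], t: int) -> int:
    """Interval to load into the timers: the minimum S_k."""
    return min(suspicion_sums(suspect, t))


SUSPECT = [
    [0, 3, 5],
    [2, 0, 4],
    [3, 1, 0],
]
# With t = 1, the two smallest entries of each column are summed:
# column 0 gives 0 + 2, column 1 gives 0 + 1, column 2 gives 0 + 4.
S = suspicion_sums(SUSPECT, t=1)
interval = timer_interval(SUSPECT, t=1)
```

Since suspicion counters only grow, the sums S_k never decrease, which is consistent with the claim that the interval eventually reaches \nabla.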

From obstruction-freedom to wait-freedom

Eventually perfect: failure detector ♢P provides each process p_i a local variable suspected_i such that

  1. (Eventual completeness) eventually, suspected_{i} contains all the indexes of crashed processes, for all correct p_i
  2. (Eventual accuracy) eventually, suspected_{i} contains only indexes of crashed processes, for all correct p_{i}.

Definition: A failure detector FD1 is stronger than a failure detector FD2 if there exists an algorithm that builds FD2 from instances of FD1 and atomic R/W registers.

Proposition: ♢P is stronger than \Omega_{X}

Proof: For all i, define ev_leader_i(X) as follows:

  • i ∉ X \to ev_leader_i(X) is any ID (and may change in time)
  • i \in X \to ev_leader_i(X) = min\left(( Π \setminus suspected_{i}) \cap X \right) where Π denotes the set of all proc. IDs.
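
The construction in the proof can be sketched as a single function. The name `ev_leader` and its parameters are illustrative, and two choices are assumptions not stated in the notes: for i ∉ X the function returns i itself (any ID would do), and if every process of X is currently suspected (which can happen before ♢P stabilizes) it also falls back to i, so that Validity still holds.

```python
from typing import Set


def ev_leader(i: int, X: Set[int], suspected_i: Set[int], all_ids: Set[int]) -> int:
    """Build ev_leader_i(X) from the output suspected_i of an eventually perfect FD."""
    if i not in X:
        return i  # any ID is acceptable here; returning i is an arbitrary choice
    # Non-suspected processes of X: (Pi \ suspected_i) ∩ X
    candidates = (all_ids - suspected_i) & X
    # Fallback (assumption): keep Validity even if all of X is suspected for a while.
    return min(candidates) if candidates else i


PI = {1, 2, 3, 4}
# p_2 suspects p_1, so the smallest non-suspected ID in X = {1, 2, 3} is 2.
leader_seen_by_2 = ev_leader(2, {1, 2, 3}, {1}, PI)
```

Once ♢P has stabilized, all correct processes of X hold the same suspected set (exactly the crashed processes), so they all compute the same minimum, and that minimum is a correct process of X.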

\Omega_{X} is not stronger than ♢P (so, ♢P is strictly stronger)

The formal proof consists in showing that, if \Omega were stronger than ♢P, then consensus would be possible in an asynchronous system with crashes and atomic R/W registers.

From obstruction-freedom to wait-freedom

We assume a weak timestamp generator, i.e. a function such that, if it returns a positive value t to some process, only a finite number of invocations can obtain a timestamp smaller than or equal to t

TS[1..n] : SWMR atomic R/W registers init at 0

...