11 KiB
Given objects of type T and an object of type Z, is it possible to wait-free implement Z by using only objects of type T and atomic R/W registers?
If yes, then Z is not as essential as T (T can do everything Z can do).
But what is the type of an object?
- the set of all the possible values for states of objects of that type
- a set of operations for manipulating the object, each provided with a specification: a description of the conditions under which the operation can be invoked and the effect of the invocation
We focus on types whose operations are:
- Total: all operations can be invoked in any state of the object
- Sequentially specified: given the initial state, the behavior depends only by the sequence of operations, where the output to every operation invocation only depends on the input arguments and the invocations preceding it
- formally,
𝛿(s, op(args)) = {⟨s1,res1⟩,…, ⟨sk,resk⟩}
- it is deterministic whenever k=1, for every s and every op(args)
- è deterministica quando dato uno stato e dato un'operazione, posso raggiungere solo uno stato (se raffigurato come grafo, ci sarebbe un solo arco)
- ricorda un po' gli automi o le Turing Machine
- formally,
An object of type T_{U}
is universal if every other object can be wait-free implemented using only objects of type T_U
and atomic R/W registers.
This object is a consensus object: a one-shot object (every process can access it at most once) that provides only one operation propose(v)
such that:
- Validity: the returned value (or decided value) is one of the arguments of the propose (proposed value) in one invocation done by a process (participant)
- Integrity: every process decides at most once
- Agreement: the decided value is the same for all processes
- Wait-freedom: every invocation of propose by a correct process terminates
Conceptually, we can implement a consensus object by a register X, initialized at ⊥, for which the propose operation is atomically defined as:
propose(v) :=
if X = ⊥ then
X <- v
return X
Universality of consensus holds as folows:
- given an object O of type Z
- each participant runs a local copy of O, all initialized at the same value
- create a total order on the operations O, by using consensus objects
- force all processes to follow this order to locally simulate O
- all local copies are consistent (O can be any object, e.g. a queue)
First attempt (non wait-free)
Let us first concentrate on obj's with deterministic specifications (we will see how to handle non-determinism later on)
A process p_i
that wants to invoke operation op of object Z with arguments args locally runs:
on the foreground thread of pi:
result_i <- ⊥
invoc_i <- ⟨op(args) , i⟩
wait result_i != ⊥
return result_i
Every process locally runs also a simulation of Z (stored in the local variable Z_i
) such that:
Z_{i}
is initialized at the initial value of Z- every process performs on
Z_i
all the operations performed in Z by all processes, in the same order- need of consensus
Let CONS be an unbounded array of consensus objects and 𝜋1 and 𝜋2 respectively denote the first and the second element of a given pair (aka the first and second projection)
The local simulation of Z by p_i
is
on the background thread of pi:
k <- 0
z_i <- Z.init()
while true
if invoc_i ≠ ⊥ then # prosegue se ci sono invocazioni
k++
exec_i <- CONS[k].propose(invoc_i)
⟨z_i , res⟩ <- 𝛿(z_i, 𝜋1(exec_i))
if 𝜋2(exec_i) = i then
invoc_i <- ⊥
result_i <- res
This solution is non-blocking but not wait-free (can run forever). P.S. invoc_i, result_i e z_i sono variabili locali.
A wait-free construction
LAST_OP[1..n]:
array of SWMR atomic R/W registers containing pairs init at ⟨⊥,0⟩ ∀i- è condiviso e ogni processo ci infila le sue proposal
last_sn_i[1..n]:
local array of the last op byp_j
executed byp_i
init at 0 ∀i,j- ogni processo ha la sua copia locale
last_sn_i[j]
tiene traccia dell'ultima operazione eseguita da j
Idea: instead of just proposing my proposal, at every moment I propose all the proposals of all the processes.
or(arg) by p_i on Z
result_i <- ⊥
LAST_OP[i] <- ⟨op(args), last_sn_i[i]+1⟩
wait result_i != ⊥
return result_i
l'idea è che scrivo la mia proposal sulla shared memory, così ogni processo può leggere la mia proposal.
local simulation of Z by p_i
k <- 0
z_i <- Z.init()
while true
invoc_i <- 𝜀
∀ j = 1..n
if 𝜋2(LAST_OP[j]) > last_sn_i[j] then
# j ha fatto una nuova proposta
invoc_i.append(⟨𝜋1(LAST_OP[j]),j⟩)
# allora la aggiungo alla mia lista invoc_i
# per eventualmente proporlo come valore dopo
if invoc_i ≠ 𝜀 then
# ci sono nuove invocazioni
# allora partecipo al round
k++
exec_i <- CONS[k].propose(invoc_i)
# sto proponendo UNA LISTA di operazioni
# e propose in pratica me le mette in ordine
# così che posso eseguirle nello stesso ordine
# degli altri processi
for r=1 to |exec_i|
⟨z_i,res⟩ -> 𝛿(z_i,𝜋1(exec_i[r]))
j <- 𝜋2(exec_i[r])
last_sn_i[j]++
if i=j then
result_i <- res
As before, they are executed by each process on two threads.
Is it correct?
Lemma 1: the construction is wait free Proof: Let pi invoke Z.op(par); to prove the Lemma, we need to show that eventually result_i ≠ ⊥
It suffices to prove that ∃ k s.t. CONS[k].propose(-)
returns a list that contains ⟨“op(par)”,i⟩ and pi participates to the consensus:
- since
p_i
increases the sequence number ofLAST_OP[i]
, eventually all the lists proposed contain that element forever - eventually,
p_i
will always findinvoc_i != ε
and participates with an increasing sequence numberk_i
- let k be the first consensus where (1) holds
- by the properties of consensus, exec_i contains ⟨“op(par)”,i⟩
As shown before, every invocation is executed; we have to show that this cannot happen more than once.
Lemma 2: all operations invoked by processes are executed exactly once
If p_i
has inserted ⟨“op(par)”,sn⟩ in LAST_OP[i]
, it cannot invoke another operation until ⟨“op(par)”,sn⟩ appears in a list chosen by the consensus
- the sequence number of
LAST_OP[i]
can increase only after ⟨“op(par)”,sn⟩ has been executed
Let k be the minimum such that CONS[k]
contains ⟨“op(par)”,i⟩
Every p_{j}
that participates in the k-th consensus increases last_sn_j[i]
before calculating a new proposal invoc_j
- if
π2(LAST_OP[i])
is not changed, then the guard of the IF is false and ⟨“op(par)”,i⟩ is not appended ininvoc_j
- otherwise, we have a new invocation from
p_{i}
(but this can only happen after that ⟨“op(par)”,sn⟩ has been executed)
Lemma 3: the local copies s_{i}
of all correct processes behave the same and comply with the sequential specification of Z.
Proof:
- the processes use CONS in the same order
(CONS[1], CONS[2], …)
- every consensus object returns the same list to all processes
- all processes scan the returned list from head to tail
- then apply the same sequence of operations to their local copies (that started from the same initial value)
- everything works because of determinism!
REMARK: bounded wait freedom does not hold:
- if the background process suspends for a long time, when it wakes up it has an unbounded number of agreed lists to locally execute
- this may arbitrarily delay an operation issued before the sleep
Solution for non-deterministic specifications
If the specifications of Z’s operations are non-deterministic, then 𝛿 does not return one single possible pair after one invocation, but a set of possible choices.
How to force every process to run the very same sequence of operations on their local simulations? Three possible ways!
- Brute force solution: for every pair ⟨s , op(args)⟩, fix a priori one element of 𝛿(⟨s , op(args)⟩) to be chosen
- literally cancelling non-determinism, but this is not something we want...
- additional consensus objects, one for every element of every list
- reuse the same consensus object: for all k,
CONS[k]
not only chooses the list of invocations, but also the final state of every invocation- the proposals should also pre-calculate the next state and propose one
Binary vs Multivalued Consensus
Binary consensus: just two possible proposals (say, 0 and 1)
Multivalued consensus (with unbounded values)
IDEA:
- we have n processes and n binary consensus rounds
- at round k, all processes propose 1 if
p_k
has proposed something, 0 otherwise - if the decided value is 1, then decide $p_k$'s proposal; otherwise, go to the next round.
PROP[1..n] array of n proposals, all init at ⊥
BC[1..n] array of n binary consensus objects
mv_propose(i,v) :=
PROP[i] <- v
for k=1 to n do
if PROP[k] ≠ ⊥ then
res <- BC[k].propose(1)
else
res <- BC[k].propose(0)
if res = 1 then
return PROP[k]
wait forever
why wait forever? Because we want to prove that it is wait free, but not because the loop finishes.
Validity, Agreement, Integrity, Termination:
- Let
p_{x}
the first process that writes a proposal - every
p_i
that participates to the consensus reads the other proposals after that it has writtenPROP[i]
- all participants starts their for loops after
p_x
has written its proposal
- all participants starts their for loops after
- every
p_i
that participates to the consensus findsPROP[x] != ⊥
in their for loop BC[x]
only receives proposals equal to 1- because of validity of binary consensus,
BC[x]
returns 1 - every
p_i
that participates to the consensus receives 1 at most in its x-th iteration of the for - let y (<= x) be the first index such that
BC[y]
returns 1BC[z] = 0
for allz < y
- since all participating process invoke the binary consensus in the same order, they all decide the value proposed by
p_y
and terminate.
Multivalued consensus (with bounded values)
Let k be the number of possible proposals and h = \lceil \log k \rceil
be the number of bits needed to binary represent them (this value is known to all processes).
IDEA: decide bit by bit the final outcome
PROP[1..n][1..h] array of n h-bits proposals, all init at ⊥
BC[1..h] array of h binary consensus objects
bmv_propose(i,v) :=
PROP[i] <- binary_repr_h(v) # scrivo la rappresentazione binaria di v in PROP
for k=1 to h do
P <- {PROP[j][k] | PROP[j]≠⊥ ∧ PROP[j][1..k-1]=res[1..k-1]}
let b be an element of P
res[k] <- BC[k].propose(b)
return value(res)
(exercise: remove ∧ PROP[j][1..k-1]=res[1..k-1]
and show an execution in which a consensus property can be broken)
Proof:
- Wait freedom: trivial
- Integrity: trivial
- Agreement: by agreement of the h binary consensus objects
- Validity: for all k, we prove that, if
dec
is the decided value, then there exists a j such thatp_j
is participating (PROP[j] ≠ ⊥
) anddec[1...k] = PROP[j][1...k]
- by construction, P contains the k-th bits of the proposals whose first (k-1) bits coincide with the first k-1 bits decides so far:
- for every b ∈ P, there exists a j such that
PROP[j] ≠ ⊥
, PROP[j][1..k-1] = dec[1..k-1]
andPROP[j][k] = b
- for every b ∈ P, there exists a j such that
- whatever b ∈ P is selected in the k-th binary consensus, there exists a j such that
PROP[j] ≠ ⊥
andPROP[j][1..k] = dec[1..k]
- by taking
k = h
, we can conclude.
- by construction, P contains the k-th bits of the proposals whose first (k-1) bits coincide with the first k-1 bits decides so far: