forked from efficient/paper_skel
-
Notifications
You must be signed in to change notification settings - Fork 1
/
equivalence.tex
366 lines (279 loc) · 29.9 KB
/
equivalence.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
\section{External Equivalence Proof}
\label{sec:equivalence}
In this section, we first define some preliminaries and then use them to show
that an application built atop a linearizable service can be transformed to take
advantage of the concurrency offered by \multidispatch{} linearizability such that
the two versions of the application are externally equivalent. Our formalism
closely aligns with that of prior work~\cite{helt2021rss}, which leverages the
formalism of I/O automata~\cite{lynch1987ioa,lynch1996da}.
\subsection{Preliminaries}
\label{sec:equivalence:preliminaries}
\subsubsection{I/O Automata}
\label{sec:equivalence:preliminaries:ioa}
We model each component in our system as an I/O automaton
(IOA)~\cite{lynch1987ioa,lynch1996da}, a type of state machine. Each
transition of an IOA is labeled with an \textit{action}, which can be an
\textit{input}, \textit{output}, or \textit{internal} action. Input and output
actions allow the automaton to interact with other IOA and the environment. We
assume input actions are not controlled by an IOA---they may arrive at any time.
Conversely, output and internal actions are \textit{locally controlled}---an IOA defines
when they can be performed.
To specify an IOA, we must first specify its \textit{signature}. A signature $S$ is
a tuple comprising three disjoint sets of actions: input actions $\actin(S)$,
output actions $\actout(S)$, and internal actions $\actint(S)$. We also define
$\localacts(S) = \actout(S) \cup \actint(S)$ as the set of locally controlled actions, $\extacts(S) = \actin(S) \cup \actout(S)$ as the set of external actions, and $\acts(S) = \actin(S) \cup \actout(S) \cup \actint(S)$ as the set of all actions.
Formally, an I/O automaton $A$ comprises four items:
\begin{enumerate}
\item a signature $\sig(A)$,
\item a (possibly infinite) set of states $\states(A)$,
\item a set of start states $\start(A) \subseteq \states(A)$, and
\item a transition relation $\trans(A) \subseteq \states(A) \times
\acts(\sig(A)) \times \states(A)$.
\end{enumerate}
Since inputs may arrive at any time, we assume that for every state $s$ and
input action $\pi$, there is some $(s, \pi, s^\prime) \in \trans(A)$.
An \textit{execution} of an I/O automaton $A$ is a finite or infinite
sequence of alternating states and actions $s_0,\pi_1,s_1,\ldots$ such that for
each $i \geq 0$, $(s_i, \pi_{i+1}, s_{i+1}) \in \trans(A)$ and $s_0 \in \start(A)$. Finite executions always end with a state.
Given an execution $\alpha$, we can also define
its \textit{schedule} $\sched(\alpha)$, which is the subsequence of just
the actions in $\alpha$. Similarly, an execution's \textit{trace}
$\trace(\alpha)$ is the subsequence of just the external actions $\pi \in
\extacts(A)$.
\subsubsection{Composition and Projection}
\label{sec:equivalence:preliminaries:composition}
To compose
two IOA, they must be \textit{compatible}. Formally, a finite set of
signatures $\{S_i\}_{i \in I}$ is compatible if for all $i,j \in I$ such that $i \neq j$:
\begin{enumerate}
\item $\actint(S_i) \cap \acts(S_j) = \emptyset$, and
\item $\actout(S_i) \cap \actout(S_j) = \emptyset$.
\end{enumerate}
A finite set of automata are compatible if their signatures are compatible.
Given a set of compatible signatures, we define their \textit{composition} $S =
\prod_{i \in I} S_i$ as the signature with $\actin(S) = \bigcup_{i \in I}
\actin(S_i) - \bigcup_{i \in I} \actout(S_i)$, $\actout(S) = \bigcup_{i \in I}
\actout(S_i)$, and $\actint(S) = \bigcup_{i \in I} \actint(S_i)$.
The composition of a set of compatible IOA yields the automaton $A =
\prod_{i \in I} A_i$ defined as follows:
\begin{enumerate}
\item $\sig(A) = \prod_{i \in I} \sig(A_i)$,
\item $\states(A) = \prod_{i \in I} \states(A_i)$,
\item $\start(A) = \prod_{i \in I} \start(A_i)$, and
\item $\trans(A)$ contains all $(s,\pi,s^\prime)$ such that for all $i \in I$, if $\pi \in \acts(\sig(A_i))$, then $(s_i,\pi,s_i^\prime) \in \trans(A_i)$ and otherwise, $s_i = s_i^\prime$.
\end{enumerate}
The states of the composite automaton $A$ are vectors of the states of the composed
automata. When an action occurs in
$A$, all of the component automata with that action each take a step simultaneously, as defined
by their individual transition relations. The resulting state differs in each of
the components corresponding to these automata, and the other components are
unchanged. We denote the composition of a small number of
automata using an infix operator, e.g., $A \times B$.
Given the execution $\alpha$ of a composed automaton $A = \prod_{i \in I} A_i$,
we can \textit{project} the execution onto one of the component automata $A_i$.
The execution $\alpha|A_i$ is found by removing all actions from
$\alpha$ that are not actions of $A_i$. The states of $\alpha|A_i$ are
given by the $i$th component of the corresponding state in $\alpha$. The
projection of a trace is defined similarly.
Further, we can write the projection of a state $s$ of $A$ on $A_i$ as $s|A_i$.
Finally, we can also project $\trace(\alpha)$ onto a set of actions $\Pi$ where
$\trace(\alpha)|\Pi$ yields the subsequence of $\trace(\alpha)$ containing only actions in $\Pi$.
\subsubsection{Channels}
\label{sec:equivalence:preliminaries:channels}
\begin{table}[!t]
\centering
\begin{tabular}{l l l}
\multicolumn{3}{l}{\textbf{Signature:}} \\ \hline
Inputs: & & Outputs: \\
$\sendto(m)_{ij}, m \in M$ & & $\sent_{ij}$ \\
$\recvfrom_{ij}$ & & $\receive(m)_{ij}, m \in M$ \\ \hline
\\
\multicolumn{3}{l}{\textbf{States:}} \\ \hline
\multicolumn{3}{l}{$Q$, a FIFO queue of elements of $M$, initially empty} \\
\multicolumn{3}{l}{$e$, a Boolean, initially \textit{false}} \\
\multicolumn{3}{l}{$r$, a Boolean, initially \textit{false}} \\ \hline
\\
\multicolumn{3}{l}{\textbf{Actions:}} \\ \hline
\multicolumn{2}{l|}{$\sendto_{ij}(m)$:} & $\sent_{ij}$: \\
\multicolumn{2}{l|}{Precondition: \textit{true}} & Precondition: $e$ \\
\multicolumn{2}{l|}{Effect: $\textsc{Push}(Q,m); e \gets \textit{true}$} & Effect: $e \gets \textit{false}$ \\
\hline \hline
$\recvfrom_{ij}$: & \multicolumn{2}{|l}{$\receive_{ij}(m)$:} \\
Precondition: \textit{true} & \multicolumn{2}{|l}{Precondition: $r \land m = \text{Head}(Q)$} \\
Effect: $r \gets \textit{true}$ & \multicolumn{2}{|l}{Effect: $\textsc{Pop}(Q); r \gets \textit{false}$} \\ \hline
\end{tabular}
\captionof{figure}{Buffered Channel I/O Automaton}
\label{fig:buffered-channel-ioa}
\end{table}
Each pair of processes in our system communicates via a pair of FIFO
channels that are asynchronous, reliable, and buffered. Each
channel's signature, states, and actions are specified in
Figure~\ref{fig:buffered-channel-ioa}.
We denote the channel that process $i$ uses to send messages to process $j$ as
$C_{ij}$. $C_{ij}$ has two sets of input actions, $\sendto_{ij}(m)$ and
$\recvfrom_{ij}$, and two sets of output actions, $\sent_{ij}$ and
$\receive_{ij}(m)$, for all $m$ in some space of messages $M$. Process $i$ has
corresponding output actions, $\sendto_{ij}(m)$ and $\recvfrom_{ji}$, and input
actions, $\sent_{ij}$ and $\receive_{ji}(m)$, for all other processes $j$ and
messages $m$. To send a message to process $j$, process $i$ takes a
$\sendto_{ij}$ step, and $C_{ij}$ subsequently takes a $\sent_{ij}$ step.
Similarly, to receive a message from $C_{ij}$, process $j$ takes a
$\recvfrom_{ij}$ step, and $C_{ij}$ subsequently takes a $\receive_{ij}(m)$ step.
The modeling of buffering in the channels differs from past
work~\cite{lynch1996da}. There, receive-from actions are omitted, and received
actions are modeled as output actions of channels and corresponding input
actions of processes. This implies processes cannot control when they
change their state in response to a message.
But in real applications, this is unrealistic. Although the network stack of a
machine may accept and process a packet at any time, application code controls
when it processes the contained message. For instance, the packet's contents remain in a kernel buffer until the application performs a read system call
on a network socket. This control is essential to our proof as it ensures
application processes do not receive messages while waiting for responses from
services.
We say an execution $\alpha$ of a channel $C_{ij}$ is \textit{well-formed} if
(1) $\trace(\alpha) | \{\sendto_{ij}(m)\}_{m \in M} \cup \{\sent_{ij}\}$ is a
sequence of alternating send-to and sent actions, starting with a send-to; and
similarly (2) $\trace(\alpha) | \{\recvfrom_{ij}\} \cup \{\receive_{ij}(m)\}_{m
\in M}$ is a sequence of alternating receive-from and received actions,
starting with a receive-from.
\subsubsection{Types and Services}
\label{sec:equivalence:preliminaries:services}
Processes in our system interact with \textit{services}, each with a specified
\textit{type}~\cite{herlihy1990linearizability,lynch1996da}. A service's type
$\type$ defines its set of possible \textit{values} $\vals(\type)$, an
\textit{initial value} $v_0 \in \vals(\type)$, and the \textit{operations}
$\ops(\type)$ that can be invoked on the service. Each operation $o \in
\ops(\type)$ is defined by a pair of sets of actions: \textit{invocations} $\invs(o)$ and
\textit{responses} $\resps(o)$. Each contains subscripts denoting a unique service name and a process index. An invocation and response \textit{match} if their subscripts are equal.
Finally, each service has a \textit{sequential specification} $\spec$, a
prefix-closed set of sequences of matching invocation-response pairs.
For example, consider a read/write register $x$ that supports a set of $n$
processes and whose values is the set of integers. The read operation would then
be defined with invocations $\{\textsc{read}_{i,x}\}$ and responses
$\{\textsc{ret}_{i,x}(j)\}$ for all $0 \leq i \leq n$ and $j \in \mathcal{N}$.
Similarly, the write operation would have invocations
$\{\textsc{write}_{i,x}(j)\}$ and responses $\{\textsc{ack}_{i,x}\}$. Finally,
its sequential specification would be the set of all sequences of reads and
writes such that reads return the value written by the most recent write or the
initial value if none exists.
\subsubsection{System Model}
\label{sec:equivalence:preliminaries:model}
We model a distributed application as the composition of two finite sets of I/O
automata: processes and channels. $n$ denotes the number of processes, so there are $n^2$ channels. The processes execute the application's code by performing local computation, exchanging messages via channels, and performing invocations on and receiving responses from services.
In the results below, we are interested in reasoning about the processes' external behaviors while assuming various correctness conditions of the services they interact with. Thus, we do not model services as IOA. Instead, we assume the processes interact with a service with an arbitrary type $\type$ and $\spec$ defined for $n$ processes.
For each operation $o \in \ops(\type)$, process $P_i$ is then assumed to have an output action for every invocation action in $\invs(o)$ with process index $i$. Similarly, $P_i$ has an input action for every response action in $\resps(o)$ with process index $i$. We refer to these input and output actions as a process's \textit{system-facing} actions $\systemf(P_i)$.
To model stop failures, we assume each process $P_i$ has an input
action $\textit{stop}_i$ such that after receiving it, $P_i$ ceases taking
steps. If $\textit{stop}_i$ occurs while $P_i$ is waiting for a response from a service, then we assume the service does not return a response. But the operation may still cause a service's state to change, and this change may be visible to operations by other processes.
Finally, to allow the distributed application to receive input from and return values to its environment (e.g., users), we assume each $P_i$ has a set of \textit{user-facing} actions $\user(P_i)$. Similar to a process's interactions with services, a user's interaction with a process is modeled through input-output pairs of user-facing actions.
We make two final assumptions about the processes: First, we assume that while
each process has access to a local clock, which is part of its state, and may
set local timers, which are internal actions, the process makes no assumptions
about the drift or skew of its clock relative to others. Second, processes only
invoke an operation on a service when they have no outstanding send-to or
receive-from actions at any channels.
Let $P = \prod_{i \in I} P_i$ be the composition of the $n$ processes and
$C = \prod_{1 \leq i \leq n} \prod_{1 \leq j \leq n} C_{ij}$ be the composition of $n^2$ channels.
Let $\alpha$ be an execution of the distributed application $P \times C$.
$\alpha$ is (single-dispatch) \textit{well-formed} if it satisfies three criteria:
First, for all $P_i$, $\trace(\alpha) | \systemf(P_i)$ must be a sequence of alternating invocation and matching response actions, starting with an invocation. Second, for all $C_{ij}$, $\alpha | C_{ij}$ is well-formed. Third, for all $P_i$, $P_i$ does not take an output step while waiting for a response from some service. We assume a single-dispatch application $A$ only allows well-formed executions.
\subsubsection{External Equivalence}
\label{sec:equivalence:preliminaries:equivalence}
Two executions, $\alpha$ and $\beta$, are
\textit{externally equivalent} if $\alpha|U = \beta|U$. Intuitively,
externally equivalent executions are indistinguishable to users.
\subsubsection{Transformation}
\label{sec:equivalence:preliminaries:transform}
A single-dispatch application $A$ must be transformed to take advantage of the benefits offered by a multi-dispatch system. We use the notation $A^\prime = \transform(A)$ to denote applying the transformation to each process within $A$, i.e., $A^\prime = \prod_i P^\prime_i = \prod_i \transform(P_i)$.
We define $\transform$ as follows: We first replace all synchronous, system-facing I/O with futures. This allows operation invocations and responses to be reordered in $A$ with respect to other actions (i.e., instructions). The aim is to then move operation invocations earlier such that the application can issue multiple operations concurrently and reap the performance benefits of an MD-Linearizable system.
To produce $A^\prime$, we then move actions in $A$ before prior actions provided $A^\prime$ maintains the following:
\textbf{(R1)} the order of data-dependent actions within each process of $A$,
\textbf{(R2)} the control flow of each process in $A$,
\textbf{(R3)} the issue order of $A$’s operations,
\textbf{(R4)} the order of non-system-facing external actions in $A$, including messages to/from users and between processes,
\textbf{(R5)} the order between a non-system-facing external actions and operation invocations (in particular, the first invocation in a sequence of operations), and
\textbf{(R6)} the order between non-system-facing external actions and operation responses (e.g., by waiting for the successful responses of any outstanding operations before sending a message to a user).
\subsection{Equivalence Proof}
\label{sec:equivalence:proof}
\begin{thm}
Let $A^\prime = \transform(A)$. Suppose $\alpha^\prime$ is a finite execution of $A^\prime \times C$ that satisfies Multi-dispatch Linearizability. Then there exists a single-dispatch well-formed, finite execution $\alpha$ of $A \times C$ that satisfies (single-dispatch) Linearizability such that $\alpha^\prime | U = \alpha | U$.
\end{thm}
\begin{proof}
Let $A^\prime$, $A$, and $C$ be defined as above, and let $\alpha^\prime$ be an arbitrary finite execution of $A^\prime \times C$ that satisfies Multi-dispatch Linearizability.
The proof proceeds in three steps: First, we create an execution $\beta^\prime$ from $\alpha^\prime$ by essentially undoing the parallelizing transformation in each process's sub-execution. Second, we produce $\beta$ from $\beta^\prime$ by removing any incomplete or failed operations. Third, we produce $\alpha$ from $\beta$ by fixing the order of actions across processes to ensure $\alpha$ is well-formed and Linearizability.
By the assumption that $A$ only produces single-dispatch well-formed executions and rules \textbf{R4}, \textbf{R5}, and \textbf{R6} of $\transform$, $\alpha^\prime$ is multi-dispatch well-formed, so each $\alpha^\prime | P_i$ comprises sub-sequences of local and system-facing actions separated by non-system-facing external actions. Further, by \textbf{R4}, the order of the non-system-facing external actions in each $\alpha^\prime | P_i$ respects the program order of the original $A$.
We first reorder the actions in each sub-sequence of each $\alpha^\prime | P_i$ to place the local and system-facing actions such that they respect $A$'s program order. More precisely, for each $\alpha^\prime | P_i$, starting with the right-most action $\pi_1$, shift $\pi_1$ left in $\alpha^\prime$ until it precedes all other actions $\pi_2$ such that $A$'s program order requires it. Let $\beta^\prime$ be the resulting execution.
By assumption, $\alpha^\prime$ satisfies MDL, so let $<_S$ be a total order over the operations in $\alpha^\prime$ (and thus $\beta^\prime$). Observe that by the process above, each $\beta^\prime | P_i$ now issues its operations sequentially (since $A$'s program order requires it). Further, by the definition of MDL (in particular, its suffix-closed failure semantics) all of a process's completed operations precede any incomplete or failed ones. For each $\beta^\prime | P_i$, remove any trailing system-facing actions corresponding to operations that are either failed or not present in $S$ as well as any trailing local actions to ensure it remains a valid process sub-execution. Let $\beta$ be the resulting execution.
Since the order of the non-system-facing external actions in each $\alpha^\prime | P_i$ already respects the program order of the original $A$, these actions are not reordered by the steps above. Thus, the order of non-system-facing actions in $\beta^\prime | P_i$ and thus $\beta | P_i$ also respects $A$’s program order.
But $\beta$ may not satisfy Linearizability. To remedy this and produce the needed $\alpha$, we next define an order over the actions in $\beta$ that combines the order of each process’s (now single-dispatch) sub-execution in $\beta$ with the total order of operations guaranteed by MDL from the original $\alpha^\prime$. To ensure $\alpha$ is single-dispatch well-formed, the order also preserves the order of send-to and receive-from actions used to transmit messages between processes and users.
Let $\prec_\beta$ be the transitive closure of the following pairs of actions $(\pi_i, \pi_j)$ in $\beta$:
\begin{enumerate}
\item $\mathbf{\prec_{P_i}}$\textbf{:} $\pi_i$ precedes $\pi_j$ in some $\beta | P_i$,
\item $\mathbf{\prec_S}$\textbf{:} for each adjacent pair of operations $o_k$, $o_{k+1}$ in $S$, $\pi_i$ is the response action of $o_k$ and $\pi_j$ is the invocation action of $o_{k+1}$,
\item $\mathbf{\prec_M}$\textbf{:} $\pi_i$ is a send action for some message $M$ (from a user to a process, a process to a user, or between two processes) and $\pi_j$ is its corresponding receive action, or
\item $\mathbf{\prec_U}$\textbf{:} $\pi_i$ precedes $\pi_j$ in $\beta | U$.
\end{enumerate}
We posit $\prec_\beta$ defines an irreflexive partial order over the actions in $\beta$. Clearly it is transitive; we now prove it is acyclic and irreflexive.
For sake of contradiction, assume $\prec_\beta$ contains at least one cycle. Let $\pi_1 \prec_\beta \pi_2 \prec_\beta \ldots \prec_\beta \pi_1$ be a shortest cycle that does not contain any transitive edges.
We begin by proving an important lemma (Lemma~\ref{lemma:helper2}) regarding sequences of actions connected by $\prec_\beta$. We use this lemma in the remainder of the proof.
\begin{lem}
Let $\pi_1 \prec_\beta \ldots \prec_\beta \pi_m$ be a sequence of $m$ actions connected only by $\prec_{P_i}$ and $\prec_S$ edges, and further assume $\pi_1$ and $\pi_m$ are send-to or receive-from actions. Then $\pi_1$ precedes $\pi_m$ in $\beta$.
\label{lemma:helper1}
\end{lem}
\begin{proof}
We proceed by cases:
$\mathbf{m=2}$\textbf{.} Since they are send-to or receive-from actions, $\pi_1$ and $\pi_2$ must be connected by a $\prec_{P_i}$ edge, so $\pi_1$ precedes $\pi_2$ by the definition of $\prec_{P_i}$.
$\mathbf{m=3}$\textbf{.} By similar reasoning, the two edges must be $\prec_{P_i}$ edges for some $P_i$, so $\pi_1$ precedes $\pi_3$ again by the definition of $\prec_{P_i}$.
$\mathbf{m \geq 4}$\textbf{.} By the same reasoning, the conclusion is true if all of the edges are $\prec_{P_i}$ for some $P_i$, so assume there is at least one $\prec_S$ edge.
Then the sequence has the following structure:
$\pi_1 \prec_{P_1} \ldots \prec_{P_1} \pi_i \prec_S \pi_{i+1} \prec_{P_2} \ldots \prec_{P_2} \pi_j \prec_S \pi_{j+1} \ldots \pi_k \prec_S \pi_{k+1} \prec_{P_n} \ldots \prec_{P_n} \pi_m$.
Note the first and final edges cannot be $\prec_S$ edges because by assumption $\pi_1$ and $\pi_m$ are send-to or receive-from actions.
By the definition of $\prec_S$, $\pi_i$ is a response action for some operation invoked by $P_1$. By the definition of $\prec_{P_1}$, $\pi_1$ precedes $\pi_i$ in $\beta | P_1$. Since $\alpha^\prime$ is multi-dispatch well-formed and $A$, by assumption, only produces single-dispatch well-formed executions, the reordering above to produce $\beta^\prime$ and thus $\beta$ must have maintained the relative order of $\pi_1$ (a send-to or receive-from action) and any subsequent invocation or response actions at $P_1$. It follows that since $\pi_1$ precedes $\pi_i$ in $\beta | P_1$, it must also precede $\pi_i$’s corresponding invocation $\inv(\pi_i)$ in $\beta | P_1$. Since the relative order of $\pi_1$ and $\inv(\pi_i)$ is the same in $\alpha^\prime$ and $\beta$, this implies $\pi_1$ must also precede $\inv(\pi_i)$ in $\alpha^\prime | P_1$.
By the definition of $\prec_S$, $\prec_{k+1}$ is an invocation action at $P_n$. By similar reasoning as above, we can conclude that $\pi_{k+1}$’s corresponding response $\resp(\pi_{k+1})$ must precede $\pi_m$ in $\beta | P_n$ and thus $\alpha^\prime | P_n$. Thus, we have $\pi_1$ precedes $\inv(\pi_i)$ and $\resp(\pi_{k+1})$ precedes $\pi_m$ in $\alpha^\prime$.
Now consider $\pi_{i+1} \prec_{P_2} \ldots \prec_{P_2} \pi_j$. $\pi_{i+1}$ is an invocation, and $\pi_j$ is a response. By the definition of $\prec_{P_2}$, $\pi_{i+1}$ precedes $\pi_j$ in $\beta | P_2$. Further, $P_2$’s system interactions occur sequentially in $\beta$, so $\pi_{i+1}$ must also precede $\pi_j$’s corresponding invocation in $\beta | P_2$. By \textbf{R3}, the transformation between $A$ and $A^\prime$ preserves each process’s issue order, so this implies $\pi_{i+1}$ must precede $\pi_j$’s corresponding invocation in $\alpha^\prime | P_2$.
By the definition of MDL, the operations in $S$ respect this issue order, too, so $\op(\pi_{i+1})$ is ordered in $S$ before $\op(\pi_j)$. We can apply the same reasoning to conclude the same about any similar sub-sequence. For example, we can conclude $\op(\pi_{j+1})$ is ordered in $S$ before $\op(\pi_k)$.
Combined with $\pi_i \prec_S \pi_{i+1}$, $\pi_j \prec_S \pi_{j+1}$, and so on, we can thus conclude that $\op(\pi_i)$ is ordered in $S$ before $\op(\pi_{k+1})$. Then it must be the case that $\inv(\pi_i)$ precedes $\resp(\pi_{k+1})$ in $\alpha^\prime$ because otherwise, by the real-time guarantees of MDL, $\op(\pi_{k+1})$ would be ordered in $S$ before $\op(\pi_i)$. Combined with the facts that $\pi_1$ precedes $\inv(\pi_i)$ and $\resp(\pi_{k+1})$ precedes $\pi_m$ in $\alpha^\prime$, we can thus conclude $\pi_1$ precedes $\pi_m$ in $\alpha^\prime$.
By this fact and the reasoning above that the order of non-system-facing external actions is preserved between $\alpha^\prime$ and $\beta$, $\pi_1$ precedes $\pi_m$ in $\beta$.
\end{proof}
\begin{lem}
Let $\pi_1 \prec_\beta \ldots \prec_\beta \pi_m$ be a sequence of $m$ actions connected by any of $\prec_{P_i}$, $\prec_S$, $\prec_M$, and $\prec_U$ edges, and further assume $\pi_1$ and $\pi_m$ are send-to or receive-from actions. Then $\pi_1$ precedes $\pi_m$ in $\beta$.
\label{lemma:helper2}
\end{lem}
\begin{proof}
Note that Lemma~\ref{lemma:helper1} proves the case where there are no $\prec_U$ or $\prec_M$ edges, so assume there is at least one such edge.
Observe that both $\prec_U$ and $\prec_M$ edges require the adjoining actions to be send-to or receive-from actions. The sequence thus comprises a set of sub-sequences connected by only $\prec_{P_i}$ and $\prec_S$ edges, starting and ending with a send-to or receive-from action, and which are joined by either a $\prec_U$ or $\prec_M$ edge.
Next observe that by definition, $\pi_i \prec_U \pi_{i+1}$ implies $\pi_i$ precedes $\pi_{i+1}$ in $\beta$. Similarly, by the reasoning above that $\alpha^\prime$ is multi-dispatch well-formed, the fact that $\beta$ preserved the order of non-system-facing actions in $\alpha^\prime$, and the definition of $\prec_M$, $\pi_i \prec_M \pi_{i+1}$ also implies $\pi_i$ precedes $\pi_{i+1}$ in $\beta$.
Combining these two facts with repeated application of Lemma~\ref{lemma:helper1} (to each subsequence mentioned above), we conclude $\pi_1$ precedes $\pi_m$ in $\beta$.
\end{proof}
We now continue the proof the main theorem. Recall we have a shortest cycle
$\pi_1 \prec_\beta \pi_2 \prec_\beta \ldots \prec_\beta \pi_1$ that does not contain any transitive edges,
and we aim to derive a contradiction. We note three properties of this cycle:
\begin{enumerate}
\item The cycle contains at least two actions. This follows from the definitions of $\prec_{P_i}$, $\prec_S$, $\prec_M$, and $<_U$.
\item The cycle does not contain two consecutive $\prec_{P_i}$ edges nor two consecutive $\prec_U$ edges. Otherwise, a shorter cycle exists since $\prec_{P_i}$ and $\prec_U$ are transitive.
\item The cycle does not contain two consecutive $\prec_M$ edges nor two consecutive $\prec_S$. This follows from the their definitions.
\end{enumerate}
Based on these three properties, we can finish the proof by cases:
\noindent \textbf{The cycle contains at least one $\mathbf{\prec_M}$ edge.}
The cycle can be written as $\pi_1 \prec_M \pi_2 \prec_\beta \ldots \prec_\beta \pi_m \prec_\beta \pi_1$ by re-indexing the actions as needed. By the observation that $\alpha^\prime$ is multi-dispatch well-formed, the fact that $\beta$ preserved the order of non-system-facing actions in $\alpha^\prime$, and the definition of $\pi_1 \prec_M \pi_2$, $\pi_1$ must precede $\pi_2$ in $\beta$.
But the remainder of the cycle $\pi_2 \prec_\beta \ldots \prec_\beta \pi_1$ is a sequence of actions connected by $\prec_{P_i}$, $\prec_S$, $\prec_M$, and $\prec_U$ edges starting with a receive-from action ($\pi_2$) and ending with a send-to action ($\pi_1$). Thus, by Lemma~\ref{lemma:helper2}, $\pi_2$ precedes $\pi_1$ in $\beta$, a contradiction.
\noindent \textbf{The cycle contains at least one $\mathbf{\prec_U}$ edge.}
We employ similar reasoning as in the case above. The cycle can be written as $\pi_1 \prec_U \pi_2 \prec_\beta \ldots \prec_\beta \pi_m \prec_\beta \pi_1$. By the definition of $\prec_U$, $\pi_1$ precedes $\pi_2$ in $\beta$.
Since users interact with processes using channels, $\pi_2$ and $\pi_1$ are either receive-from or send-to actions, so by Lemma~\ref{lemma:helper2}, we again reach a contradiction.
\noindent \textbf{The cycle contains neither $\mathbf{\prec_M}$ nor $\mathbf{\prec_U}$ edges.}
By properties 2 and 3 above, the cycle comprises alternating $\prec_{P_i}$ and $\prec_S$ edges. Also, since they alternate with $\prec_S$ edges, for each $\pi_i \prec_{P_i} \pi_{i+1}$, we know $\pi_i$ is an invocation and $\pi_{i+1}$ is a response.
By the definition of $\prec_{P_i}$, $\pi_i$ precedes $\pi_{i+1}$ in $\beta | P_i$. Since each process’s system interactions occur sequentially in $\beta$, $\pi_i$ must also precede $\pi_{i+1}$'s corresponding invocation.
Since the transformation between $A$ and $A^\prime$ preserves each process’s issue order and so does Multi-dispatch Linearizability, this implies $\op(\pi_i)$ is ordered in $S$ before $\op(\pi_{i+1})$.
As a result, for each $\pi_i \prec_{P_i} \pi_{i+1}$ in our cycle, we can replace it with a sequence of $\prec_{P_i}$ and $\prec_S$ edges that includes the invocations and responses of all operations in $S$ between $\op(\pi_i)$ and $\op(\pi_{i+1})$, if any. This (no longer shortest) cycle implies there is a cycle over operations in $S$. But this contradicts the definition of $S$, which is a total order.
Thus, $\prec_\beta$ is acyclic. Combined with the definitions of $\prec_{P_i}$, $\prec_S$, $\prec_M$, and $\prec_U$, this also shows it is irreflexive.
Let $\alpha$ be a topological sort of $\prec_\beta$ on the actions in $\beta$. To conclude the proof, we show $\alpha$ is single-dispatch well-formed, satisfies Linearizability, and is externally equivalent to $\alpha^\prime$.
$\alpha$ is clearly a well-formed execution of $A \times C$ because
(1) the order of user-facing actions is preserved by $\prec_\beta$;
(2) all messages are sent before being received by the definition of $\prec_\beta$; and
(3) the order of each process’s actions reflects $A$’s program order.
The last follows from the initial reordering to produce $\beta^\prime$, which is preserved in producing $\beta$, and finally, by the definition of $\prec_\beta$.
$\alpha$ also satisfies Linearizability:
First, the system-facing actions are now single-dispatch by the initial reordering to produce $\beta$ and the definition of $\prec_\beta$.
Second, the system-facing invocations and responses satisfy Linearizability. In fact, by definition of $\prec_\beta$, accesses are now sequential in the order defined by $S$.
Finally, we can see that $\alpha^\prime | U = \beta^\prime | U = \beta | U = \alpha | U$. The first equality follows from the fact that the order of user-facing actions is preserved in initial transformation to $\beta^\prime$. The second follows from the fact that no user-facing actions were removed in producing $\beta$, and finally, the last equality is preserved by definition of $\prec_\beta$ when producing $\alpha$.
\end{proof}