# Adam: Causality-Based Synthesis of Distributed Systems

## 1 Introduction

Research on the *reactive synthesis* problem, i.e., the challenge of constructing a reactive system automatically from a formal specification, dates back to the early years of computer science [5, 6, 16]. Over the past decade, this research has led to tools like Acacia+ [4], Ratsy [3], and Unbeast [7], which translate a formal specification automatically into an implementation that is correct in the sense that the system reacts to every possible input from the system’s environment in a way that ensures that the specification is satisfied. The tools have been used in nontrivial applications, such as the synthesis of bus arbiter circuits [2] and robotic control [12]. The key limitation of the current state of the art is that the underlying system model consists of a single process. If the system under construction consists of several distributed parts, such as several robots, then the implementation is always based on a central controller with which the entire system must constantly synchronize. This is unfortunate, because in practice, it is specifically the design of the *distributed* implementation with multiple concurrent processes that is most error-prone and would, therefore, benefit most from a synthesis tool.

*Adam* is a synthesis tool for distributed systems based on Petri games. (The name is a reference to Carl Adam Petri, the inventor of Petri nets.) Our aim is to automate the construction of complex distributed systems, such as production plants with multiple independent robots. Rather than creating a central controller, with which all robots must constantly synchronize, Adam creates an individual controller for each robot, which acts on locally available information, such as information obtained through observations and through synchronization with nearby robots.

The most well-studied model for the synthesis of distributed systems is due to Pnueli and Rosner [14]. This model captures the *partial information* available to the processes by specifying for each process the subset of events that are visible to the process. The decisions of the process are based only on the history of its observations, not on the full state history. Unfortunately, the Pnueli/Rosner model has never been translated into practical tools; the synthesis problem under the Pnueli/Rosner model is undecidable in general, and very expensive (nonelementary) in the special cases where it can be decided [10, 15].

Adam is based on the more recently developed model of *Petri games* [9]. The synthesis problem is modeled as a game between a team of system players, representing the processes, and an environment player, representing the user (and other external influences) of the system. Both the system players and the environment are represented as tokens of a Petri net. As Petri nets, the games capture the complex *causal dependencies* (and independence) between the processes (cf. [17]). Figure 1 shows a typical application scenario, taken here from the synthesis of robot controllers in a production plant, addressing concurrency, usage constraints, and uncertain availability of machines. The robots are expected to process *k* orders on *n* machines (here: \(k=2, n=3\)), despite the actions of a hostile environment, which is allowed to declare one machine defective. The environment, initially at place \( Env \), chooses which machine is defective and activates the remaining machines by putting tokens on two of the places \(A_i\), \(i \in \{0,1,2\}\). The two system players in place \( Sys \) represent the two robots. Different robots can take their orders concurrently to different machines. If a robot chooses a machine \(M_i\) right away, it does not know whether \(M_i\) is defective, i.e., whether there is no token on \(A_i\); in that case, only the bad place \(B_i\) is reachable from \(M_i\). If a robot chooses an active machine \(M_i\) (with a token on \(A_i\)), then the good place \(G_i\) is reachable from \(M_i\) by consuming the token from \(A_i\). If a robot chooses \(M_i\) a second time, the token on \(A_i\) is missing, and only the bad place \(B_i\) is reachable. A winning strategy for the robots must avoid all transitions to a bad place. To this end, the robots first inform themselves, via the synchronizing transitions \(t_0, t_1\), and \(t_2\), which machines are broken (the two robots do this simultaneously due to the arc multiplicity 2) and then use two different active machines.
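
The token-game semantics underlying this scenario can be sketched in a few lines of Python. This is only an illustration of how markings evolve under transitions with arc multiplicities; the place names and arc weights below are assumptions for exposition, not the exact net of Fig. 1:

```python
from collections import Counter

def enabled(marking, pre):
    # a transition is enabled iff the marking covers its preset multiset
    return all(marking[p] >= n for p, n in pre.items())

def fire(marking, pre, post):
    # consume the preset, then produce the postset (multiset semantics)
    assert enabled(marking, pre)
    m = Counter(marking)
    m.subtract(pre)
    m.update(post)
    return +m  # drop places whose token count fell to zero

# Hypothetical fragment: both robots synchronize at t0, which requires a
# token on A0 and involves two Sys tokens at once (arc multiplicity 2).
m0 = Counter({"Env": 1, "Sys": 2})
pre_t0, post_t0 = {"Sys": 2, "A0": 1}, {"Sys": 2, "A0": 1}
```

Before the environment has activated machine 0 (i.e., put a token on \(A_0\)), the synchronizing transition is not enabled; once it has, the robots can fire it together.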

In recent work [9], we showed that solving Petri games with safety objectives, a single environment player and an arbitrary (but fixed) number of system players is EXPTIME-complete, and thus dramatically cheaper than comparable synthesis problems in the Pnueli/Rosner setting. Adam represents the first practical implementation of this theoretical result.

## 2 The Synthesis Game

We model the synthesis problem as a game between a team of system players on one side and a hostile environment player on the other side. The system players pursue a joint objective against the environment, but are independent of each other in the sense that they have no information about each other’s state unless they explicitly communicate. A *Petri game* [9] is a refinement of a Petri net. The players are the tokens in the underlying Petri net. They are organized into two teams, the system players and the environment players, where the system players wish to avoid the “bad” places (i.e., they follow a safety objective), while the environment players wish to reach such a place. To partition the tokens into teams, we distinguish each place *p* as belonging to either the system (\(p \in \mathcal {P}_S\)) or the environment (\(p \in \mathcal {P}_E\)). A token belongs to a team whenever it is on a place that belongs to that team.

Formally, a Petri game is a structure \(\mathcal G = (\mathcal {P}_S, \mathcal {P}_E, \mathcal {T}, \mathcal {F}, In , \mathcal {B})\), where the underlying Petri net of \(\mathcal G\) is \(\mathcal N = (\mathcal {P}, \mathcal {T}, \mathcal {F}, In )\) with set of places \(\mathcal {P}=\mathcal {P}_S \cup \mathcal {P}_E\), set of transitions \(\mathcal {T}\), flow relation \(\mathcal {F}\), initial marking \( In \), and set of bad places \(\mathcal {B}\). We depict places of \(\mathcal {P}_S\) in gray and places of \(\mathcal {P}_E\) in white. In the following, we assume that there is a single environment player and an arbitrary (but bounded) number of system players. We further assume that the Petri net is safe, i.e., every place is, at all times, occupied by at most one token. Petri games that are bounded, but not necessarily safe, like the example from the introduction, can be translated into an equivalent game over a safe net using the standard transformation.
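
For concreteness, the structure \(\mathcal G = (\mathcal {P}_S, \mathcal {P}_E, \mathcal {T}, \mathcal {F}, In , \mathcal {B})\) can be transcribed into a small data type. This is a minimal sketch for exposition; the names and encoding are ours, not Adam’s API:

```python
from dataclasses import dataclass

@dataclass
class PetriGame:
    """G = (P_S, P_E, T, F, In, B); an illustrative encoding."""
    system_places: frozenset   # P_S, drawn gray
    env_places: frozenset      # P_E, drawn white
    transitions: frozenset     # T
    flow: frozenset            # F: pairs (place, transition) / (transition, place)
    initial: frozenset         # In: a marking of a safe net is a set of places
    bad: frozenset             # B: the places the system players must avoid

    @property
    def places(self):          # P = P_S ∪ P_E
        return self.system_places | self.env_places

    def check(self):
        # the two teams partition the places; In and B consist of places
        assert self.system_places.isdisjoint(self.env_places)
        assert self.initial <= self.places and self.bad <= self.places

# a two-place toy game: the environment hands control to the system
g = PetriGame(system_places=frozenset({"Sys"}), env_places=frozenset({"Env"}),
              transitions=frozenset({"t"}),
              flow=frozenset({("Env", "t"), ("t", "Sys")}),
              initial=frozenset({"Env"}), bad=frozenset())
g.check()
```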

A player (token) is always informed about its causal past. As long as different players move in concurrent places of the net, they do not know of each other. Only when they communicate, i.e., synchronize at a joint transition, they exchange their knowledge about the past. Formally, this is modelled by the net unfolding.

A *strategy* \(\sigma \) for the system players eliminates, at each place of the net unfolding, some of the available branches. A strategy is *winning* for the system players if all branches that lead to a bad place are eliminated. For each player, we can obtain a *local controller* by isolating the part of \(\sigma \) that is relevant for this player. These local controllers proceed independently unless they have to synchronize at a joint transition with other local controllers, as described by \(\sigma \).

Since the winning condition of a game is a *safety objective*, the system players could satisfy it trivially by doing nothing. To avoid such trivial solutions, we look for strategies that are *deadlock-avoiding* in the sense that in every reachable marking, whenever some transition is enabled in the unfolding, some transition is also enabled in the strategy. A marking in which no transition is enabled in the unfolding either is not a deadlock; in this case, we say that the game has *terminated*.

A *play* \(\pi \) (conforming to a strategy \(\sigma \)) is obtained from \(\sigma \) by eliminating all remaining choices such that at each place only one transition is left (determinism). The system players win the play \(\pi \) if it does not contain a bad place; otherwise, the environment wins.
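
The winning condition on plays can be sketched as a simple replay check over a safe net. The encoding below (markings as sets of places, a hypothetical two-transition net) is ours, for illustration only:

```python
def winning_play(initial, fire_map, play, bad):
    """Replay a play (a sequence of transitions) of a safe net and check
    that it never marks a bad place. Markings are sets of places;
    fire_map maps each transition to its (preset, postset)."""
    m = set(initial)
    for t in play:
        pre, post = fire_map[t]
        if not pre <= m:
            raise ValueError(f"{t} is not enabled: not a legal play")
        m = (m - pre) | post
        if m & bad:
            return False  # a bad place is reached: the environment wins
    return True           # no bad place reached: the system players win

# toy net: from Start, transition "good" leads to G and "bad" leads to B
fm = {"good": ({"Start"}, {"G"}), "bad": ({"Start"}, {"B"})}
```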

## 3 Solving Petri Games

Petri games can be solved via a reduction to two-player games over finite graphs. In this section, we give an informal sketch of the reduction, focusing on the symbolic representation and the fixed point iteration of the game solving algorithm. For a more formal presentation of the reduction from Petri games to two-player games over finite graphs, the reader is referred to [9].

The two-player game simulates the Petri game through a sequence of *cuts*, i.e., maximal sets of concurrent places. We annotate each system place in a cut with a *decision set*, i.e., the set of transitions currently selected by the player represented by the token on that place. In each cut, we designate a subset of the system places as *type-2*, which means that the strategy of the players on these places will no longer synchronize (directly or indirectly through other system tokens) with the environment. Additionally, we designate a subset of the system places as *type-1*. These are places that still require synchronization with the environment but whose players are, in the current cut, not able to move (following their decision sets) before the environment makes its next move.

Cuts where all system places are either type-1 or type-2 are called *mcuts*. Mcuts correspond to situations in which the system players have progressed maximally in the sense that all non-type-2 places are blocked until the environment moves. The key idea of the reduction is to delay all environment decisions until an mcut is reached. This ensures that all system decisions that should be made independently of the environment choice have actually been made *before* the environment decision is made, and are, hence, guaranteed to be independent of this decision. A winning strategy for the system players must thus legally move from mcut to mcut, in response to the environment decisions at the mcuts, without encountering bad situations (such as bad places), either until the Petri net terminates or forever, if the play never terminates.
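
The mcut condition can be phrased as a membership test over the annotations described above. The following sketch uses our own illustrative encoding (`decision` maps each system place to its decision set; `enabled_without_env` is the set of transitions that could fire before the environment’s next move):

```python
def is_mcut(system_places, type2, decision, enabled_without_env):
    """A cut is an mcut when every system place is type-2, or type-1 in
    the sense that none of its selected transitions can fire before the
    environment moves next."""
    for p in system_places:
        if p in type2:
            continue  # type-2: no further environment synchronization
        if any(t in enabled_without_env for t in decision[p]):
            return False  # p can still progress independently: not maximal
    return True
```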

*The symbolic representation of cuts.* Our representation of a cut is organized by the tokens rather than by the places: this is motivated by the fact that the number of tokens in a Petri net is usually much smaller than the number of places; it is therefore cheaper to assign to each token its currently occupied place than to represent an (arbitrary) subset of the places. A cut is represented as a bitvector, which is composed of several subvectors, one for each token. Figure 2 depicts such a subvector for a system token *i*. The first part of the bitvector encodes the place \(p_i\) and its type (type-1 vs. type-2). The second part encodes the decision of the strategy. The bit \(t_j\) is set iff the player represented by token *i* chooses to allow the *j*th transition of the Petri net. The \(\top \)-bit is set right after a transition is executed. It indicates that the player is allowed to choose a new set of transitions. For the special case of the environment token, we only need to encode the place, without the type, \(\top \)-, and transition flags. We use BDDs to represent sets of cuts and relations on cuts.
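
The subvector layout can be made concrete as follows. The ordering and widths below follow the description above but are assumptions for illustration; Adam’s actual BDD variable encoding may differ:

```python
def encode_token(place, n_places, is_type2, top, chosen, n_transitions):
    """Pack one system token's subvector as a bit list:
    [place bits | type bit | top bit | t_0 .. t_{n-1}]."""
    width = max(1, (n_places - 1).bit_length())        # bits for a place id
    bits = [(place >> k) & 1 for k in range(width)]    # place p_i, binary
    bits.append(1 if is_type2 else 0)                  # type-1 vs. type-2
    bits.append(1 if top else 0)                       # ⊤: may choose anew
    bits.extend(1 if j in chosen else 0
                for j in range(n_transitions))         # transition flags t_j
    return bits
```

For instance, a token on place 2 of 4, of type-1, with the \(\top \)-bit set and transitions 0 and 2 of 3 allowed, packs into seven bits.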

*The game solving algorithm.* The game solving algorithm consists of three phases. *Phase 1* is a preprocessing step that identifies the type-2 places in the cuts. The strategy from type-2 places must guarantee that the tokens on the type-2 places have no further interaction with the environment. The set of all cuts with correct type-2 annotation is computed as a largest fixed point. *Phase 2* identifies the winning mcuts, i.e., mcuts where the strategy from type-1 places guarantees that the game continues with an infinite sequence of mcuts or reaches an mcut with only type-2 tokens. The set of mcuts is computed as a largest fixed point. Nested inside the largest fixed point computation is a least fixed point iteration that finds the predecessor mcuts by first identifying all cuts from which the system players can force the game, without further interaction with the environment, into some mcut of the current approximation of the largest fixed point. Phase 2 also computes, as a least fixed point, the set of all cuts from which the system players can enforce a visit of such an mcut. The game is won by the system players iff the initial cut is in this set. *Phase 3* constructs a winning strategy if the game is won by the system players. The strategy first enforces the visit of a winning mcut according to the computation in Phase 2. From there, the strategy from type-1 places forces the game into new winning mcuts, and the strategy from type-2 places ensures, according to the computation in Phase 1, the safe continuation without any further interaction with the environment.
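
The largest and least fixed points in these phases follow the standard iteration schemes, which can be sketched generically (in Adam, the iterations run over BDD-represented sets of cuts; the explicit sets and the toy attractor example below are ours):

```python
def gfp(f, universe):
    """Greatest fixed point of a monotone set transformer f,
    iterating down from the full universe (as in Phases 1 and 2)."""
    x = set(universe)
    while (nxt := f(x)) != x:
        x = nxt
    return x

def lfp(f):
    """Least fixed point, iterating up from the empty set
    (the nested attractor-style computation of Phase 2)."""
    x = set()
    while (nxt := f(x)) != x:
        x = nxt
    return x

# toy attractor: states from which 'target' is unavoidable along the
# (hypothetical) successor relation succ, plus the target itself
succ = {"a": {"b"}, "b": {"target"}, "c": {"c"}}
forced = lfp(lambda x: x | {"target"} | {u for u in succ if succ[u] <= x})
```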

## 4 Experience with Adam

Adam is a Java-based implementation of the fixed point construction described in Sect. 3. For the BDD operations, Adam uses BuDDy [13], a BDD library written in C. The size of the BDDs is reduced with various optimizations, including a compact representation of the place encodings, based on net invariants (which reduce the set of potential places for each token). We output the Petri games, the strategy graph of the two-player game, and the Petri game strategies in the DOT [1] format for visualization with Graphviz.
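
A DOT export of a strategy graph amounts to straightforward string emission. The following sketch shows the general shape of such output; the node and edge names are illustrative, and Adam’s actual DOT output may look different:

```python
def to_dot(nodes, edges, bad=frozenset()):
    """Render a (hypothetical) strategy graph in Graphviz's DOT format.
    edges are triples (source, transition label, target)."""
    lines = ["digraph strategy {"]
    for n in sorted(nodes):
        shape = "doublecircle" if n in bad else "circle"
        lines.append(f'  "{n}" [shape={shape}];')
    for u, t, v in sorted(edges):
        lines.append(f'  "{u}" -> "{v}" [label="{t}"];')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot({"Env", "Sys"}, {("Env", "t0", "Sys")})
```

The resulting string can be passed to the `dot` tool to produce a rendered graph.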

*Experimental results.* For each benchmark, the table shows the number \(\# Tok \) of tokens, the number \(\# Var \) of BDD variables used, and the numbers \(\#P\) and \(\#T\) of places and transitions, respectively, of the Petri game. We give the elapsed CPU *time* in seconds and the *memory* used in GB for solving the problem. For the resulting solution, \(\#P_s\) and \(\#T_s\) are the numbers of places and transitions of the strategy, respectively. *Par* states the parameter size(s) of the example. The time and memory values are averages over 10 runs. The results were obtained on an Intel i7-2700K CPU at 3.50 GHz with 32 GB RAM. The experiments refer to the following scalable benchmarks:

\(\bullet \) CM: *Concurrent Machines* (see Fig. 1). The environment decides which of *n* machines is functioning. On these machines, *k* orders should be processed, each order by one machine. Different orders can be processed concurrently on different machines. No machine should be used twice for processing orders.

*Parameters*: *n* machines/*k* orders

\(\bullet \) SR: *Self-reconfiguring Robots* [11]. Each piece of material needs to be processed by *n* different tools. There are *n* robots, each with all *n* tools at its disposal, of which only one is in use at a time. The environment may (repeatedly) destroy a tool on a robot *R*. The robots then reconfigure themselves so that *R* uses another tool and the other robots adapt their usage of tools accordingly.

*Parameters*: *n* robots with *n* tools/*k* tools will be successively destroyed

\(\bullet \) JP: *Job Processing*. The environment chooses a subset of *n* different processors and a job that requires handling by each processor in this subset in ascending order. *Parameter*: *n* processors

\(\bullet \) DW: *Document Workflow*. The environment hands a document to one of *n* clerks. The document then circulates among the clerks. Each clerk decides whether or not to endorse it, but wants to base this decision on who has endorsed it already. Altogether, the clerks should reach a unanimous decision. In a simpler variant, DWs, all clerks should endorse the document. *Parameter*: *n* clerks

The benchmarks represent essential building blocks for modeling various manufacturing and workflow scenarios that can be analyzed automatically by synthesizing winning strategies with Adam.