Specifying and Verifying Secrecy in Workflows with Arbitrarily Many Agents
- First Online:
Abstract
Web-based workflow management systems, like EasyChair, HealthVault, Ebay, or Amazon, often deal with confidential information such as the identity of reviewers, health data, or credit card numbers. Because the number of participants in the workflow is in principle unbounded, it is difficult to describe the information flow policy of such systems in specification languages that are limited to a fixed number of agents. We introduce a first-order version of HyperLTL, which allows us to express information flow requirements in workflows with arbitrarily many agents. We present a bounded model checking technique that reduces the violation of the information flow policy to the satisfiability of a first-order formula. We furthermore identify conditions under which the resulting satisfiability problem is guaranteed to be decidable.
1 Introduction
Web-based workflow management systems allow diverse groups of users to collaborate efficiently on complex tasks. For example, conference management systems like EasyChair let authors, reviewers, and program committees collaborate on the organization of a scientific conference; health management systems like HealthVault let family members, doctors, and other health care providers collaborate on the management of a patient’s care; shopping sites like Amazon or Ebay let merchants, customers, as well as various other agents responsible for payment, customer service, and shipping, collaborate on the purchase and delivery of products.
Since the information maintained in such systems is often confidential, the workflows must carefully manage who has access to what information in a particular stage of the workflow. For example, in a conference management system, PC members must declare conflicts of interest, and they should only see reviews of papers where no conflict exists. Authors eventually get access to reviews of their papers, but only when the process has reached the official notification stage, and without identifying information about the reviewers.
It is difficult to characterize the legitimate information flow in such systems with standard notions of secrecy. Classic information flow policies are often too strong. For example, noninterference [12] requires that the PC member cannot observe any difference when classified input, such as the reviews of papers where the PC member has a conflict of interest, is removed. This strong requirement is typically violated, because another PC member might, for example, nondeterministically post a message in a discussion about a paper where they both have no conflict. Weaker information flow policies, on the other hand, often turn out too weak. Nondeducibility [19], for example, only requires that an agent cannot deduce, i.e., conclusively determine, the classified information. The problem is that a piece of information is considered nondeducible already if, in the entire space of potential behaviors, there exists some other explanation. In reality, however, not all agents exhibit the full set of potentially possible behaviors, and an actual agent might be able to deduce far more than expected (cf. [15]).
Temporal logics for the specification of information flow [10] are an important step forward, because they make it possible to customize the secrecy properties. HyperLTL [7] is the linear-time representative of this class of logics. As an extension of linear-time temporal logic (LTL), HyperLTL can describe the precise circumstances under which a particular information flow policy must hold. While standard linear or branching-time logics, like LTL or CTL\(^*\), can only reason about the observations at a single computation trace at a time, and can thus, by themselves, not specify information flow, HyperLTL formulas use trace quantifiers and trace variables to simultaneously refer to multiple traces. For example, HyperLTL can directly express information flow properties like “for any pair of traces \(\pi , \pi '\), if the low-security observer sees the same inputs on \(\pi \) and \(\pi '\), then the low-security observer must also see the same outputs on \(\pi \) and \(\pi '\)”. The key limitation of HyperLTL for the specification of workflows is that it is a propositional logic. It is, hence, impossible to specify the information flow in workflows unless the number of agents is fixed a-priori. In this paper, we overcome this limitation.
Example workflow from a conference management system.
2 Workflows with Arbitrarily Many Agents
Symbolic Transition Systems. As the formal setting for the specification and verification of our workflows, we chose symbolic transition systems, where the states are defined as the valuations of a set of first-order predicates \(\mathcal P\). The initial states and the transitions between states are described symbolically using an assertion logic over \(\mathcal P\). For the purpose of describing workflows, we use first-order predicate logic (PL) with equality as the assertion language.
A symbolic transition system\(\mathcal S=(\mathcal P, \varTheta , \varDelta )\) consists of a set of predicates \(\mathcal P\), an initial condition\(\varTheta \), and a transition relation\(\varDelta \). The initial condition \(\varTheta \) is given as a formula of the assertion language over the predicates \(\mathcal P\). The transition relation \(\varDelta (P_1,\ldots ,P_k; P_1', \ldots ,P_k')\) is given as a formula over the predicates \(\mathcal P=\{P_1, \ldots , P_k\}\), which indicate the interpretation of the predicates in the present state, and the set of primed predicates \(\mathcal P'=\{P_1',\ldots ,P_k'\}\), which indicate the interpretation of the predicates in the next state.
Let U be some arbitrary universe. In the case of the workflows, U is the set of agents participating in the workflow. Let \(\mathcal P^n\) denote the set of predicates with arity n. A state\(s: \bigcup _{n\ge 0}\mathcal P^n \times U^n \rightarrow \mathbb B\) is then an evaluation of the predicates over U. A trace is an infinite sequence of states \(s_0,s_1,\dots \) such that (1) \(s_0\) satisfies \(\varTheta \) (initiation), and (2) for each \(i\ge 0\), the transition relation \(\varDelta \) is satisfied by the consecutive states \(s_i\) and \(s_{i+1}\), where the predicates in \(\mathcal P\) are evaluated according to \(s_i\) and the predicates in \(\mathcal P'\) are evaluated according to \(s_{i+1}\). We denote the set of all traces of a transition system \(\mathcal S\) as \( Traces (\mathcal S)\).
We remark that, by successive substitution of the formulas \(\varPhi _{R,l}\), we obtain for every prefix of the workflow of length l and for every predicate R, a formula \(\bar{\varPhi }_{R,l}\) which expresses the value of R in terms of the predicates at program start and the predicates \(\textit{Choice}_i\) only.
Example 1
Consider a variation of the conference management workflow given in the introduction, where a set of all PC members that do not have a conflict with any paper is collected.
(\(s_1\)) \(\mathbf{forall}\ x,p \ \mathbf{may}\ \textit{true}\ \rightarrow \ \textit{Conf}\ \mathbin {{+}{=}}\ (x,p)\)
(\(s_2\)) \(\mathbf{forall}\ x,p \ \lnot \textit{Conf}(x,p)\ \rightarrow \ S\ \mathbin {{+}{=}}\ (x)\)
Example 2
3 Specifying Secrecy with First-Order HyperLTL
HyperLTL [7] is a recent extension of linear-time temporal logic (LTL) with trace variables and trace quantifiers. HyperLTL can express noninterference and other information flow policies by relating multiple traces, which are each identified by a separate trace variable. Since HyperLTL was introduced as a propositional logic, it cannot express properties about systems with an arbitrary number of agents. We now present first-order HyperLTL, which extends propositional HyperLTL with first-order quantifiers. In the following, we will refer to first-order HyperLTL simply as HyperLTL.




In a workflow, the inputs or outputs of different agents may be collected in the same predicate. In the conference management example from the introduction, the low outputs observed by a PC member x consist of the pairs (x, p, r) for some paper p in the \(\textit{Read}\) relation and, additionally, of the tuples (x, y, p) for some PC member y and a paper p in the \(\textit{Comm}\) relation. The low input provided by agent x is given by the tuples of the \(\textit{Choice}\) predicates that begin with x. Additionally, the system has high input in the form of the \( Oracle \) predicate.
which expresses that on all pairs of traces where the low inputs are the same and, additionally, the high inputs are the same whenever the declassification condition is true, the low outputs must be the same.
Example 3
\(\square \)
Causality Assumptions on Agents. In the workflow from Fig. 1, it is easy to see that no PC member can directly read the reviews of papers where a conflict of interest has been declared: the PC member can only read a review if the PC member was assigned to the paper, which, in turn, can only happen if no conflict of interest was declared. It is much more difficult to rule out an indirect flow of information via a message sent by another PC member. So far, neither the description of the workflow, nor the HyperLTL specification would prevent other PC members to add such messages to \(\textit{Comm}\). To rule out messages that would leak information about papers where a PC member has a conflict, we must make assumptions about the possible behaviors of the other agents.
Example 4

\(\square \)
4 Verifying Secrecy
We now present a bounded model checking method for symbolic transition systems and HyperLTL specifications. The approach reduces the violation of a HyperLTL formula on the prefix of a trace of a given symbolic transition system to the satisfiability of a formula of the assertion language. For workflows, it suffices to consider prefixes of bounded length, because the workflow terminates (and then stutters forever) after a fixed number of steps. Since the assertion language in the description of the workflows is first-order predicate logic, satisfiability of the resulting formula is not necessarily decidable. We return to this issue in Sect. 5, where we identify conditions under which decidability is guaranteed.
For workflows, satisfaction and bounded satisfaction coincide.
Theorem 1
Let \(\mathcal S\) be the transition system representing a workflow with n blocks. For all HyperLTL formulas \(\psi \), it holds that \(\mathcal S \models \psi \text { iff } \mathcal S \models ^n \psi \).
Bounded Model Checking. We now translate a transition system \(\mathcal S\) and a given universal HyperLTL formula for a given bound n into a formula \(\varPsi _{\mathcal S, \lnot \psi }\) of the assertion language such that \(\varPsi _{\mathcal S, \lnot \psi }\) is satisfiable iff \(\mathcal S \not \models ^n \psi \). Since \(\psi \) is universal, its negation is of the form \(\exists \pi _1,\ldots , \pi _k.\ \varphi \), where \(\varphi \) does not contain any more trace quantifiers. Let the set of predicates \(\mathcal P\) be given as \(\mathcal P = \{ P_1, \ldots , P_m\}\). In \(\varPsi ^n_{\mathcal S, \lnot \psi }\), we use for every predicate \(P_i\) several copies \(P_{i,\pi ,l}\), one per trace variable \(\pi \in \{\pi _1,\ldots ,\pi _k\}\) and position l, \(0\le l \le n\). The formula \(\varPsi ^n_{\mathcal S, \lnot \psi } = [\![\mathcal S ]\!]^n \wedge [\![\varphi ]\!]_0^n\) is a conjunction of two formulas of the assertion language, the unfolding \([\![\mathcal S ]\!]^n\) of the transition system \(\mathcal S\) and the unfolding \([\![\varphi ]\!]_0^n\) of the HyperLTL formula \(\varphi \).
Theorem 2
For a symbolic transition system S, a universal HyperLTL formulas \(\psi \), and a bound \(n\ge 0\), it holds that \(S \models ^n \psi \) iff \(\varPsi ^n_{\mathcal S, \lnot \psi }\) is unsatisfiable.
Combining Theorems 1 and 2, we obtain the corollary that bounded model checking is a complete verification technique for workflows.
Corollary 1
Let \(\mathcal S\) be the transition system representing a workflow with T blocks. For all HyperLTL formulas \(\psi \), it holds that \(\mathcal S \models \psi \) iff \(\varPsi ^T_{\mathcal S, \lnot \psi }\) is unsatisfiable. \(\square \)
5 Decidability
We now identify cases where the satisfiability of the predicate logic formulas constructed by the verification method of the previous section are decidable. For background on PL and decidable subclasses, we refer to the textbook [6].
Theorem 3
Consider a workflow consisting of T blocks where all agents are stubborn, and every predicate R encountered by the workflow after l blocks, is characterized by a quantifier-free formula \(\bar{\varPhi }_{R,l}\).
Let \(\forall \pi _1,\ldots ,\pi _r. \varphi _\mathsf{stubborn}\rightarrow \varphi \) denotes a HyperLTL formula where \(\varPsi '\equiv [\![ \lnot \varphi ]\!]_0^T\) is a Bernays-Schönfinkel formula, i.e., the prenex form of \(\varPsi '\) has a quantifier sequence of the form \(\exists ^*\forall ^*\). Then it is decidable whether \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{stubborn}\rightarrow \varphi \) holds.
Proof
For every predicate R, let \(\bar{\varPhi }_{R,\pi _j,l}\) denote the formula which characterizes \(R_{\pi _j,l}\), i.e., the value of R after l blocks along the execution of \(\pi _j\). The formula \(\bar{\varPhi }_{R,\pi _j,l}\) is obtained from \(\bar{\varPhi }_{R,l}\) by replacing the occurrences of \(\textit{Choice}_i, Oracle \) with \(\textit{Choice}_{i,\pi _j}\) and \( Oracle _{\pi _j}\), respectively. Let \(\bar{\varPsi }'\) denote the formula obtained from \(\varPsi '\) by first replacing every occurrence of a literal \(R_{\pi _j,l}(s_1,\ldots ,s_k)\) with \(\bar{\varPhi }_{R,\pi _j,l}(s_1,\ldots ,s_k)\). As all agents are stubborn, the predicates \(\textit{Choice}_{i,\pi _j}\) are equivalent for \(j=1,\ldots ,r\). Accordingly, we may replace all \(\textit{Choice}_{i,\pi _j}(s_1,\ldots ,s_k)\) with \(\textit{Choice}_{i,\pi _1}(s_1,\ldots ,s_k)\). The resulting formula is still a Bernays-Schönfinkel formula. It is unsatisfiable iff \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{stubborn}\rightarrow \varphi \) is universally true. Satisfiability of \(\bar{\varPsi }'\), however, is decidable — which thus implies the the theorem.
\(\square \)
Theorem 3 can be extended also to more general classes of workflows, given that the predicates \(R_{\pi _j,l}\) occur only positively or only negatively in \(\varPsi '\). Non-interference, however, amounts to stating that (under certain conditions) no distinction is observable between some \(R_{\pi _j,l}\) and \(R_{\pi _{j'},l}\). Logically, indistinguishability is expressed by equivalence, which thus results in both positive and negative occurrences of the predicates in question.
Theorem 4
Consider a workflow consisting of T blocks where all agents are causal, and every predicate R encountered by the workflow after l blocks, is characterized by a quantifierfree formula \(\bar{\varPhi }_{R,l}\). Assume that \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{causal}\rightarrow \varphi \) is a temporal formula where the prenex form of \(\varPsi '\equiv [\![ \lnot \varphi ]\!]_0^T\) is purely existential. Then it is decidable whether \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{causal}\rightarrow \varphi \) holds.
Proof
The argument for causal agents is somewhat more complicated and accordingly leads to decidability only for a smaller fragment of HyperLTL formulas. Removal of the temporal operators and skolemization of the formula \(\varphi _\mathsf{causal}\) describing causality yields a conjunction of clauses in one of the forms \((*)\): \({S(x,f_1(x),\ldots ,f_r(x)) \vee \textit{Choice}_{i,\pi _{j_1}}(x,\mathbf{z})\vee \lnot \textit{Choice}_{i,\pi _j}(x,\mathbf{z})}\), or \({S(x,f_1(x),\ldots ,f_r(x)) \vee \lnot \textit{Choice}_{i,\pi _{j_2}}(x,\mathbf{z})\vee \textit{Choice}_{i,\pi _j}(x,\mathbf{z})}\) for \(j_1,j_2<j\), where the disjunction S refers to predicates which depend on \(\textit{Choice}_{i',\_}\) for \(i'<i\) only. To perform ordered resolution, we order predicates so that \(\textit{Choice}_{i,\pi _j}\) receives a higher priority than \(\textit{Choice}_{i',\pi _{j'}}\) if \(i'<i\) or, if \(i=i'\), \(j'<j\). All predicates in S have lower priorities than the \(\textit{Choice}\) predicates. Accordingly, the highest priority literal in each clause of \(\varphi _\mathsf{causal}\) contains all free variables of the clause.
Let us first consider the case \(r=2\). Then resolution of two clauses with a positive and negative occurrence of the same highest-priority literal will result in a tautology and therefore is useless. As in the proof of Theorem 3, let \(\bar{\varPsi }'\) denote the formula obtained from \(\varPsi '\) by replacing each occurrence of a predicate \(R_{\pi _j,l}(s_1,\ldots ,s_k)\) with the formulas \(\bar{\varPhi }_{R,\pi _j,l}(s_1,\ldots ,s_k)\) (\(j=1,2\)). According to our assumption on \(\varPsi '\), the clauses obtained from \(\bar{\varPsi }'\) are all ground. Resolution of such a clause with a clause of \(\varphi _\mathsf{causal}\) for some \(\textit{Choice}_{i,\pi _2}\) will again return a ground formula. By substituting the semantic formulas \(\bar{\varPhi }_{R,\pi _j,l}\) we obtain a set of new ground clauses, this time, however, with occurrences of predicates \(\textit{Choice}_{i',\pi _{j'}}, i'<i,\) only. As a consequence, for every i, there is a bounded number of new clauses derivable by means of clauses from \(\varphi _\mathsf{causal}\) with highest priority predicate \(\textit{Choice}_{i,\pi _2}\). Altogether, we therefore obtain only a bounded number of ground clauses which are derivable by means of ordered resolution. Hence, it is decidable whether a contradiction is derivable or not. This concludes the proof.
The argument for \(r > 2\) is similar, only that resolution of any two such clauses originating from \(\varphi _\mathsf{causal}\) with \(j_1\ne j_2\) upon the literal \(\textit{Choice}_{i,\pi _{j}}(x,\mathbf{z})\)) will again result in a clause of the given form. In particular, no further literals are introduced. Therefore, saturation of \(\varphi _\mathsf{causal}\) by ordered resolution will eventually terminate. Then the argument for termination proceeds analogously to the case \(r=2\) where \(\varphi _\mathsf{causal}\) is replaced with the saturation of \(\varphi _\mathsf{causal}\). \(\square \)
Theorem 4 can be extended to formulas \(\varphi \) where \(\bar{\varPsi }'\) obtained from \([\![ \lnot \varphi ]\!]_0^T\) is a Bernays-Schönfinkel formula at least in restricted cases. Consider the clauses of the form \((*)\) as obtained from \(\varphi _\mathsf{causal}\) after skolemization. In case that the disjunction S is empty, we call the corresponding clause simple, otherwise complex. Now assume that complex clauses from the saturation of \(\varphi _\mathsf{causal}\) are always resolved with clauses (originating from the skolemization of \(\bar{\varPsi }'\)) upon a ground literal. Then the same argument as in the proof of Theorem 4 applies to show that saturation by resolution will eventually terminate.
6 Completing the Conference Management Example

(2a) \(\mathbf{forall}\ x,y,p,q .\ \textit{Conf}(x,p) \wedge \lnot \textit{Conf}(y,p) \wedge \textit{A}(x,q) \wedge \textit{A}(y,q) \rightarrow \textit{A}\mathbin {{-}{=}}(y,q)\)
% PC chair removes assignments that might cause leaks
The resulting formula after substitution of the semantics formulas and simplification is similar to \(\lnot \bar{\varPsi }\), but adds two conjunctions with the \(\forall \)-clause of \(\bar{\varPhi }_{\textit{A}'}\) instantiated for \((x,p')\) and \((y,p')\) on both sides of the inequivalence. The resulting formula is a Bernays-Schönfinkel formula where again the decision procedure of Theorem 4 can be applied. That procedure now derives a contradiction. Intuitively, the reason is that on both paths, x has declared a conflict with \(p_1\). Since y is assigned to \(p_1\), x and y cannot be assigned jointly to the same paper. Thus, both sides of the inequivalence collapse to \(\textit{false}\) — implying that for (WF2) requirement (2) is satisfied and thus (WF2) is indeed noninterferent. \(\square \)
7 Related Work
There is a vast body of work on information flow policies and associated verification techniques. We mention Goguen and Meseguer’s seminal work on noninterference [12], Zdancewic and Myer’s observational determinism [20], Sutherland’s nondeducability [19], and Halpern and O’Neill’s secrecy maintenance [13] as representative examples. See Kanav et al. [15] for a recent overview with a detailed discussion of the most relevant notions for the verification of workflows. Our approach is based on the temporal logic HyperLTL [7]. HyperLTL has been applied in the verification of hardware systems, such as an Ethernet controller with 20000 latches [11]. Other logical approaches to information flow control include SecLTL [8], the polyadic modal \(\mu \)-calculus [2] and the epistemic temporal logics [9]. While standard linear-time temporal logic has been extended with first-order quantifiers [16], our first-order extension of HyperLTL is the first temporal logic for the specification of information flow in systems with arbitrarily many agents. In terms of practical verification efforts, there has been a lot of recent interest in proving secrecy in web-based workflow management systems. For example, for the ConfiChair conference management system it was proven that the system provider cannot learn the contents of papers [3]. For CoCon, another conference management system, it was proven that various groups of users, such as authors, reviewers, and PC members cannot deduce certain content, such as reviews, unless certain declassification triggers, such as being a PC member without a conflict of interest, are met [15]. For the verification of an eHealth system, Bhardwaj and Prasad [5] assume that all agents are known at analysis time. Based on this information, the authors construct a dedicated security lattice and then apply techniques from universal information flow [1, 14]. Our verification method is based on a reduction to the satisfiability problem of first-order predicate logic. First-order logic has many applications in verification. Most related, perhaps, is recent work on the verification of software defined networks [4, 17]. There, a network controller is translated into a first order formula and either a theorem prover or an SMT-solver is used to determine properties of the topology so that the controller satisfies a given invariant.
8 Conclusion
We have presented a formalization of fine-grained security properties for workflow systems with an unbounded number of agents. HyperLTL is the first approach to specify hyperproperties for systems without a fixed set of agents. For the verification of HyperLTL formulas, we have provided a bounded model checking algorithm that translates the problem of verifying such a property for a given workflow to the satisfiability of first-order predicate logic. We have also provided a non-trivial fragment of properties and workflows so that the corresponding decision problem is decidable. As an example we considered noninterference for a simple workflow of a conference management system. Unexpectedly, our method exhibited a subtle form of indirect information flow. We also indicated how that deficiency can be cured. All corresponding proving took place within our benevolent fragments. Various problems remain for future work. Further decidable fragments are of major concern. Also, our work should be extended to more complex and thus more expressive forms of workflows.
Acknowledgment
This work was partially supported by the German Research Foundation (DFG) under the project “SpAGAT” (grant no. FI 936/2-1) in the priority program “Reliably Secure Software Systems “RS3” and the doctorate program “Program and Model Analysis - PUMA” (no. 1480).