information-sign

Information Sign: Satisfiability Modulo Theory

CDCL and Quantifiers

matp

2024-12-15


In the previous post, we have seen the problem of metavariable coupling. A solution to resolve metavariable coupling already exists, called Satisfiability Modulo Theory (SMT). SMT refers to the problem of determining if a mathematical formula is satisfiable. It cannot do this for general mathematical formulae, but for the ones which it can solve, SMT has seen applications from program verification to model checking.

Examples

The witch Marisa is going to a conference and is riding her broom on the way there. Since there is no Department of Transportation for witches, she has to avoid colliding with other conference attendees and debris in the air. Is there a way to determine which direction Marisa should move to next?

Marisa Problem Setting

Although this example is fictitious, avoiding collisions with objects is a task required in self-driving and aircraft autopiloting (see TCAS).

The Simpler Version

We first simplify this problem down. Rather than dodging the exact outline of the entities, we define a “hitbox” with radius rr that Marisa would avoid for each entity. Marisa’s X and Y coördinates (xˊ,yˊ)(\acute x,\acute y) are not allowed to enter a box of a certain radius rr from each entity. Moreover, we assume that with great speed, Marisa can effectively view each entity as having a fixed location in space (we’ll relax this constraint later), and she has to make progress towards the conference by moving her Y-coördinate up.

Suppose an entity has coördinate (xi,yi)(x_i,y_i), Marisa hits the entity when her coördinate falls in the box xirxˊxi+r,yiryˊyi+rx_i-r \leq \acute x \leq x_i+r, y_i - r \leq \acute y \leq y_i + r. Hence Marisa’s coördinates (xˊ,yˊ)(\acute x,\acute y) have to satisfy

Ei:=xˊ<xirxˊ>xi+ryˊ<yiryˊ>yi+rE_i := \acute x < x_i - r \lor \acute x > x_i + r \lor \acute y < y_i - r \lor \acute y > y_i + r

we abbreviate this using arrows

Ei:=iiiiE_i := \leftarrow_i \lor \rightarrow_i \lor \downarrow_i \lor \uparrow_i
An entity has a square hitbox

Finally, Marisa has to make progress towards the conference, so we can assume that she starts at (0,0)(0,0) and must move into the region

T:=xTxˊxT,yT1yˊyT2T := -x_T \leq \acute x \leq x_T, y_{T1} \leq \acute y \leq y_{T2}

depicted in the diagram.

Marisa Problem Simplified

This can be written down formally,

P:=xTxˊxˊxTyT1yˊyˊyT2Green Zonei(iiii)P := \underbrace{-x_T \leq \acute x \land \acute x \leq x_T \land y_{T1} \leq \acute y \land \acute y \leq y_{T2}}_{\text{Green Zone}} \land \bigwedge_i ( \leftarrow_i \lor \rightarrow_i \lor \downarrow_i \lor \uparrow_i )

This formula is in Conjunctive Normal Form.

We can now think about a way to approach this problem mechanically. With a small number of entities, it is easy to find a solution by random guessing or brute forcing, but this becomes infeasible for a large number of entities. Real world SMT problems often have thousands of clauses, so we have to get clever.

To start somewhere, for entity 11, we arbitrarily pick a condition (e.g. Marisa is on the left side i\leftarrow_i) that Marisa’s coördinates must satisfy in order to avoid it. This guess is random and is called a decision point. However, making this decision makes deciding Marisa’s coördinates a geometric impossibility (called TT-Conflict), since she cannot land outside of the green region.

L1 Conflict

This indicates that our decision is wrong and we have a conflict. We need to backtrack to an earlier decision point involved in this conflict, namely 1\leftarrow_1. Either xTxˊ-x_T \leq \acute x or 1\leftarrow_1 must be false:

L1:=¬(xTxˊ)¬1L_1 := \neg (-x_T \leq \acute x) \lor \neg \leftarrow_1

This is a theory lemma: It is deduced by geometric reasoning on a possible configuration, and during subsequent decision process to search for a solution, we must take it into consideration. The theory lemma L1L_1 is a necessary but not sufficient condition for PP.

Since our problem PP asserts Marisa must land in the green zone, xTxˊ-x_T \leq \acute x is true and therefore ¬i\neg \leftarrow_i must be true since otherwise there’s no way to satisfy L1L_1. In other words, Marisa must not land on the left side of entity 11. This logical deduction is called propagation.

We repeat the guess, and eliminate a bunch of conditions via decision and propagation:

Marisa Problem After Unit Propagation

Next, suppose we make a decision 3\rightarrow_3. Geometrically, this implies 4\uparrow_4, and hence we have ¬5\neg \downarrow_5. This is TT-Propagation, where we use geometry to determine logical implication.

Marisa Problem After Deciding R3

But on the other hand 3\rightarrow_3 implies 5\downarrow_5 and ¬4\neg \uparrow_4. We can depic these logical relations in an implication graph.

Implication Graph after R3

All descendants from this implication graph’s common decision point 3\rightarrow_3 lead to conflicts, so we apply backtracking to the decision graph and conclude ¬3\neg \rightarrow_3. Using the unit propagation results, we already have ¬3,¬3\neg \uparrow_3, \neg \downarrow_3, and the avoidance clause E3E_3 implies via unit propagation 3\leftarrow_3.

Marisa Problem After Unit Propagating to L3

Repeating this process, we eventually narrow down on all the conditions required for a coördinate satisfying PP:

Final State

The Temporal Version

We have made some unrealistic assumptions about Marisa’s mobility. What if the entities are moving at a non-negligible relative speed? In this case, we need to ensure Marisa does not collide into anything within a specified timespan TT. Suppose Marisa’s position is given by the vector xˊ\vec{\acute x} and velocity by vˊ\vec{\acute v}. This asserts

P:=vˊ21vˊ2>0t.(0tT)ixˊ+tvˊxitvi>rEiP := \norm{\vec{\acute v}}_2 \leq 1 \land \acute v^2 > 0 \land \forall t. (0 \leq t \leq T) \to \bigwedge_i \underbrace{\norm{\vec{\acute x} + t\vec{\acute v} - \vec{x}_i - t\vec{v}_i} > r}_{E_i}

The only variable we get to decide is Marisa’s velocity vˊ\vec{\acute v}, which is constrained in a semicircle vˊ21\norm{\vec{\acute v}}_2 \leq 1 corresponding to Marisa’s maximum movement speed and the condition that we must make progress.

Marisa Problem with Time

We begin by randomly guessing a velocity vˊ:=[0,0.5]\vec{\acute v} := [0, 0.5]. This is a candidate model. Then we can use the above quantifier-free procedure to detect if Marisa collides with an entity. A collision at time t=0.5t = 0.5 happens with entity 11 when vˊ\vec{\acute v} is set to this value.

Hence we remember that 0.50.5 is a critical time when a collision happened with entity 11. We record this important snapshot and instantiate t:=0.5t := 0.5 in the avoidance clause E1E_1 for entity 1. This generates the clause

C1:=xˊ+0.5vˊx10.5v1>rC_1 := \norm{\vec{\acute x} + 0.5\vec{\acute v} - \vec{x}_1 - 0.5\vec{v}_1} > r

Note that only entity 11 is involved. This clause can be rewritten as a geometry problem in the phase space of vˊ\vec{\acute v}.

Phase Space after 1 guess

We make a second guess of vˊ\vec{\acute v} satisfying the clause C1C_1 and check PP against this model. This generates a violation at time t=0.8t = 0.8, so we create another clause in the phase space:

C2:=xˊ+0.8vˊx20.8v2>rC_2 := \norm{\vec{\acute x} + 0.8\vec{\acute v} - \vec{x}_2 - 0.8\vec{v}_2} > r
Phase Space after 2 guesses

Hence we make another guess in C1C2C_1 \land C_2. This time, Marisa does not collide with any entity in the time frame t[0,1]t \in [0,1], and hence our proposed candidate velocity vˊ\vec{\acute v} helps Marisa avoid all collisions on her way.

The Algorithm

Conflict-Driven Clause Learning

The above process in the simpler problem, the Conflict Driven Clause Learning (T) or CDCL (T) algorithm, lies at the core of modern SMT solvers such as cvc5 or z3. It is implemented by the SAT solver CaDiCaL. CDCL is an improvement based on the chronologically-backtracking Davis-Putman-Logemann-Loveland or DPLL algorithm.

CDCL(T) and DPLL(T) are based on two components, a SAT-solving core, and individual theories. The SAT-solving core is responsible for finding a set of (possibly compatible) conditions in the SMT formula and sending them to the theory solvers to verify. The theory solvers verify the condition generates by the SMT solver are compatible. In the above case with Marisa dodging entities, the SAT solver finds possible solutions, and the theory solver is a linear inequality solver that checks these conditions are compatible.

If the SAT solver cannot propose more combinations, the formula is unsatisfiable (UNSAT). If the theory solvers check that an assignment of the formula is consistent, the formula is satisfiable (SAT).

Quantifiers

The solution we presented in the Temporal Version of Marisa’s problem is called Model-Based Quantifier Instantiation (MBQI). For a formula with quantifiers P:=x.f(x)P := \forall x. f(x), the Skolem Normal Form is the version P:=f(x)P' := f(x) with quantifiers stripped away and each quantifier-bound variable replaced by an unknown function.

The MBQI algorithm proposes candidate models and check the satisfiability of the Skolemized formula PP' using quantifier-free (CDCL, etc.) methods. If the result is unsatisfiable, MBQI has found a model which solves the original formula PP. Otherwise, the counterexample xx (or tt in the case of Marisa’s formula) is added to the overall clause of PP. Intuitively, any counterexample generated for the Skolemized formula PP' is a “important time snapshot” and should be taken into account in the next iteration of model searching. If at any time no candidate model can be proposed, the original formula PP is unsatisfiable.

MBQI can easily go into an infinite loop if the generated model candidates don’t lead to progress or if the size of each model is infinite. This is when other quantifier solving methods such as E-matching come in. SMT with quantifiers is in general an undecidable problem so there is no universal algorithm to solve SMT problems containing quantifiers.

Post Scriptum

For the audience who are interested in learning about SMT, I recommend Stanford’s CS 257: Introduction to Automated Reasoning. The author of this article works in the lab (Stanford Centaur) which oversees the operation of this course.


Created and Designed by Leni Aniva based on Tokiwa