# Benchmarks for the two-variable fragment

This page contains benchmarks for solvers of finite satisfiability problems of the two-variable fragment of first-order logic used to test out tool FO²-solver in our preprint arXiv:1610.02101. Some of the benchmarks derive from the verification problem of script programs with SQL statements. Other were motivated by notions in graph theory. Others still test the behavior of FO²-sentences with only one variable. In the preprint arXiv:1610.02101, the verification benchmarks are treated in Tables 1 and 2, and the graph-theoretic benchmarks and the one-variable benchmarks are treated in Table 3.

# Graph-theoretic benchmarks (graph-theoretic-benchmark.zip)

The graph-theoretic benchmarks are generated from python scripts linked below with a size parameter.

Each of the benchmarks can be instantiated for any chosen size.

## 2-colorings

The 2colX.smt2 benchmark attempts to construct a graph G = (V,E) whose vertices are partitioned into X non-empty and disjoint parts U_{0},…,U_{X-1} such that

(1) there are no edges inside each part

(2) every vertex in U_{i} has an outgoing edge into U_{i+1} (where i+1 is taken modulo X), and

(3) the set A induces a proper 2-coloring of G.

2colX.smt2 is satisfiable iff X is even. This is true since G contains a cycle of length X, and all cycles are of length X, and a graph as 2-colorable iff it contains no odd cycle.

smt2 files: 2, 3, 4, 5, 6, 10, 11

Python script

## Paths

This benchmark constructs a graph G = (V,E) whose vertices may be labeled by an unary relation U

with unlabeled paths of length *size* between labeled vertices. The paths need not be simple. The sentence is the conjunction of:

∃ x (U(x))∀ x ∃ y (B(x,y))∀ x (U(x) → α_{size}(x))

where

α_{i}(x) = ∃ y (B(x,y) ∧ ¬ U(x) ∧ ∀ x (B(x,y) → (¬ U(x) ∧ α_{i-1}(x))))α_{0}(x) = ∃ y (B(x,y) ∧ U(y))

The sentence is satisfiable.

smt2 files: 10, 100

Python script

## Simple paths

This benchmark constructs a graph G = (V,E) whose vertices are partitioned into 2*size+2 non-empty disjoint sets U_{0},…,U_{2*size+1} such that all edges go from a vertex in U_{i} to a vertex in U_{i+1} (modulo 2*size+2) and such that every vertex participates in a simple path. Moreover, the graph satisfies the sentence ∀ x (U_{0}(x) → β_{0}(x)), where

β_{i}(x) = ∃ y (B(x,y) ∧ U_{2*i+1}(y) ∧ ∀ x (B(y,x) → (U_{2*i+2}(y) ∧ β_{i+1}(x))))β_{size}(x) = ∃ y (B(x,y) ∧ U_{2*size+1}(y))

The sentence is satisfiable.

smt2 files: 2, 3, 4, 7, 10

Python script

## Exponential graph

This benchmark constructs a graph G = (V,E) whose size is exponential in the length of the formula. The unary relations U_{0},…,U_{size-1} are subsets of the set of vertices V. For every vertex v and every U_{i}, there is u which agrees with v on all the U_{j}, j≠i, and disagrees with v on U_{i}. Any maximally consistent conjunction of U_{0},…,U_{size-1},¬U_{0},…,¬U_{size-1} is non-empty. Hence, G has at least 2^{size} vertices.

The sentence is satisfiable.

smt2 files: 2, 3, 4

Python script

## Impossible path

This benchmark constructs a graph G = (V,E) whose vertices are partitioned into *size* non-empty disjoint sets U_{0},…,U_{size-1} such that all edges go from a vertex in U_{i} to a vertex in U_{i+1} (modulo size) and such that every vertex participates in a simple path.

In addition, the benchmark makes the following impossible requirement: all the vertices of U_{0} belong to an unary relation A, which moreover propagates along graph edges, but none of the vertices of U_{size-1} belong to A.

The sentence is unsatisfiable.

smt2 files: 2, 3, 6

Python script

# One variable (one-var-benchmarks.zip)

These benchmarks use only the variable x. The main challenge they present is in their large nesting of quantifiers. They are generated from python scripts linked below with a parameter propostional to the number of quantifiers. Each of the benchmarks can be instantiated for any chosen size.

## Existential quantifiers

The sentence consists of an atomic formula inside a block of existential quantifiers:

∃ x ∃ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 300

Python script

## Universal quantifiers

The sentence consists of an atomic formula inside a block of universal quantifiers:

∀ x ∀ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 500

Python script

## Alternating quantifiers 1

The sentence consists of an atomic formula inside a block of alternating quantifiers:

∀ x ∃ x ∀ x ∃ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 300

Python script

## Alternating quantifiers 2

The sentence consists of a contradiction inside a block of alternating quantifiers:

∀ x ∃ x ∀ x ∃ x … (U(x) ∧ ¬U(x))

The sentence is unsatisfiable.

smt2 files: 5, 300

Python script

# Program-based benchmarks (verification-benchmarks.zip)

These benchmarks are generated based on programs in SmpSQL and their specifications in FO^{2} using our SmpSL Verification Conditions Generator tool.

## Simplified web administrator

In this example we verify two functions of a simple web administrator. The web administrator has a database

with tables as follows:

- The table
*Subscriptions*with two columns,*newsletterId*and*userId*. A row (*nl*,*usr*) in the table indicates that the user*usr*is subscribed to the newsletter*nl*. - The table
*Users*with one column*userId*. - The table
*Letter*with one column*newsletterId*.

The web administrator has two functions:

(1) *subscribe*(*u*,*nl*)

(2) *unsubscribe*(*u*,*nl*)

The function *subscribe*(*u*,*nl*) adds the pair (*u*,*nl*) consisting of a user u and a newsletter nl to the Subscriptions table. The function *unsubscribe*(*u*,*nl*) removes the pair (*u*,*nl*) from the Subscriptions table.

We test each function both with a correct specification and an incorrect specification. The correct specification we use for *subscribe* and *unsubscribe* is the same:

Precondition:

∃ x ((x ≠ nl) ∧ ∀ y (Users(y) → Subscriptions(x, y)))Postcondition:

∀ y (Users(y) → ∃ x (Subscriptions(x, y)))

This specification asserts that if there is a newsletter — which is not nl — to which all users are subscribed before the function is executed, then afterwards every user is subscribed to some newsletter.

The incorrect specification of *subscribe* is:

Precondition:

∃ x (∀ y (∃ x Subscriptions(x,y)) → Subscriptions(x,y)) ∧

∀ x ∀ y (Subscriptions(x,y) → User(y)) ∧

∀ x ∀ y (Subscriptions(x,y) → Letter(x)) ∧

∀ y (User(y) → (∃ x Subscriptions(x,y))) ∧

∀ x (Letter(x) → (∃ y Subscriptions(x,y)))Postcondition:

∃ x (∀ y (∃ x Subscriptions(x,y)) → Subscriptions(x,y))

The incorrect specification of *unsubscribe* is:

Precondition:

∀ y (Users(y) → ∃ x Subscriptions(x,y)) ∧

∀ x ∀ y (Subscriptions(x,y) → Users(y)) ∧

∃ x (∀ y (Users(y) → Subscriptions(x,y)))Postcondition:

∃ x (∀ y (Users(y) → Subscriptions(x,y)))

### programs

subscribe with correct specification

subscribe with incorrect specification

unsubscribe with correct specification

unsubscribe with incorrect specification

### smt2 files

Using our VC generator we created the verification conditions for the functions. Additionally, we generated smt2 files which test simultaneously the specifications of 10 and 100 very similar functions (differing only on the input arguments) as a crude test of scalability of our solver.

*Correct specifications (unsatisfiable smt2 file):*

subscribe

subscribe-x10

subscribe-x100

unsubscribe

unsubscribe-x10

unsubscribe-x100

*Incorrect specifications (satisfiable smt2 file):*

subscribe

subscribe-x10

subscribe-x100

unsubscribe

unsubscribe-x10

unsubscribe-x100

## Firewall

In this example we verify the correct behavior of one function in a simple firewall. The firewall is provided with a database consisting of two tables: *Device* and *CanSend*.

- The table
*Device*consists of a single column*deviceId*. - The table
*CanSend*consists of two columns*senderId*and*receiverId*. The*CanSend*table determines whether device A is allowed to send to device B.

The *delete-device*(*device_to_delete*) function is executed when *device_to_delete* is removed from the network. The function removes *device_to_delete* from CanSend (both as a sender and as a receiver).

We provide two specifications for *delete-device*, a correct one and an incorrect one. Both specifications have the same postcondition, which says that there exists a device to which all other devices — apart from *device-to-delete* — can send:

Postcondition:

∃ x (Devices(x) ∧ ∀ y (((y ≠ device_to_delete) ∧ Devices(y)) → CanSend(y,x)))

The precondition of the correct specification guarantees that there exists a device different from *device-to-delete* to which all devices can send:

Precondition (correct):

∃ x (((x ≠ device_to_delete)) ∧ (Devices(x)) ∧ ∀ y (Devices(y) → CanSend(y,x)))

The precondition of the incorrect specification guarantees (among other things) that there exists a device to which all devices can send, but it might not be different from *device-to-delete*:

Precondition (incorrect):

∀ x ∀ y (CanSend(y,x) → Devices(y)) ∧

∀ x ∀ x (CanSend(y,x) → Devices(x)) ∧

∀ y (Devices(y)→(∃ x (CanSend(x,y)))) ∧

∃ y (Devices(y) ∧ (∀ x (Devices(x) → (CanSend(x,y)))))

### programs

delete-device with correct specification

delete-device with incorrect specification

### smt2 files

Like the case of the web administrator, we generated the verification conditions for the functions and the inflated examples.

*Correct specifications (unsatisfiable smt2 file):*

firewall

firewall-x10

firwall-x100

*Incorrect specifications (satisfiable smt2 file):*

firewall

firewall-x10

firewall-x100

## Conference management system

In this example we verify parts of a system for conference management which assigns reviewers to papers and records the reviews and acceptance/rejection decisions. We focus on the earlier parts of the reviewing process: First, potential reviewers (e.g. PC members) bid on papers to review. Based on the bids, reviewers are assigned to the papers (e.g. by the PC chair). An additional functionality of the system that we focus on is displaying the list of papers by a specific author.

The database contains the following tables and columns:

*Papers*with columns*paperId*,*title*,*status*, and*session*. The column*status*ranges over the bounded domain consisting of three values 0:3, 1:3, and 2:3 interpreted as undecided, accepted, and rejected respectively. The column*session*ranges over the bounded domain consisting of 5 values interpreted as blank(0:5), invited (1:5), null (2:5), contributed session A (3:5), and contributed session B (4:5).*Authors*with columns*userId*and*paperId*};*ReviewerBids*with columns*userId*and*paperId*;*ReviewerAssignments*with columns*userId*and*paperId*;*Conflicts*with columns*userId*and*paperId*.

The columns *userId* and *paperId* range over the unbounded domain. The key of Papers is *paperId*. The other tables have a many to many relationship between *userId*and *paperId* attesting respectively to the fact that the user is the author of the paper, the user has bid to review the paper, the user has been assigned to review the paper, or the user is in conflict with the paper (and therefore cannot review it).

Before the bidding process begins, all papers are assigned the status undecided and the session invited (for an invited paper) or null (for a contributed submission). The session value blank comes up in the display function, and is at the root of a bug in the program.

The function *bid* registers that the user *currentUser* is willing to review the paper *bidId* with the sanity check that there is no conflict between the user and the paper.

The function *assign* registers that the user *reviewer* is assigned to review the paper *assignId*.

The function *display* receives as input the user id *currentUser* and returns the list of papers by *currentUser* that should be displayed. the function *display* is supposed to be safe to use even if the review phase of the conference is not yet completed, so *display* removes the session values of contributed papers from the output. This is done to prevent leaking the information that a contributed paper has been accepted (since only accepted papers have sessions) before the status of the paper has been announced. *display* leaves the status value $\mathit{invited}$ visible.

We have two versions of *display*: one correct and one incorrect.

- The incorrect version display-incorrect leaves the session value
*null*unchanged. Since*null*and*blank*are different values, the information leak which the program tries to avoid is still present. - The correct version display-correct replaces the status
*null*by*blank*. This is done by expanding the WHERE condition of the UPDATE.

The database preserves two invariants:

Inv_{1}= ∀_{D}x ∀_{D}y ReviewerBids(x,y) → ~Conflicts(x,y)Inv_{2}= ∀_{D}x ∀_{D}y ReviewerAssignments(x,y) → ~Conflicts(x,y)

These invariants state that no user may bid or be assigned to review a paper with which they are in conflict.

The specification of both versions of *display* is as follows:

Precondition:

Inv_{1}∧ Inv_{2}Postcondition:

Inv_{1}∧ Inv_{2}∧

∀_{D}x ∀_{D}y Out(x,y) → (y = blank ∨ y = invited)

This specification holds for *display-correct* and does not hold for *display-incorrect*.

For *bid* and *assign* we provide one version each with two specifications, one correct and one incorrect. The correct specification is as follows:

Precondition of bid and postconditions of bid and assign:

Inv_{1}∧ Inv_{2}Precondition of assign

Inv_{1}∧ Inv_{2}∧ ReviewerBids(reviewer, assignId)

In order to ensure that Inv_{2} is preserved by *assign*, we only allow a reviewer assignment to occur if there was a corresponding reviewer bid. Reviewer bids are required to avoid the conflicts by Inv_{1}, and thus Inv_{2} is preserved. The incorrect specification for *bid* and *assign* is as follows:

Precondition of bid:

Inv_{2}Precondition of assign:

Inv_{2}∧ ReviewerBids(reviewer, assignId)Postconditions of bid and assign:

Inv_{1}∧ Inv_{2}

It is obtained from the correct specification by omitting Inv_{1} from the preconditions.

### programs

display with correct specification

display with incorrect specification

bid with correct specification

bid with incorrect specification

assign with correct specification

assign with incorrect specification

### smt2 files

Using our tool we generated the verification conditions for the functions:

*Correct specifications (unsatisfiable smt2 file):*

assign

bid

display

*Incorrect specifications (satisfiable smt2 file):*

assign

bid

display

Note that although the specifications are not in FO^{2} — but rather in FO_{BD}, FO^{2} with bounded domains — the result of applying our SmpSL VC generator tool is a sentence in FO^{2<\sup> (in smt2 format). }