# Benchmarks for the two-variable fragment

This page contains benchmarks for solvers of finite satisfiability problems of the two-variable fragment of first-order logic used to test out tool FO²-solver in our preprint arXiv:1610.02101. Some of the benchmarks derive from the verification problem of script programs with SQL statements. Other were motivated by notions in graph theory. Others still test the behavior of FO²-sentences with only one variable. In the preprint arXiv:1610.02101, the verification benchmarks are treated in Tables 1 and 2, and the graph-theoretic benchmarks and the one-variable benchmarks are treated in Table 3.

# Graph-theoretic benchmarks (graph-theoretic-benchmark.zip)

The graph-theoretic benchmarks are generated from python scripts linked below with a size parameter.
Each of the benchmarks can be instantiated for any chosen size.

## 2-colorings

The 2colX.smt2 benchmark attempts to construct a graph G = (V,E) whose vertices are partitioned into X non-empty and disjoint parts U0,…,UX-1 such that
(1) there are no edges inside each part
(2) every vertex in Ui has an outgoing edge into Ui+1 (where i+1 is taken modulo X), and
(3) the set A induces a proper 2-coloring of G.

2colX.smt2 is satisfiable iff X is even. This is true since G contains a cycle of length X, and all cycles are of length X, and a graph as 2-colorable iff it contains no odd cycle.

smt2 files: 2, 3, 4, 5, 6, 10, 11
Python script

## Paths

This benchmark constructs a graph G = (V,E) whose vertices may be labeled by an unary relation U
with unlabeled paths of length size between labeled vertices. The paths need not be simple. The sentence is the conjunction of:

∃ x (U(x))
∀ x ∃ y (B(x,y))
∀ x (U(x) → αsize(x))

where

αi(x) = ∃ y (B(x,y) ∧ ¬ U(x) ∧ ∀ x (B(x,y) → (¬ U(x) ∧ αi-1(x))))
α0(x) = ∃ y (B(x,y) ∧ U(y))

The sentence is satisfiable.

smt2 files: 10, 100
Python script

## Simple paths

This benchmark constructs a graph G = (V,E) whose vertices are partitioned into 2*size+2 non-empty disjoint sets U0,…,U2*size+1 such that all edges go from a vertex in Ui to a vertex in Ui+1 (modulo 2*size+2) and such that every vertex participates in a simple path. Moreover, the graph satisfies the sentence ∀ x (U0(x) → β0(x)), where

βi(x) = ∃ y (B(x,y) ∧ U2*i+1(y) ∧ ∀ x (B(y,x) → (U2*i+2(y) ∧ βi+1(x))))
βsize(x) = ∃ y (B(x,y) ∧ U2*size+1(y))

The sentence is satisfiable.

smt2 files: 2, 3, 4, 7, 10
Python script

## Exponential graph

This benchmark constructs a graph G = (V,E) whose size is exponential in the length of the formula. The unary relations U0,…,Usize-1 are subsets of the set of vertices V. For every vertex v and every Ui, there is u which agrees with v on all the Uj, j≠i, and disagrees with v on Ui. Any maximally consistent conjunction of U0,…,Usize-1,¬U0,…,¬Usize-1 is non-empty. Hence, G has at least 2size vertices.

The sentence is satisfiable.

smt2 files: 2, 3, 4
Python script

## Impossible path

This benchmark constructs a graph G = (V,E) whose vertices are partitioned into size non-empty disjoint sets U0,…,Usize-1 such that all edges go from a vertex in Ui to a vertex in Ui+1 (modulo size) and such that every vertex participates in a simple path.
In addition, the benchmark makes the following impossible requirement: all the vertices of U0 belong to an unary relation A, which moreover propagates along graph edges, but none of the vertices of Usize-1 belong to A.

The sentence is unsatisfiable.

smt2 files: 2, 3, 6
Python script

# One variable (one-var-benchmarks.zip)

These benchmarks use only the variable x. The main challenge they present is in their large nesting of quantifiers. They are generated from python scripts linked below with a parameter propostional to the number of quantifiers. Each of the benchmarks can be instantiated for any chosen size.

## Existential quantifiers

The sentence consists of an atomic formula inside a block of existential quantifiers:

∃ x ∃ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 300
Python script

## Universal quantifiers

The sentence consists of an atomic formula inside a block of universal quantifiers:

∀ x ∀ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 500
Python script

## Alternating quantifiers 1

The sentence consists of an atomic formula inside a block of alternating quantifiers:

∀ x ∃ x ∀ x ∃ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 300
Python script

## Alternating quantifiers 2

The sentence consists of a contradiction inside a block of alternating quantifiers:

∀ x ∃ x ∀ x ∃ x … (U(x) ∧ ¬U(x))

The sentence is unsatisfiable.

smt2 files: 5, 300
Python script

# Program-based benchmarks (verification-benchmarks.zip)

These benchmarks are generated based on programs in SmpSQL and their specifications in FO2 using our SmpSL Verification Conditions Generator tool.

In this example we verify two functions of a simple web administrator. The web administrator has a database
with tables as follows:

• The table Subscriptions with two columns, newsletterId and userId. A row (nl,usr) in the table indicates that the user usr is subscribed to the newsletter nl.
• The table Users with one column userId.
• The table Letter with one column newsletterId.

The web administrator has two functions:
(1) subscribe(u,nl)
(2) unsubscribe(u,nl)

The function subscribe(u,nl) adds the pair (u,nl) consisting of a user u and a newsletter nl to the Subscriptions table. The function unsubscribe(u,nl) removes the pair (u,nl) from the Subscriptions table.

We test each function both with a correct specification and an incorrect specification. The correct specification we use for subscribe and unsubscribe is the same:

Precondition:
∃ x ((x ≠ nl) ∧ ∀ y (Users(y) → Subscriptions(x, y)))

Postcondition:
∀ y (Users(y) → ∃ x (Subscriptions(x, y)))

This specification asserts that if there is a newsletter — which is not nl — to which all users are subscribed before the function is executed, then afterwards every user is subscribed to some newsletter.

The incorrect specification of subscribe is:

Precondition:
∃ x (∀ y (∃ x Subscriptions(x,y)) → Subscriptions(x,y)) ∧
∀ x ∀ y (Subscriptions(x,y) → User(y)) ∧
∀ x ∀ y (Subscriptions(x,y) → Letter(x)) ∧
∀ y (User(y) → (∃ x Subscriptions(x,y))) ∧
∀ x (Letter(x) → (∃ y Subscriptions(x,y)))

Postcondition:
∃ x (∀ y (∃ x Subscriptions(x,y)) → Subscriptions(x,y))

The incorrect specification of unsubscribe is:

Precondition:
∀ y (Users(y) → ∃ x Subscriptions(x,y)) ∧
∀ x ∀ y (Subscriptions(x,y) → Users(y)) ∧
∃ x (∀ y (Users(y) → Subscriptions(x,y)))

Postcondition:
∃ x (∀ y (Users(y) → Subscriptions(x,y)))

### smt2 files

Using our VC generator we created the verification conditions for the functions. Additionally, we generated smt2 files which test simultaneously the specifications of 10 and 100 very similar functions (differing only on the input arguments) as a crude test of scalability of our solver.

Correct specifications (unsatisfiable smt2 file):
subscribe
subscribe-x10
subscribe-x100
unsubscribe
unsubscribe-x10
unsubscribe-x100

Incorrect specifications (satisfiable smt2 file):
subscribe
subscribe-x10
subscribe-x100
unsubscribe
unsubscribe-x10
unsubscribe-x100

## Firewall

In this example we verify the correct behavior of one function in a simple firewall. The firewall is provided with a database consisting of two tables: Device and CanSend.

• The table Device consists of a single column deviceId.
• The table CanSend consists of two columns senderId and receiverId. The CanSend table determines whether device A is allowed to send to device B.

The delete-device(device_to_delete) function is executed when device_to_delete is removed from the network. The function removes device_to_delete from CanSend (both as a sender and as a receiver).

We provide two specifications for delete-device, a correct one and an incorrect one. Both specifications have the same postcondition, which says that there exists a device to which all other devices — apart from device-to-delete — can send:

Postcondition:
∃ x (Devices(x) ∧ ∀ y (((y ≠ device_to_delete) ∧ Devices(y)) → CanSend(y,x)))

The precondition of the correct specification guarantees that there exists a device different from device-to-delete to which all devices can send:

Precondition (correct):
∃ x (((x ≠ device_to_delete)) ∧ (Devices(x)) ∧ ∀ y (Devices(y) → CanSend(y,x)))

The precondition of the incorrect specification guarantees (among other things) that there exists a device to which all devices can send, but it might not be different from device-to-delete:

Precondition (incorrect):
∀ x ∀ y (CanSend(y,x) → Devices(y)) ∧
∀ x ∀ x (CanSend(y,x) → Devices(x)) ∧
∀ y (Devices(y)→(∃ x (CanSend(x,y)))) ∧
∃ y (Devices(y) ∧ (∀ x (Devices(x) → (CanSend(x,y)))))

### smt2 files

Like the case of the web administrator, we generated the verification conditions for the functions and the inflated examples.

Correct specifications (unsatisfiable smt2 file):
firewall
firewall-x10
firwall-x100

Incorrect specifications (satisfiable smt2 file):
firewall
firewall-x10
firewall-x100

## Conference management system

In this example we verify parts of a system for conference management which assigns reviewers to papers and records the reviews and acceptance/rejection decisions. We focus on the earlier parts of the reviewing process: First, potential reviewers (e.g. PC members) bid on papers to review. Based on the bids, reviewers are assigned to the papers (e.g. by the PC chair). An additional functionality of the system that we focus on is displaying the list of papers by a specific author.

The database contains the following tables and columns:

• Papers with columns paperId, title, status, and session. The column status ranges over the bounded domain consisting of three values 0:3, 1:3, and 2:3 interpreted as undecided, accepted, and rejected respectively. The column session ranges over the bounded domain consisting of 5 values interpreted as blank(0:5), invited (1:5), null (2:5), contributed session A (3:5), and contributed session B (4:5).
• Authors with columns userId and paperId};
• ReviewerBids with columns userId and paperId;
• ReviewerAssignments with columns userId and paperId;
• Conflicts with columns userId and paperId.

The columns userId and paperId range over the unbounded domain. The key of Papers is paperId. The other tables have a many to many relationship between userIdand paperId attesting respectively to the fact that the user is the author of the paper, the user has bid to review the paper, the user has been assigned to review the paper, or the user is in conflict with the paper (and therefore cannot review it).

Before the bidding process begins, all papers are assigned the status undecided and the session invited (for an invited paper) or null (for a contributed submission). The session value blank comes up in the display function, and is at the root of a bug in the program.

The function bid registers that the user currentUser is willing to review the paper bidId with the sanity check that there is no conflict between the user and the paper.

The function assign registers that the user reviewer is assigned to review the paper assignId.

The function display receives as input the user id currentUser and returns the list of papers by currentUser that should be displayed. the function display is supposed to be safe to use even if the review phase of the conference is not yet completed, so display removes the session values of contributed papers from the output. This is done to prevent leaking the information that a contributed paper has been accepted (since only accepted papers have sessions) before the status of the paper has been announced. display leaves the status value \$\mathit{invited}\$ visible.

We have two versions of display: one correct and one incorrect.

• The incorrect version display-incorrect leaves the session value null unchanged. Since null and blank are different values, the information leak which the program tries to avoid is still present.
• The correct version display-correct replaces the status null by blank. This is done by expanding the WHERE condition of the UPDATE.

The database preserves two invariants:

Inv1 = ∀D x ∀D y ReviewerBids(x,y) → ~Conflicts(x,y)
Inv2 = ∀D x ∀Dy ReviewerAssignments(x,y) → ~Conflicts(x,y)

These invariants state that no user may bid or be assigned to review a paper with which they are in conflict.

The specification of both versions of display is as follows:

Precondition:
Inv1 ∧ Inv2

Postcondition:
Inv1 ∧ Inv2
D x ∀D y Out(x,y) → (y = blank ∨ y = invited)

This specification holds for display-correct and does not hold for display-incorrect.

For bid and assign we provide one version each with two specifications, one correct and one incorrect. The correct specification is as follows:

Precondition of bid and postconditions of bid and assign:
Inv1 ∧ Inv2

Precondition of assign
Inv1 ∧ Inv2 ∧ ReviewerBids(reviewer, assignId)

In order to ensure that Inv2 is preserved by assign, we only allow a reviewer assignment to occur if there was a corresponding reviewer bid. Reviewer bids are required to avoid the conflicts by Inv1, and thus Inv2 is preserved. The incorrect specification for bid and assign is as follows:

Precondition of bid:
Inv2

Precondition of assign:
Inv2 ∧ ReviewerBids(reviewer, assignId)

Postconditions of bid and assign:
Inv1 ∧ Inv2

It is obtained from the correct specification by omitting Inv1 from the preconditions.

### smt2 files

Using our tool we generated the verification conditions for the functions:
Correct specifications (unsatisfiable smt2 file):
assign
bid
display

Incorrect specifications (satisfiable smt2 file):
assign
bid
display

Note that although the specifications are not in FO2 — but rather in FOBD, FO2 with bounded domains — the result of applying our SmpSL VC generator tool is a sentence in FO2<\sup> (in smt2 format).

## Katalin Fazekas receives Hertha Firnberg Fellowship

Katalin Fazekas received a Hertha Firnberg Fellowship and will (re-)join FORSYTE to perform research on Incremental SAT and SMT Reasoning for Scalable Verification. Congratulations!

## Jens Pagel wins Bill McCune PhD Award

We congratulate Jens Pagel for receiving the 2021 Bill McCune PhD Award in Automated Reasoning! Jens graduated in 2020; his thesis on Decision procedures for separation logic: beyond symbolic heaps (supervised by Florian Zuleger) presents his substantial contributions to the theory of formal verification and automated reasoning, and to verifying heap-manipulating programs in particular.

## Thanh-Hai Tran, Igor Konnov and Josef Widder win FORTE21 Best Paper Award

Thanh-Hai Tran, Igor Konnov and Josef Widder received the Best Paper Award at FORTE21 for their Case Study on Parametric Verification of Failure Detectors. Congratulations!

## Emmanuel Pescosta nominated for EPILOG Young Alumnus Award

Congratulations to FORSYTE alumnus Emmanuel Pescosta, who got nominated for the Distinguished Young Alumnus award for his master’s thesis on Bounded Model Checking of Speculative Non-Interference.

## ERC Consolidator Grant 2020 for Laura Kovacs

Laura Kovács has been awarded with an ERC Consolidator Grant 2020, for her project “ARTIST: Automated Reasoning with Theories and Induction for Software Technology”.