Benchmarks for the two-variable fragment

This page contains benchmarks for solvers of finite satisfiability problems of the two-variable fragment of first-order logic used to test out tool FO²-solver in our preprint arXiv:1610.02101. Some of the benchmarks derive from the verification problem of script programs with SQL statements. Other were motivated by notions in graph theory. Others still test the behavior of FO²-sentences with only one variable. In the preprint arXiv:1610.02101, the verification benchmarks are treated in Tables 1 and 2, and the graph-theoretic benchmarks and the one-variable benchmarks are treated in Table 3.

Graph-theoretic benchmarks (graph-theoretic-benchmark.zip)

The graph-theoretic benchmarks are generated from python scripts linked below with a size parameter.
Each of the benchmarks can be instantiated for any chosen size.

2-colorings

The 2colX.smt2 benchmark attempts to construct a graph G = (V,E) whose vertices are partitioned into X non-empty and disjoint parts U0,…,UX-1 such that
(1) there are no edges inside each part
(2) every vertex in Ui has an outgoing edge into Ui+1 (where i+1 is taken modulo X), and
(3) the set A induces a proper 2-coloring of G.

2colX.smt2 is satisfiable iff X is even. This is true since G contains a cycle of length X, and all cycles are of length X, and a graph as 2-colorable iff it contains no odd cycle.

smt2 files: 2, 3, 4, 5, 6, 10, 11
Python script

Paths

This benchmark constructs a graph G = (V,E) whose vertices may be labeled by an unary relation U
with unlabeled paths of length size between labeled vertices. The paths need not be simple. The sentence is the conjunction of:

∃ x (U(x))
∀ x ∃ y (B(x,y))
∀ x (U(x) → αsize(x))

where

αi(x) = ∃ y (B(x,y) ∧ ¬ U(x) ∧ ∀ x (B(x,y) → (¬ U(x) ∧ αi-1(x))))
α0(x) = ∃ y (B(x,y) ∧ U(y))

The sentence is satisfiable.

smt2 files: 10, 100
Python script

Simple paths

This benchmark constructs a graph G = (V,E) whose vertices are partitioned into 2*size+2 non-empty disjoint sets U0,…,U2*size+1 such that all edges go from a vertex in Ui to a vertex in Ui+1 (modulo 2*size+2) and such that every vertex participates in a simple path. Moreover, the graph satisfies the sentence ∀ x (U0(x) → β0(x)), where

βi(x) = ∃ y (B(x,y) ∧ U2*i+1(y) ∧ ∀ x (B(y,x) → (U2*i+2(y) ∧ βi+1(x))))
βsize(x) = ∃ y (B(x,y) ∧ U2*size+1(y))

The sentence is satisfiable.

smt2 files: 2, 3, 4, 7, 10
Python script

Exponential graph

This benchmark constructs a graph G = (V,E) whose size is exponential in the length of the formula. The unary relations U0,…,Usize-1 are subsets of the set of vertices V. For every vertex v and every Ui, there is u which agrees with v on all the Uj, j≠i, and disagrees with v on Ui. Any maximally consistent conjunction of U0,…,Usize-1,¬U0,…,¬Usize-1 is non-empty. Hence, G has at least 2size vertices.

The sentence is satisfiable.

smt2 files: 2, 3, 4
Python script

Impossible path

This benchmark constructs a graph G = (V,E) whose vertices are partitioned into size non-empty disjoint sets U0,…,Usize-1 such that all edges go from a vertex in Ui to a vertex in Ui+1 (modulo size) and such that every vertex participates in a simple path.
In addition, the benchmark makes the following impossible requirement: all the vertices of U0 belong to an unary relation A, which moreover propagates along graph edges, but none of the vertices of Usize-1 belong to A.

The sentence is unsatisfiable.

smt2 files: 2, 3, 6
Python script

One variable (one-var-benchmarks.zip)

These benchmarks use only the variable x. The main challenge they present is in their large nesting of quantifiers. They are generated from python scripts linked below with a parameter propostional to the number of quantifiers. Each of the benchmarks can be instantiated for any chosen size.

Existential quantifiers

The sentence consists of an atomic formula inside a block of existential quantifiers:

∃ x ∃ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 300
Python script

Universal quantifiers

The sentence consists of an atomic formula inside a block of universal quantifiers:

∀ x ∀ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 500
Python script

Alternating quantifiers 1

The sentence consists of an atomic formula inside a block of alternating quantifiers:

∀ x ∃ x ∀ x ∃ x … U(x)

The sentence is satisfiable.

smt2 files: 5, 300
Python script

Alternating quantifiers 2

The sentence consists of a contradiction inside a block of alternating quantifiers:

∀ x ∃ x ∀ x ∃ x … (U(x) ∧ ¬U(x))

The sentence is unsatisfiable.

smt2 files: 5, 300
Python script

Program-based benchmarks (verification-benchmarks.zip)

These benchmarks are generated based on programs in SmpSQL and their specifications in FO2 using our SmpSL Verification Conditions Generator tool.

Simplified web administrator

In this example we verify two functions of a simple web administrator. The web administrator has a database
with tables as follows:

  • The table Subscriptions with two columns, newsletterId and userId. A row (nl,usr) in the table indicates that the user usr is subscribed to the newsletter nl.
  • The table Users with one column userId.
  • The table Letter with one column newsletterId.

The web administrator has two functions:
(1) subscribe(u,nl)
(2) unsubscribe(u,nl)

The function subscribe(u,nl) adds the pair (u,nl) consisting of a user u and a newsletter nl to the Subscriptions table. The function unsubscribe(u,nl) removes the pair (u,nl) from the Subscriptions table.

We test each function both with a correct specification and an incorrect specification. The correct specification we use for subscribe and unsubscribe is the same:

Precondition:
∃ x ((x ≠ nl) ∧ ∀ y (Users(y) → Subscriptions(x, y)))
 
Postcondition:
∀ y (Users(y) → ∃ x (Subscriptions(x, y)))

This specification asserts that if there is a newsletter — which is not nl — to which all users are subscribed before the function is executed, then afterwards every user is subscribed to some newsletter.

The incorrect specification of subscribe is:

Precondition:
∃ x (∀ y (∃ x Subscriptions(x,y)) → Subscriptions(x,y)) ∧
∀ x ∀ y (Subscriptions(x,y) → User(y)) ∧
∀ x ∀ y (Subscriptions(x,y) → Letter(x)) ∧
∀ y (User(y) → (∃ x Subscriptions(x,y))) ∧
∀ x (Letter(x) → (∃ y Subscriptions(x,y)))
 
Postcondition:
∃ x (∀ y (∃ x Subscriptions(x,y)) → Subscriptions(x,y))

The incorrect specification of unsubscribe is:

Precondition:
∀ y (Users(y) → ∃ x Subscriptions(x,y)) ∧
∀ x ∀ y (Subscriptions(x,y) → Users(y)) ∧
∃ x (∀ y (Users(y) → Subscriptions(x,y)))
 
Postcondition:
∃ x (∀ y (Users(y) → Subscriptions(x,y)))

programs

subscribe with correct specification
subscribe with incorrect specification

unsubscribe with correct specification
unsubscribe with incorrect specification

smt2 files

Using our VC generator we created the verification conditions for the functions. Additionally, we generated smt2 files which test simultaneously the specifications of 10 and 100 very similar functions (differing only on the input arguments) as a crude test of scalability of our solver.

Correct specifications (unsatisfiable smt2 file):
subscribe
subscribe-x10
subscribe-x100
unsubscribe
unsubscribe-x10
unsubscribe-x100

Incorrect specifications (satisfiable smt2 file):
subscribe
subscribe-x10
subscribe-x100
unsubscribe
unsubscribe-x10
unsubscribe-x100

Firewall

In this example we verify the correct behavior of one function in a simple firewall. The firewall is provided with a database consisting of two tables: Device and CanSend.

  • The table Device consists of a single column deviceId.
  • The table CanSend consists of two columns senderId and receiverId. The CanSend table determines whether device A is allowed to send to device B.

The delete-device(device_to_delete) function is executed when device_to_delete is removed from the network. The function removes device_to_delete from CanSend (both as a sender and as a receiver).

We provide two specifications for delete-device, a correct one and an incorrect one. Both specifications have the same postcondition, which says that there exists a device to which all other devices — apart from device-to-delete — can send:

Postcondition:
∃ x (Devices(x) ∧ ∀ y (((y ≠ device_to_delete) ∧ Devices(y)) → CanSend(y,x)))

The precondition of the correct specification guarantees that there exists a device different from device-to-delete to which all devices can send:

Precondition (correct):
∃ x (((x ≠ device_to_delete)) ∧ (Devices(x)) ∧ ∀ y (Devices(y) → CanSend(y,x)))

The precondition of the incorrect specification guarantees (among other things) that there exists a device to which all devices can send, but it might not be different from device-to-delete:

Precondition (incorrect):
∀ x ∀ y (CanSend(y,x) → Devices(y)) ∧
∀ x ∀ x (CanSend(y,x) → Devices(x)) ∧
∀ y (Devices(y)→(∃ x (CanSend(x,y)))) ∧
∃ y (Devices(y) ∧ (∀ x (Devices(x) → (CanSend(x,y)))))

programs

delete-device with correct specification
delete-device with incorrect specification

smt2 files

Like the case of the web administrator, we generated the verification conditions for the functions and the inflated examples.

Correct specifications (unsatisfiable smt2 file):
firewall
firewall-x10
firwall-x100

Incorrect specifications (satisfiable smt2 file):
firewall
firewall-x10
firewall-x100

Conference management system

In this example we verify parts of a system for conference management which assigns reviewers to papers and records the reviews and acceptance/rejection decisions. We focus on the earlier parts of the reviewing process: First, potential reviewers (e.g. PC members) bid on papers to review. Based on the bids, reviewers are assigned to the papers (e.g. by the PC chair). An additional functionality of the system that we focus on is displaying the list of papers by a specific author.

The database contains the following tables and columns:

  • Papers with columns paperId, title, status, and session. The column status ranges over the bounded domain consisting of three values 0:3, 1:3, and 2:3 interpreted as undecided, accepted, and rejected respectively. The column session ranges over the bounded domain consisting of 5 values interpreted as blank(0:5), invited (1:5), null (2:5), contributed session A (3:5), and contributed session B (4:5).
  • Authors with columns userId and paperId};
  • ReviewerBids with columns userId and paperId;
  • ReviewerAssignments with columns userId and paperId;
  • Conflicts with columns userId and paperId.

The columns userId and paperId range over the unbounded domain. The key of Papers is paperId. The other tables have a many to many relationship between userIdand paperId attesting respectively to the fact that the user is the author of the paper, the user has bid to review the paper, the user has been assigned to review the paper, or the user is in conflict with the paper (and therefore cannot review it).

Before the bidding process begins, all papers are assigned the status undecided and the session invited (for an invited paper) or null (for a contributed submission). The session value blank comes up in the display function, and is at the root of a bug in the program.

The function bid registers that the user currentUser is willing to review the paper bidId with the sanity check that there is no conflict between the user and the paper.

The function assign registers that the user reviewer is assigned to review the paper assignId.

The function display receives as input the user id currentUser and returns the list of papers by currentUser that should be displayed. the function display is supposed to be safe to use even if the review phase of the conference is not yet completed, so display removes the session values of contributed papers from the output. This is done to prevent leaking the information that a contributed paper has been accepted (since only accepted papers have sessions) before the status of the paper has been announced. display leaves the status value $\mathit{invited}$ visible.

We have two versions of display: one correct and one incorrect.

  • The incorrect version display-incorrect leaves the session value null unchanged. Since null and blank are different values, the information leak which the program tries to avoid is still present.
  • The correct version display-correct replaces the status null by blank. This is done by expanding the WHERE condition of the UPDATE.

The database preserves two invariants:

Inv1 = ∀D x ∀D y ReviewerBids(x,y) → ~Conflicts(x,y)
Inv2 = ∀D x ∀Dy ReviewerAssignments(x,y) → ~Conflicts(x,y)

These invariants state that no user may bid or be assigned to review a paper with which they are in conflict.

The specification of both versions of display is as follows:

Precondition:
Inv1 ∧ Inv2
 
Postcondition:
Inv1 ∧ Inv2
D x ∀D y Out(x,y) → (y = blank ∨ y = invited)

This specification holds for display-correct and does not hold for display-incorrect.

For bid and assign we provide one version each with two specifications, one correct and one incorrect. The correct specification is as follows:

Precondition of bid and postconditions of bid and assign:
Inv1 ∧ Inv2
 
Precondition of assign
Inv1 ∧ Inv2 ∧ ReviewerBids(reviewer, assignId)

In order to ensure that Inv2 is preserved by assign, we only allow a reviewer assignment to occur if there was a corresponding reviewer bid. Reviewer bids are required to avoid the conflicts by Inv1, and thus Inv2 is preserved. The incorrect specification for bid and assign is as follows:

Precondition of bid:
Inv2
 
Precondition of assign:
Inv2 ∧ ReviewerBids(reviewer, assignId)
 
Postconditions of bid and assign:
Inv1 ∧ Inv2

It is obtained from the correct specification by omitting Inv1 from the preconditions.

programs

display with correct specification
display with incorrect specification

bid with correct specification
bid with incorrect specification

assign with correct specification
assign with incorrect specification

smt2 files

Using our tool we generated the verification conditions for the functions:
Correct specifications (unsatisfiable smt2 file):
assign
bid
display

Incorrect specifications (satisfiable smt2 file):
assign
bid
display

Note that although the specifications are not in FO2 — but rather in FOBD, FO2 with bounded domains — the result of applying our SmpSL VC generator tool is a sentence in FO2<\sup> (in smt2 format).

Latest News

Helmut Veith Stipend 2017: Deadline Extension (November 30)

The application deadline for the Helmut Veith Stipend 2017 has been extended to November 30. The stipend is dedicated to the memory of an outstanding computer scientist who worked in the fields of logic in computer science, computer-aided verification, software engineering, and computer security. We encourage all female master’s students attending (or planning to attend) […]

Continue reading

Helmut Veith Stipend 2017

Outstanding female students in the field of computer science who pursue (or plan to pursue) one of the master‘s programs in Computer Science at TU Wien taught in English are invited to apply for the Helmut Veith Stipend

Continue reading

Helmut Veith Stipend

The first recipient of the Helmut Veith Stipend for excellent female master’s students in computer science will be presented on March 14 at the following event: "More female students in computer science. Who cares?" Panel discussion with renowned scientists about diversity in STEM Studies March 14, 5:30pm, TU Wien The Helmut Veith Stipend is dedicated […]

Continue reading

WAIT 2016 in Vienna

The third WAIT workshop on induction is held between 17-18 November at the TU Wien. Details are available on the workshop page.

Continue reading

Full news archive