A Two-Watched Literal Scheme for First-Order Logic

Christoph Weidenbach; Lorenz Leutgeb; Martin Bromberger; Simon Schwarz; Tobias Gehl; Yasmine Briefs

arxiv: 2605.21335 · v1 · pith:G7YM22CAnew · submitted 2026-05-20 · 💻 cs.LO

A Two-Watched Literal Scheme for First-Order Logic

Yasmine Briefs , Martin Bromberger , Tobias Gehl , Lorenz Leutgeb , Simon Schwarz , Christoph Weidenbach This is my paper

Pith reviewed 2026-05-21 03:20 UTC · model grok-4.3

classification 💻 cs.LO

keywords first-order logictwo-watched literalsclause propagationCDCLsatisfiabilityground literalsfirst-order clausesconflict detection

0 comments

The pith

The two-watched literal scheme extends to first-order logic clauses to efficiently detect propagations and conflicts from ground literals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper adapts the two-watched literal technique, familiar from fast SAT solvers, to first-order logic. The adaptation lets a solver spot which clauses become true or false when some ground facts are known, without checking every literal in every clause. A sympathetic reader would care because first-order reasoning often deals with large clause sets where naive scanning is too slow. The authors give the method as a set of rules that they prove sound and complete, and they show an implementation that beats a standard dynamic-programming check especially on long clauses.

Core claim

Given a set of first-order clauses and a set of ground literals, the lifted two-watched literal scheme efficiently detects all propagating and false clauses with respect to the ground literals. The algorithm is presented as a system of rules and proved sound and complete. An implementation of the scheme outperforms a standard dynamic programming approach for detecting propagatable literals and conflicts, especially when dealing with long clauses.

What carries the argument

The lifted two-watched literal scheme, which maintains two watched literals per clause to track possible propagations and conflicts without full clause scans each time a ground literal arrives.

If this is right

The scheme integrates into first-order CDCL solvers to speed up propagation.
It correctly identifies all clauses that propagate a literal or become false under the given ground facts.
Performance gains appear most clearly when clauses contain many literals.
Incremental updates to the watch structure avoid revisiting the whole clause set on each new literal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If adopted, first-order theorem provers could handle larger problems in verification and AI planning without proportional slowdowns in propagation.
The rule-based presentation may allow easy combination with other first-order reasoning techniques such as unification.
Testing the scheme on standard first-order benchmark sets would show whether the efficiency gains hold beyond the authors' test cases.

Load-bearing premise

The scheme assumes that ground literals arrive as input and that the two-watched structure for each clause can be maintained and updated incrementally without rechecking the entire set of clauses every time.

What would settle it

A collection of first-order clauses together with ground literals for which the scheme fails to report a clause that should propagate a literal, or reports a propagation that does not actually follow, would show the rules are neither complete nor sound.

Figures

Figures reproduced from arXiv: 2605.21335 by Christoph Weidenbach, Lorenz Leutgeb, Martin Bromberger, Simon Schwarz, Tobias Gehl, Yasmine Briefs.

**Figure 1.** Figure 1: Runtime comparison on all eligible TPTP v9.2.1 problems. 10 5 10 4 10 3 10 2 10 1 10 0 10 1 10 2 Runtime of DP (s, log-scaled) 10 5 10 4 10 3 10 2 10 1 10 0 10 1 10 2 Runtime of TWFO (s, log-scaled) Total Runtime of DP and TWFO, problems with median clause size > 2 (N=820) [PITH_FULL_IMAGE:figures/full_fig_p025_1.png] view at source ↗

**Figure 4.** Figure 4: Number of considered instances on all eligible problems [PITH_FULL_IMAGE:figures/full_fig_p025_4.png] view at source ↗

read the original abstract

The two-watched literal scheme, a core component of efficient CDCL (Conflict-Driven Clause Learning) implementations for propositional logic, is extended to first-order logic. Given a set of first-order clauses and a set of ground literals, our lifted two-watched literal scheme efficiently detects all propagating and false clauses with respect to the ground literals. We present the algorithm as a system of rules and prove its soundness and completeness. Additionally, we provide an implementation of the two-watched literal scheme, which outperforms a standard dynamic programming approach for detecting propagatable literals and conflicts, especially when dealing with long clauses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper lifts the two-watched literal scheme to first-order logic with a rule system and shows an implementation that beats dynamic programming on long clauses.

read the letter

The main point is that this paper takes the two-watched literal scheme from propositional SAT and lifts it to first-order logic. Given first-order clauses and ground literals, it detects propagating and false clauses efficiently using a rule system that they prove sound and complete. The implementation beats a dynamic programming baseline, especially with long clauses. The new part is the rule-based formulation for the first-order case. It keeps the spirit of watching two literals per clause but adapts to the structure of FOL clauses and ground substitutions. This seems fresh compared to existing literature on FOL solvers. The paper does well by including an actual implementation and performance numbers, which grounds the efficiency claim in practice rather than just theory. The soft spots are limited. The proofs for soundness and completeness are stated but would need careful review to confirm no gaps in handling variable bindings or clause updates. The efficiency relies on incremental watching without full rescans, which the abstract assumes works, but details on data structures for watching in the presence of unification could be a minor point to clarify. No major circularity or invented entities show up. This is for researchers and developers in automated theorem proving and SAT-based solvers for first-order logic. Someone optimizing the core loop of a CDCL-style FOL prover would find the rules and the comparison useful. It deserves a serious referee because the algorithmic idea is concrete and the empirical evidence suggests real-world relevance. I recommend sending it for peer review. The contribution is solid enough to warrant external feedback on the proofs and implementation.

Referee Report

2 major / 3 minor

Summary. The paper extends the two-watched literal scheme from propositional CDCL to first-order logic. Given a set of first-order clauses and ground literals, it defines a system of rules that detects all propagating and false clauses, proves the soundness and completeness of the rule system, and reports an implementation that outperforms dynamic programming on long clauses.

Significance. If the soundness and completeness proofs hold, the work offers a practical optimization for incremental clause watching in first-order settings, avoiding full rescans on each new ground literal. The empirical comparison with dynamic programming is a concrete strength, indicating potential utility in theorem provers or SMT solvers handling large clause sets.

major comments (2)

[§3] §3 (Rule definitions): the completeness argument for detecting all false clauses under multiple possible unifiers is not fully load-bearing without an explicit case analysis showing that the watching rules do not miss ground instances when a clause has more than one matching literal.
[§4] §4 (Soundness proof): the induction on rule applications assumes incremental maintenance of watches; the base case for the initial assignment of watches to non-ground clauses should be stated explicitly to confirm no initial propagations are overlooked.

minor comments (3)

[Experiments] The experimental section should report the exact clause lengths and number of ground literals used in the dynamic-programming comparison to make the 'outperforms especially for long clauses' claim reproducible.
[Preliminaries] Notation for watched literals mixes propositional and first-order terminology; a short table distinguishing the lifted concepts would improve clarity.
[§3] A few minor typos appear in the rule schemas (e.g., missing subscript on the unification variable in rule R3).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the constructive comments on the rule definitions and soundness proof. We address each major comment below and will incorporate clarifications in the revised manuscript.

read point-by-point responses

Referee: [§3] §3 (Rule definitions): the completeness argument for detecting all false clauses under multiple possible unifiers is not fully load-bearing without an explicit case analysis showing that the watching rules do not miss ground instances when a clause has more than one matching literal.

Authors: We appreciate this observation on strengthening the completeness argument. The proof in §3 establishes that the watching rules detect all false clauses by maintaining the invariant that each clause has two watches and that propagations occur only on valid unifiers. For clauses with multiple matching literals under different unifiers, the rules ensure that if a ground instance falsifies all but one literal, a watch will trigger detection or propagation. To make this fully explicit as requested, we will add a dedicated case analysis paragraph in the revised §3, enumerating scenarios with multiple unifiers and confirming no ground instances are missed by the watching scheme. revision: yes
Referee: [§4] §4 (Soundness proof): the induction on rule applications assumes incremental maintenance of watches; the base case for the initial assignment of watches to non-ground clauses should be stated explicitly to confirm no initial propagations are overlooked.

Authors: We thank the referee for highlighting the need for an explicit base case. The soundness proof proceeds by induction on the number of rule applications, with the initial configuration defined by the watch assignment algorithm applied to the input clauses and ground literals. This initial assignment ensures that any propagatable literals from the starting ground literals are immediately detected by the rules, with no overlooked propagations. In the revised version, we will explicitly state and justify this base case at the beginning of the induction in §4, including a short argument that the non-ground clauses receive watches consistent with the two-watched literal scheme before any rules are applied. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via direct rule definitions and proofs

full rationale

The paper introduces a lifted two-watched literal scheme for first-order logic as an explicit system of rules, then states that soundness and completeness are proved for detecting propagating and false clauses given first-order clauses and ground literals. An implementation is supplied and benchmarked against dynamic programming, supplying an external performance check. No load-bearing step reduces by definition or self-citation to the target result itself; the rules are presented as new and their properties are claimed to be established independently rather than fitted or renamed from prior outputs. The efficiency claim rests on the reported implementation comparison, not on any internal tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the existence of a sound and complete rule system for watching first-order clauses; the abstract does not list explicit free parameters or new invented entities.

pith-pipeline@v0.9.0 · 5634 in / 1054 out tokens · 15443 ms · 2026-05-21T03:20:06.976661+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We present the algorithm as a system of rules and prove its soundness and completeness... the two-watched literal scheme for first-order logic
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Lemma 5 (Two-Watched Invariants)... if one watched literal is false... then M |= ¬C'

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1]

In: Auto- mated Reasoning – 13th International Joint Conference, IJCAR 2026, Lisbon, Portugal, July 26–29, 2026, Proceedings (2026), to appear 27

Briefs, Y., Bromberger, M., Gehl, T., Leutgeb, L., Schwarz, S., Weiden- bach, C.: A two-watched literal scheme for first-order logic. In: Auto- mated Reasoning – 13th International Joint Conference, IJCAR 2026, Lisbon, Portugal, July 26–29, 2026, Proceedings (2026), to appear 27

work page 2026
[2]

https://doi.org/10.5281/zenodo.20138614

Briefs, Y., Bromberger, M., Gehl, T., Leutgeb, L., Schwarz, S., Weiden- bach, C.: A two-watched literal scheme for first-order logic: Implemen- tation artifact (2026). https://doi.org/10.5281/zenodo.20138614

work page doi:10.5281/zenodo.20138614 2026
[3]

In: Pientka, B., Tinelli, C

Bromberger, M., Desharnais, M., Weidenbach, C.: An Isabelle/HOL formalization of the SCL(FOL) calculus. In: Pientka, B., Tinelli, C. (eds.) Automated Deduction - CADE 29 - 29th International Conference on Automated Deduction, Rome, Italy, July 1-4, 2023, Proceedings. Lecture Notes in Computer Science, vol. 14132, pp. 116–133. Springer (2023). https://doi....

work page doi:10.1007/978-3-031-38499-8_7 2023
[4]

In: Konev, B., Schon, C., Steen, A

Bromberger, M., Gehl, T., Leutgeb, L., Weidenbach, C.: A two-watched literal scheme for first-order logic. In: Konev, B., Schon, C., Steen, A. (eds.) Proceedings of the Workshop on Practical Aspects of Automated Reasoning Co-located with the 11th International Joint Conference on Automated Reasoning (FLoC/IJCAR 2022), Haifa, Israel, August, 11 - 12, 2022....

work page 2022
[5]

CoRRabs/2302.05954(2023)

Bromberger, M., Schwarz, S., Weidenbach, C.: SCL(FOL) re- visited. CoRRabs/2302.05954(2023). https://doi.org/10.48550/ ARXIV.2302.05954

work page arXiv 2023
[6]

McCune, W.: Experiments with discrimination-tree indexing and path indexing for term retrieval. J. Autom. Reason.9(2), 147–167 (1992). https://doi.org/10.1007/BF00245458

work page doi:10.1007/bf00245458 1992
[7]

226–231 (2001)

Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engineering an efficient SAT solver. In: Proceedings of the 38th Design Automation Conference, DAC 2001, Las Vegas, NV, USA, June 18-22, 2001.pp.530–535.ACM(2001).https://doi.org/10.1145/378239. 379017

work page doi:10.1145/378239 2001
[8]

In: Goré, R., Leitsch, A., Nipkow, T

Nieuwenhuis, R., Hillenbrand, T., Riazanov, A., Voronkov, A.: On the evaluation of indexing techniques for theorem proving. In: Goré, R., Leitsch, A., Nipkow, T. (eds.) Automated Reasoning, First In- ternational Joint Conference, IJCAR 2001, Siena, Italy, June 18-23, 2001, Proceedings. LNCS, vol. 2083, pp. 257–271. Springer (2001). https://doi.org/10.1007...

work page doi:10.1007/3-540-45744-5_19 2001
[9]

(eds.): Handbook of Automated Reason- ing (in 2 volumes)

Robinson, J.A., Voronkov, A. (eds.): Handbook of Automated Reason- ing (in 2 volumes). Elsevier and MIT Press (2001) 28

work page 2001
[10]

In: Rutenbar, R.A., Otten, R.H.J.M

Silva, J.P.M., Sakallah, K.A.: GRASP - a new search algorithm for satisfiability. In: Rutenbar, R.A., Otten, R.H.J.M. (eds.) Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 1996, San Jose, CA, USA, November 10-14, 1996. pp. 220–227. IEEE Computer Society / ACM (1996). https://doi.org/10. 1109/ICCAD.1996.569607

work page arXiv 1996
[11]

SAT2005(53), 1–2 (2005)

Sorensson, N., Een, N.: Minisat v1.13 - a SAT solver with conflict-clause minimization. SAT2005(53), 1–2 (2005)

work page 2005
[12]

https://doi.org/10.1007/s10817-017-9407-7

Sutcliffe, G.: The TPTP Problem Library and Associated Infrastruc- ture.FromCNFtoTH0, TPTPv6.4.0.JournalofAutomatedReasoning 59(4), 483–502 (2017). https://doi.org/10.1007/s10817-017-9407-7

work page doi:10.1007/s10817-017-9407-7 2017
[13]

Sutcliffe, G.: Proceedings of CASC-30 – the CADE-30 ATP system competition (2025), https://tptp.org/CASC/30/Proceedings.pdf 29

work page 2025

[1] [1]

In: Auto- mated Reasoning – 13th International Joint Conference, IJCAR 2026, Lisbon, Portugal, July 26–29, 2026, Proceedings (2026), to appear 27

Briefs, Y., Bromberger, M., Gehl, T., Leutgeb, L., Schwarz, S., Weiden- bach, C.: A two-watched literal scheme for first-order logic. In: Auto- mated Reasoning – 13th International Joint Conference, IJCAR 2026, Lisbon, Portugal, July 26–29, 2026, Proceedings (2026), to appear 27

work page 2026

[2] [2]

https://doi.org/10.5281/zenodo.20138614

Briefs, Y., Bromberger, M., Gehl, T., Leutgeb, L., Schwarz, S., Weiden- bach, C.: A two-watched literal scheme for first-order logic: Implemen- tation artifact (2026). https://doi.org/10.5281/zenodo.20138614

work page doi:10.5281/zenodo.20138614 2026

[3] [3]

In: Pientka, B., Tinelli, C

Bromberger, M., Desharnais, M., Weidenbach, C.: An Isabelle/HOL formalization of the SCL(FOL) calculus. In: Pientka, B., Tinelli, C. (eds.) Automated Deduction - CADE 29 - 29th International Conference on Automated Deduction, Rome, Italy, July 1-4, 2023, Proceedings. Lecture Notes in Computer Science, vol. 14132, pp. 116–133. Springer (2023). https://doi....

work page doi:10.1007/978-3-031-38499-8_7 2023

[4] [4]

In: Konev, B., Schon, C., Steen, A

Bromberger, M., Gehl, T., Leutgeb, L., Weidenbach, C.: A two-watched literal scheme for first-order logic. In: Konev, B., Schon, C., Steen, A. (eds.) Proceedings of the Workshop on Practical Aspects of Automated Reasoning Co-located with the 11th International Joint Conference on Automated Reasoning (FLoC/IJCAR 2022), Haifa, Israel, August, 11 - 12, 2022....

work page 2022

[5] [5]

CoRRabs/2302.05954(2023)

Bromberger, M., Schwarz, S., Weidenbach, C.: SCL(FOL) re- visited. CoRRabs/2302.05954(2023). https://doi.org/10.48550/ ARXIV.2302.05954

work page arXiv 2023

[6] [6]

McCune, W.: Experiments with discrimination-tree indexing and path indexing for term retrieval. J. Autom. Reason.9(2), 147–167 (1992). https://doi.org/10.1007/BF00245458

work page doi:10.1007/bf00245458 1992

[7] [7]

226–231 (2001)

Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engineering an efficient SAT solver. In: Proceedings of the 38th Design Automation Conference, DAC 2001, Las Vegas, NV, USA, June 18-22, 2001.pp.530–535.ACM(2001).https://doi.org/10.1145/378239. 379017

work page doi:10.1145/378239 2001

[8] [8]

In: Goré, R., Leitsch, A., Nipkow, T

Nieuwenhuis, R., Hillenbrand, T., Riazanov, A., Voronkov, A.: On the evaluation of indexing techniques for theorem proving. In: Goré, R., Leitsch, A., Nipkow, T. (eds.) Automated Reasoning, First In- ternational Joint Conference, IJCAR 2001, Siena, Italy, June 18-23, 2001, Proceedings. LNCS, vol. 2083, pp. 257–271. Springer (2001). https://doi.org/10.1007...

work page doi:10.1007/3-540-45744-5_19 2001

[9] [9]

(eds.): Handbook of Automated Reason- ing (in 2 volumes)

Robinson, J.A., Voronkov, A. (eds.): Handbook of Automated Reason- ing (in 2 volumes). Elsevier and MIT Press (2001) 28

work page 2001

[10] [10]

In: Rutenbar, R.A., Otten, R.H.J.M

Silva, J.P.M., Sakallah, K.A.: GRASP - a new search algorithm for satisfiability. In: Rutenbar, R.A., Otten, R.H.J.M. (eds.) Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 1996, San Jose, CA, USA, November 10-14, 1996. pp. 220–227. IEEE Computer Society / ACM (1996). https://doi.org/10. 1109/ICCAD.1996.569607

work page arXiv 1996

[11] [11]

SAT2005(53), 1–2 (2005)

Sorensson, N., Een, N.: Minisat v1.13 - a SAT solver with conflict-clause minimization. SAT2005(53), 1–2 (2005)

work page 2005

[12] [12]

https://doi.org/10.1007/s10817-017-9407-7

Sutcliffe, G.: The TPTP Problem Library and Associated Infrastruc- ture.FromCNFtoTH0, TPTPv6.4.0.JournalofAutomatedReasoning 59(4), 483–502 (2017). https://doi.org/10.1007/s10817-017-9407-7

work page doi:10.1007/s10817-017-9407-7 2017

[13] [13]

Sutcliffe, G.: Proceedings of CASC-30 – the CADE-30 ATP system competition (2025), https://tptp.org/CASC/30/Proceedings.pdf 29

work page 2025