pith. sign in

arxiv: 2603.27774 · v2 · submitted 2026-03-29 · 💻 cs.CC · cs.LO

Automated Reencoding Meets Graph Theory

Pith reviewed 2026-05-14 21:37 UTC · model grok-4.3

classification 💻 cs.CC cs.LO
keywords Bounded Variable Addition2-CNF formulasSAT preprocessinggraph-theoretic characterizationreencodingat-most-one constraintsclause minimization
0
0 comments X

The pith

Idealized BVA reencodes any 2-CNF formula into one with (lg(3)/4 + o(1)) n² / lg n clauses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper gives a graph-theoretic model that exactly describes which 2-CNF encodings an idealized Bounded Variable Addition procedure can build. From this model the authors derive that idealized BVA plus minor steps such as equivalent literal substitution can rewrite any 2-CNF on n variables as an equivalent 2-CNF whose clause count is asymptotically (lg(3)/4) n² over lg n. The result matters because every clause removed from a SAT instance directly reduces the work done by modern solvers on combinatorial search problems. The same model also yields tight lower bounds for the at-most-one constraint and shows how the graph view produces a faster practical BVA implementation.

Core claim

The encodings that idealized BVA can produce correspond to specific graphs; this correspondence lets the authors prove that any 2-CNF can be reencoded to (lg(3)/4 + o(1)) n² / lg n clauses when equivalent literal substitution is allowed, that the constant rises to 1 without that step, that no reencoding method can beat 0.25, and that the at-most-one constraint on n variables requires at least 3n-6 clauses under BVA, a bound achieved by real implementations but not by the product encoding.

What carries the argument

Graph-theoretic characterization of the 2-CNF encodings producible by idealized BVA, which models clause sets as graphs whose structure determines what BVA can construct.

If this is right

  • Any 2-CNF on n variables can be rewritten as an equivalent 2-CNF whose size is (lg(3)/4 + o(1)) n² / lg n.
  • Without equivalent literal substitution the best constant BVA achieves is 1 rather than lg(3)/4.
  • No reencoding procedure, BVA or otherwise, can reach a constant factor below 0.25 for general 2-CNF formulas.
  • The at-most-one constraint on n variables requires at least 3n-6 clauses under idealized BVA, and this bound is tight for actual implementations.
  • The product encoding of at-most-one, which uses 2n + o(n) clauses, lies outside the set of encodings BVA can produce.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph view of BVA may be extended to characterize other preprocessing routines such as variable elimination or clause subsumption.
  • The faster implementation derived from the graph model could be dropped into existing SAT solvers to improve preprocessing speed on large instances.
  • Similar graph reductions might be discovered for non-2-CNF constraints or for encodings that mix BVA with other techniques.
  • The 0.25 lower bound suggests a fundamental limit on how much any local reencoding method can compress 2-CNF formulas.

Load-bearing premise

The idealized BVA algorithm produces exactly the encodings predicted by the graph model; real heuristics could deviate and invalidate the bounds.

What would settle it

Take a large random monotone 2-CNF on n variables, apply idealized BVA plus equivalent literal substitution, and count the resulting clauses; if the count stays above (lg(3)/4 + o(1)) n² / lg n for arbitrarily large n, the reencoding claim is false.

Figures

Figures reproduced from arXiv: 2603.27774 by Benjamin Przybocki, Bernardo Subercaseaux, Marijn J. H. Heule.

Figure 1
Figure 1. Figure 1: Illustration of Example 1 More generally, let C and D be sets of clauses. Then, using notation C ▷◁ D to denote the set of clauses of the form (Ci ∨ Dj ) for Ci ∈ C and Dj ∈ D, we define a BVA step as follows: Definition 2 (BVA Step). Let φ be a formula, and let C and D be sets of nonempty clauses such that C ▷◁ D ⊆ φ, and y ̸∈ Var(φ) be a fresh variable. Then, we say that the formula φ ′ := (φ \ (C ▷◁ D))… view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of a diagram and a polarized diagram for a simple 2-CNF. Edges with [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the BVA steps in Example 2 Remark 1. Let φ be a simple 2-CNF formula, and let (G, p) be a polarized diagram of Gφ. Then, φ = FR(G,p) . It turns out that polarized rectifier networks represent a more general class of simple 2-CNF encodings than what can be achieved by idealized BVA. To that end, we define the following: Definition 9 (Strict Polarized Rectifier Network). Let R = (B, A, E, p) … view at source ↗
Figure 4
Figure 4. Figure 4: SPRN realizing Kn with 3n − 6 edges Using Theorem 5, we prove that 3n−6 clauses is the smallest possible encoding for this constraint achievable by idealized BVA: Theorem 8 (Formal version of Theorem 4). For every φ with AtMostOne(x1, . . . , xn)∼∼▷ BVA ∗φ, we have |φ| ≥ 3n − 6. Note that there are 2-CNF encodings for AtMostOne(x1, . . . , xn) using 2n + o(n) clauses [7], and it was previously unknown whet… view at source ↗
Figure 5
Figure 5. Figure 5: Illustration for the proof of Theorem 8 Lemma 7. Let R be an SPRN realizing Kn with m edges for some n ≥ 4, and let x1 and x2 be base vertices. If the total degrees of x1 and x2 in R are both 2 and there is an edge between x1 and x2, then there is an SPRN realizing Kn−1 with at most m − 3 edges. Proof. Let R′ be obtained from R by deleting x1 and its two incident edges; also if R′ contains any auxiliary ve… view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of reencoding methods on random monotone formulas [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: End-to-end runtime comparison of reencoding methods on random monotone formulas [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
read the original abstract

Bounded Variable Addition (BVA) is a central preprocessing method in modern state-of-the-art SAT solvers. We provide a graph-theoretic characterization of which 2-CNF encodings can be constructed by an idealized BVA algorithm. Based on this insight, we prove new results about the behavior and limitations of BVA and its interaction with other preprocessing techniques. We show that idealized BVA, plus some minor additional preprocessing (e.g., equivalent literal substitution), can reencode any 2-CNF formula with $n$ variables into an equivalent 2-CNF formula with $(\tfrac{\lg(3)}{4}+o(1))\,\tfrac{n^2}{\lg n}$ clauses. Furthermore, we show that without the additional preprocessing the constant factor worsens from $\tfrac{\lg(3)}{4} \approx 0.396$ to $1$, and that no reencoding method can achieve a constant below $0.25$. On the other hand, for the at-most-one constraint on $n$ variables, we prove that idealized BVA cannot reencode this constraint using fewer than $3n-6$ clauses, a bound that we prove is achieved by actual implementations. In particular, this shows that the product encoding for at-most-one, which uses $2n+o(n)$ clauses, cannot be constructed by BVA regardless of the heuristics used. Finally, our graph-theoretic characterization of BVA allows us to leverage recent work in algorithmic graph theory to develop a drastically more efficient implementation of BVA that achieves a comparable clause reduction on random monotone 2-CNF formulas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript provides a graph-theoretic characterization of the 2-CNF encodings producible by an idealized Bounded Variable Addition (BVA) algorithm. Using this model, the authors prove that idealized BVA combined with minor additional preprocessing (such as equivalent literal substitution) can reencode any 2-CNF formula on n variables into an equivalent 2-CNF with at most (lg(3)/4 + o(1)) n² / lg n clauses. They establish that the constant worsens to 1 without the extra preprocessing, that no reencoding technique can achieve a constant below 0.25, and that idealized BVA requires at least 3n-6 clauses for the at-most-one constraint on n variables—a bound shown to be tight for actual BVA implementations, implying the product encoding (2n + o(n) clauses) cannot be obtained via BVA. The characterization is further used to derive a more efficient BVA implementation that performs well on random monotone 2-CNF formulas.

Significance. If the central claims hold, the work makes a substantive contribution by connecting SAT preprocessing to algorithmic graph theory, yielding explicit constants, matching upper and lower bounds, and a concrete tight result (3n-6) for at-most-one that is verified on deployed BVA. The development of a drastically faster implementation based on the graph model is a clear practical strength, as is the absence of free parameters or fitted quantities in the derivations. These results clarify the limitations of BVA and should inform both solver design and future theoretical analyses of preprocessing.

minor comments (2)
  1. [Abstract] In the abstract and throughout, explicitly state that lg denotes the base-2 logarithm to avoid ambiguity for readers outside the SAT community.
  2. [Implementation] The description of the new graph-theoretic BVA implementation would benefit from a short pseudocode snippet or high-level algorithm outline to make the efficiency improvement more transparent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, the recognition of the graph-theoretic characterization of BVA, the explicit constants and matching bounds, the tight 3n-6 result for at-most-one, and the practical faster implementation. We appreciate the recommendation to accept.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper provides an explicit graph-theoretic characterization of encodings producible by idealized BVA, then derives the (lg(3)/4 + o(1)) n² / lg n upper bound, the worsened constant without extra preprocessing, the 0.25 lower bound, and the tight 3n-6 at-most-one result directly from that model plus standard graph-theoretic arguments. These steps are self-contained against external benchmarks and do not reduce to fitted parameters, self-definitions, or load-bearing self-citations. The concrete tightness result for actual BVA implementations is shown by matching the model to deployed behavior rather than by renaming or circular prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on a graph model of BVA that is introduced in the paper; the model itself is treated as an axiom for the subsequent proofs. No free parameters or invented physical entities are visible in the abstract.

axioms (1)
  • domain assumption The idealized BVA algorithm produces exactly the 2-CNF formulas whose incidence structure satisfies the stated graph-theoretic property.
    This modeling assumption is the foundation for all subsequent bounds; it is introduced to enable the graph-theoretic analysis.

pith-pipeline@v0.9.0 · 5593 in / 1362 out tokens · 40153 ms · 2026-05-14T21:37:42.933873+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. [1]

    Ab´ ıo, R

    I. Ab´ ıo, R. Nieuwenhuis, A. Oliveras, and E. Rodr´ ıguez-Carbonell. A Parametric Approach for Smaller and Better Encodings of Cardinality Constraints. In C. Schulte, editor,Principles and Practice of Constraint Programming, pages 80–96, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. 4 15

  2. [2]

    P. Allen. Almost every 2-SAT function is unate.Israel J. Math., 161:311–346, 2007. doi: 10.1007/s11856-007-0081-z. 6, 11

  3. [3]

    Balogh, D

    J. Balogh, D. Dong, B. Lidick´ y, N. Mani, and Y. Zhao. Nearly all k-SAT functions are unate. InProceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, page 958–962, New York, NY, USA, 2023. Association for Computing Machinery. doi: 10.1145/3564246.3585123. 14

  4. [4]

    Balyo, M

    T. Balyo, M. Heule, M. Iser, M. J¨ arvisalo, and M. Suda, editors.Proceedings of SAT Competition 2023: Solver, Benchmark and Proof Checker Descriptions. Department of Computer Science Series of Publications B. Department of Computer Science, University of Helsinki, Finland,

  5. [5]

    A. Biere. exactly one clauses·Issue #39·arminbiere/kissat — github.com. https:// github.com/arminbiere/kissat/issues/39#issuecomment-1686043817, 2023. [Accessed 06-06-2025]. 1

  6. [6]

    A. Biere. cadical/src/factor.cpp at master·arminbiere/cadical — github.com. https://github. com/arminbiere/cadical/blob/master/src/factor.cpp, 2026. [Accessed 06-02-2026]. 1

  7. [7]

    J. Chen. A new SAT encoding of the at-most-one constraint.Proc. of the Tenth Int. Workshop of Constraint Modelling and Reformulation., page 8, 2010. 3, 12

  8. [8]

    Chistikov, S

    D. Chistikov, S. Iv´ an, A. Lubiw, and J. Shallit. Fractional Coverings, Greedy Coverings, and Rectifier Networks. In H. Vollmer and B. Vall´ ee, editors,34th Symposium on Theoretical Aspects of Computer Science (STACS 2017), volume 66 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:14, Dagstuhl, Germany, 2017. Schloss Dagstuhl ...

  9. [9]

    Ganesh and M

    V. Ganesh and M. Y. Vardi.On the Unreasonable Effectiveness of SAT Solvers, page 547–566. Cambridge University Press, 2021. 1

  10. [10]

    Haberlandt, H

    A. Haberlandt, H. Green, and M. J. H. Heule. Effective Auxiliary Variables via Structured Reencoding. In M. Mahajan and F. Slivovsky, editors,26th International Conference on Theory and Applications of Satisfiability Testing (SAT 2023), volume 271 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 11:1–11:19, Dagstuhl, Germany, 2023. Schlo...

  11. [11]

    M. J. H. Heule, M. J¨ arvisalo, and A. Biere. Efficient CNF Simplification Based on Binary Impli- cation Graphs. In K. A. Sakallah and L. Simon, editors,Theory and Applications of Satisfiability Testing - SAT 2011, pages 201–215. Springer, 2011. doi:10.1007/978-3-642-21581-0_17 . 2

  12. [12]

    M. J. H. Heule and M. Scheucher. Happy Ending: An Empty Hexagon in Every Set of 30 Points. In B. Finkbeiner and L. Kov´ acs, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 61–80, Cham, 2024. Springer Nature Switzerland. 4

  13. [13]

    PySAT: A Python toolkit for prototyping with SAT oracles

    A. Ignatiev, A. Morgado, and J. Marques-Silva. PySAT: A Python toolkit for prototyping with SAT oracles. InSAT, pages 428–437, 2018.doi:10.1007/978-3-319-94144-8_26. 28

  14. [14]

    Iv´ an,´A

    S. Iv´ an,´A. D. Lelkes, J. Nagy-Gy¨ orgy, B. Sz¨ or´ enyi, and G. Tur´ an. Biclique Coverings, Rectifier Networks and the Cost of ε-Removal. In H. J¨ urgensen, J. Karhum¨ aki, and A. Okhotin, editors, 16 Descriptional Complexity of Formal Systems, pages 174–185, Cham, 2014. Springer International Publishing. 2, 5

  15. [15]

    S. Jukna. Computational Complexity of Graphs. InAdvances in Network Complexity, chapter 5, pages 99–153. John Wiley & Sons, Ltd, 2013.doi:10.1002/9783527670468.ch05. 5

  16. [16]

    Jukna and I

    S. Jukna and I. Sergeev. Complexity of linear boolean operators.Found. Trends Theor. Comput. Sci., 9(1):1–123, 2013.doi:10.1561/0400000063. 8

  17. [17]

    Kirchweger, T

    M. Kirchweger, T. Peitl, B. Subercaseaux, and S. Szeider. From the finite to the infinite: Sharper asymptotic bounds on Norin’s conjecture via SAT, 2025. URL: https://arxiv.org/ abs/2511.08386,arXiv:2511.08386. 4

  18. [18]

    Krapivin, B

    A. Krapivin, B. Przybocki, N. Sanhueza-Matamala, and B. Subercaseaux. Optimal and efficient partite decompositions of hypergraphs, 2025. URL: https://arxiv.org/abs/2511.11855, arXiv:2511.11855. 3, 9, 10, 14

  19. [19]

    O. Lupanov. On rectifier and switching-and-rectifier circuits.Doklady Academii nauk SSSR, 111, 1956. Available at https://web.vu.lt/mif/s.jukna/boolean/lupanov56.pdf, thanks to Stasys Jukna. 2, 5

  20. [20]

    Manthey, M

    N. Manthey, M. J. H. Heule, and A. Biere. Automated reencoding of boolean formulas. In Haifa Verification Conference, pages 102–117. Springer, 2012. 1, 2, 3, 4, 5, 12, 14, 27

  21. [21]

    E. I. Nechiporuk. The topological principles of self-correction.Problemy Kibernet., (21):5– 102, 1969. Available in Russian at https://web.vu.lt/mif/s.jukna/boolean/Russians/ Nechiporuk-1969a.pdf, thanks to Stasys Jukna. 2, 9

  22. [22]

    L. Qian, E. Wang, B. Subercaseaux, and M. J. H. Heule. Unfolding boxes with local constraints. InAutomated Deduction – CADE 30: 30th International Conference on Automated Deduction, Stuttgart, Germany, July 28-31, 2025, Proceedings, page 736–754, Berlin, Heidelberg, 2025. Springer-Verlag.doi:10.1007/978-3-031-99984-0_38. 4

  23. [23]

    2024.arXiv:10138/584822

    Proceedings of SAT Competition 2024 : Solver, Benchmark and Proof Checker Descriptions. 2024.arXiv:10138/584822. 1

  24. [24]

    Subercaseaux

    B. Subercaseaux. Asymptotically smaller encodings for graph problems and scheduling, 2025. (ModRef workshop). URL:https://arxiv.org/abs/2506.14042,arXiv:2506.14042. 11

  25. [25]

    Subercaseaux and M

    B. Subercaseaux and M. J. H. Heule. The Packing Chromatic Number of the Infinite Square Grid is 15. In S. Sankaranarayanan and N. Sharygina, editors,Tools and Algorithms for the Construction and Analysis of Systems - 29th International Conference, TACAS 2023, volume 13993 ofLecture Notes in Computer Science, pages 389–406. Springer, 2023. doi: 10.1007/978...

  26. [26]

    R. E. Tarjan. Depth-first search and linear graph algorithms.SIAM J. Comput., 1(2):146–160, 1972.doi:10.1137/0201010. 18 A Proofs from Section 3 Lemma 8.Given a satisfiable 2-CNF formula φ, we can compute a formula φ′ in linear time such that (a) φ′ is obtained from φ by replacing some variables by their negations wherever they appear, and (b) every claus...

  27. [27]

    for everyx∈Xandy∈Y, we have(x, y)∈E; and

  28. [28]

    Definition 11(Biclique Reduction).Let R = (B, A, E, p)be a PRN and( X, Y )a coherent biclique with edge set EX,Y

    for everyy∈Y∩Bandx 1, x2 ∈X, we havep(x 1, y) =p(x 2, y). Definition 11(Biclique Reduction).Let R = (B, A, E, p)be a PRN and( X, Y )a coherent biclique with edge set EX,Y . Then, thebiclique reductionof R with respect to( X, Y )is a new PRN R′ = (B, A′, E′, p′), whereA ′ =A⊔ {z}for some new vertexzand E′ := (E\E X,Y )∪ {(x, z)|x∈X} ∪ {(z, y)|y∈Y}, p′(u, v...

  29. [29]

    add the clause ( xi ∨ xj) toφ

  30. [30]

    add the clause ( xi ∨x j) toφ; or

  31. [31]

    There are 3(n

    add neither clause toφ. There are 3(n

  32. [32]

    Proposition 3.There is a monotone 2-CNF formula with n variables whose smallest 2-CNF encoding has at least( 1 4 −o(1)) n2 lgn clauses

    = 3(1/2−o(1))n2 ways to carry out this construction, each of which yields a distinct simple 2-CNF formula onnvariables. Proposition 3.There is a monotone 2-CNF formula with n variables whose smallest 2-CNF encoding has at least( 1 4 −o(1)) n2 lgn clauses. Proof. Let g(m) be the number of 2-CNF formulas with at most m clauses, and let h(n) := 2(n

  33. [33]

    By Lemma 2, lgg (m) ≤ (1 +o(1))m·lgm

    be the number of monotone 2-CNF functions on n variables. By Lemma 2, lgg (m) ≤ (1 +o(1))m·lgm . If every monotone 2-CNF function on n variables can be encoded with at most m clauses, then g(m) ≥h (n), and consequently lgg (m) ≥lgh (n). If m≤ (r + o(1))n2/lgn for some constant r, then lgm≤(2 +o(1)) lgn, and thus (1/2−o(1))n 2 ≤lgh(n)≤lgg(m)≤(1 +o(1))m·lgm...

  34. [34]

    Since randomness is involved, all our results, both for clauses, variables, and runtimes, are averaged over 5 independent trials

    is well-concentrated around 2 lgn , we consider both a satisfiable regime, for which we set k := 1 + ⌊1.2 lgn⌋ , as well as an unsatisfiable regime for which we set k := ⌊30 lgn⌋ in order to keep the runtimes manageable (as k approaches the critical threshold, the instances become harder to solve). Since randomness is involved, all our results, both for c...