Automated Reencoding Meets Graph Theory
Pith reviewed 2026-05-14 21:37 UTC · model grok-4.3
The pith
Idealized BVA reencodes any 2-CNF formula into one with (lg(3)/4 + o(1)) n² / lg n clauses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The encodings that idealized BVA can produce correspond to specific graphs; this correspondence lets the authors prove that any 2-CNF can be reencoded to (lg(3)/4 + o(1)) n² / lg n clauses when equivalent literal substitution is allowed, that the constant rises to 1 without that step, that no reencoding method can beat 0.25, and that the at-most-one constraint on n variables requires at least 3n-6 clauses under BVA, a bound achieved by real implementations but not by the product encoding.
What carries the argument
Graph-theoretic characterization of the 2-CNF encodings producible by idealized BVA, which models clause sets as graphs whose structure determines what BVA can construct.
If this is right
- Any 2-CNF on n variables can be rewritten as an equivalent 2-CNF whose size is (lg(3)/4 + o(1)) n² / lg n.
- Without equivalent literal substitution the best constant BVA achieves is 1 rather than lg(3)/4.
- No reencoding procedure, BVA or otherwise, can reach a constant factor below 0.25 for general 2-CNF formulas.
- The at-most-one constraint on n variables requires at least 3n-6 clauses under idealized BVA, and this bound is tight for actual implementations.
- The product encoding of at-most-one, which uses 2n + o(n) clauses, lies outside the set of encodings BVA can produce.
Where Pith is reading between the lines
- The graph view of BVA may be extended to characterize other preprocessing routines such as variable elimination or clause subsumption.
- The faster implementation derived from the graph model could be dropped into existing SAT solvers to improve preprocessing speed on large instances.
- Similar graph reductions might be discovered for non-2-CNF constraints or for encodings that mix BVA with other techniques.
- The 0.25 lower bound suggests a fundamental limit on how much any local reencoding method can compress 2-CNF formulas.
Load-bearing premise
The idealized BVA algorithm produces exactly the encodings predicted by the graph model; real heuristics could deviate and invalidate the bounds.
What would settle it
Take a large random monotone 2-CNF on n variables, apply idealized BVA plus equivalent literal substitution, and count the resulting clauses; if the count stays above (lg(3)/4 + o(1)) n² / lg n for arbitrarily large n, the reencoding claim is false.
Figures
read the original abstract
Bounded Variable Addition (BVA) is a central preprocessing method in modern state-of-the-art SAT solvers. We provide a graph-theoretic characterization of which 2-CNF encodings can be constructed by an idealized BVA algorithm. Based on this insight, we prove new results about the behavior and limitations of BVA and its interaction with other preprocessing techniques. We show that idealized BVA, plus some minor additional preprocessing (e.g., equivalent literal substitution), can reencode any 2-CNF formula with $n$ variables into an equivalent 2-CNF formula with $(\tfrac{\lg(3)}{4}+o(1))\,\tfrac{n^2}{\lg n}$ clauses. Furthermore, we show that without the additional preprocessing the constant factor worsens from $\tfrac{\lg(3)}{4} \approx 0.396$ to $1$, and that no reencoding method can achieve a constant below $0.25$. On the other hand, for the at-most-one constraint on $n$ variables, we prove that idealized BVA cannot reencode this constraint using fewer than $3n-6$ clauses, a bound that we prove is achieved by actual implementations. In particular, this shows that the product encoding for at-most-one, which uses $2n+o(n)$ clauses, cannot be constructed by BVA regardless of the heuristics used. Finally, our graph-theoretic characterization of BVA allows us to leverage recent work in algorithmic graph theory to develop a drastically more efficient implementation of BVA that achieves a comparable clause reduction on random monotone 2-CNF formulas.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides a graph-theoretic characterization of the 2-CNF encodings producible by an idealized Bounded Variable Addition (BVA) algorithm. Using this model, the authors prove that idealized BVA combined with minor additional preprocessing (such as equivalent literal substitution) can reencode any 2-CNF formula on n variables into an equivalent 2-CNF with at most (lg(3)/4 + o(1)) n² / lg n clauses. They establish that the constant worsens to 1 without the extra preprocessing, that no reencoding technique can achieve a constant below 0.25, and that idealized BVA requires at least 3n-6 clauses for the at-most-one constraint on n variables—a bound shown to be tight for actual BVA implementations, implying the product encoding (2n + o(n) clauses) cannot be obtained via BVA. The characterization is further used to derive a more efficient BVA implementation that performs well on random monotone 2-CNF formulas.
Significance. If the central claims hold, the work makes a substantive contribution by connecting SAT preprocessing to algorithmic graph theory, yielding explicit constants, matching upper and lower bounds, and a concrete tight result (3n-6) for at-most-one that is verified on deployed BVA. The development of a drastically faster implementation based on the graph model is a clear practical strength, as is the absence of free parameters or fitted quantities in the derivations. These results clarify the limitations of BVA and should inform both solver design and future theoretical analyses of preprocessing.
minor comments (2)
- [Abstract] In the abstract and throughout, explicitly state that lg denotes the base-2 logarithm to avoid ambiguity for readers outside the SAT community.
- [Implementation] The description of the new graph-theoretic BVA implementation would benefit from a short pseudocode snippet or high-level algorithm outline to make the efficiency improvement more transparent.
Simulated Author's Rebuttal
We thank the referee for the positive summary, the recognition of the graph-theoretic characterization of BVA, the explicit constants and matching bounds, the tight 3n-6 result for at-most-one, and the practical faster implementation. We appreciate the recommendation to accept.
Circularity Check
No significant circularity identified
full rationale
The paper provides an explicit graph-theoretic characterization of encodings producible by idealized BVA, then derives the (lg(3)/4 + o(1)) n² / lg n upper bound, the worsened constant without extra preprocessing, the 0.25 lower bound, and the tight 3n-6 at-most-one result directly from that model plus standard graph-theoretic arguments. These steps are self-contained against external benchmarks and do not reduce to fitted parameters, self-definitions, or load-bearing self-citations. The concrete tightness result for actual BVA implementations is shown by matching the model to deployed behavior rather than by renaming or circular prediction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The idealized BVA algorithm produces exactly the 2-CNF formulas whose incidence structure satisfies the stated graph-theoretic property.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean; IndisputableMonolith/Foundation/AlexanderDuality.leanreality_from_one_distinction; alexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We provide a graph-theoretic characterization of which 2-CNF encodings can be constructed by an idealized BVA algorithm... correspondence between reencodings achievable by BVA and what we call strict polarized rectifier networks (Theorem 5).
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel; J_uniquely_calibrated_via_higher_derivative unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
idealized BVA... can reencode any 2-CNF formula with n variables into an equivalent 2-CNF formula with (lg(3)/4 + o(1)) n² / lg n clauses
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
I. Ab´ ıo, R. Nieuwenhuis, A. Oliveras, and E. Rodr´ ıguez-Carbonell. A Parametric Approach for Smaller and Better Encodings of Cardinality Constraints. In C. Schulte, editor,Principles and Practice of Constraint Programming, pages 80–96, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. 4 15
work page 2013
-
[2]
P. Allen. Almost every 2-SAT function is unate.Israel J. Math., 161:311–346, 2007. doi: 10.1007/s11856-007-0081-z. 6, 11
-
[3]
J. Balogh, D. Dong, B. Lidick´ y, N. Mani, and Y. Zhao. Nearly all k-SAT functions are unate. InProceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, page 958–962, New York, NY, USA, 2023. Association for Computing Machinery. doi: 10.1145/3564246.3585123. 14
- [4]
-
[5]
A. Biere. exactly one clauses·Issue #39·arminbiere/kissat — github.com. https:// github.com/arminbiere/kissat/issues/39#issuecomment-1686043817, 2023. [Accessed 06-06-2025]. 1
work page 2023
-
[6]
A. Biere. cadical/src/factor.cpp at master·arminbiere/cadical — github.com. https://github. com/arminbiere/cadical/blob/master/src/factor.cpp, 2026. [Accessed 06-02-2026]. 1
work page 2026
-
[7]
J. Chen. A new SAT encoding of the at-most-one constraint.Proc. of the Tenth Int. Workshop of Constraint Modelling and Reformulation., page 8, 2010. 3, 12
work page 2010
-
[8]
D. Chistikov, S. Iv´ an, A. Lubiw, and J. Shallit. Fractional Coverings, Greedy Coverings, and Rectifier Networks. In H. Vollmer and B. Vall´ ee, editors,34th Symposium on Theoretical Aspects of Computer Science (STACS 2017), volume 66 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:14, Dagstuhl, Germany, 2017. Schloss Dagstuhl ...
-
[9]
V. Ganesh and M. Y. Vardi.On the Unreasonable Effectiveness of SAT Solvers, page 547–566. Cambridge University Press, 2021. 1
work page 2021
-
[10]
A. Haberlandt, H. Green, and M. J. H. Heule. Effective Auxiliary Variables via Structured Reencoding. In M. Mahajan and F. Slivovsky, editors,26th International Conference on Theory and Applications of Satisfiability Testing (SAT 2023), volume 271 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 11:1–11:19, Dagstuhl, Germany, 2023. Schlo...
-
[11]
M. J. H. Heule, M. J¨ arvisalo, and A. Biere. Efficient CNF Simplification Based on Binary Impli- cation Graphs. In K. A. Sakallah and L. Simon, editors,Theory and Applications of Satisfiability Testing - SAT 2011, pages 201–215. Springer, 2011. doi:10.1007/978-3-642-21581-0_17 . 2
-
[12]
M. J. H. Heule and M. Scheucher. Happy Ending: An Empty Hexagon in Every Set of 30 Points. In B. Finkbeiner and L. Kov´ acs, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 61–80, Cham, 2024. Springer Nature Switzerland. 4
work page 2024
-
[13]
PySAT: A Python toolkit for prototyping with SAT oracles
A. Ignatiev, A. Morgado, and J. Marques-Silva. PySAT: A Python toolkit for prototyping with SAT oracles. InSAT, pages 428–437, 2018.doi:10.1007/978-3-319-94144-8_26. 28
-
[14]
S. Iv´ an,´A. D. Lelkes, J. Nagy-Gy¨ orgy, B. Sz¨ or´ enyi, and G. Tur´ an. Biclique Coverings, Rectifier Networks and the Cost of ε-Removal. In H. J¨ urgensen, J. Karhum¨ aki, and A. Okhotin, editors, 16 Descriptional Complexity of Formal Systems, pages 174–185, Cham, 2014. Springer International Publishing. 2, 5
work page 2014
-
[15]
S. Jukna. Computational Complexity of Graphs. InAdvances in Network Complexity, chapter 5, pages 99–153. John Wiley & Sons, Ltd, 2013.doi:10.1002/9783527670468.ch05. 5
-
[16]
S. Jukna and I. Sergeev. Complexity of linear boolean operators.Found. Trends Theor. Comput. Sci., 9(1):1–123, 2013.doi:10.1561/0400000063. 8
-
[17]
M. Kirchweger, T. Peitl, B. Subercaseaux, and S. Szeider. From the finite to the infinite: Sharper asymptotic bounds on Norin’s conjecture via SAT, 2025. URL: https://arxiv.org/ abs/2511.08386,arXiv:2511.08386. 4
-
[18]
A. Krapivin, B. Przybocki, N. Sanhueza-Matamala, and B. Subercaseaux. Optimal and efficient partite decompositions of hypergraphs, 2025. URL: https://arxiv.org/abs/2511.11855, arXiv:2511.11855. 3, 9, 10, 14
-
[19]
O. Lupanov. On rectifier and switching-and-rectifier circuits.Doklady Academii nauk SSSR, 111, 1956. Available at https://web.vu.lt/mif/s.jukna/boolean/lupanov56.pdf, thanks to Stasys Jukna. 2, 5
work page 1956
-
[20]
N. Manthey, M. J. H. Heule, and A. Biere. Automated reencoding of boolean formulas. In Haifa Verification Conference, pages 102–117. Springer, 2012. 1, 2, 3, 4, 5, 12, 14, 27
work page 2012
-
[21]
E. I. Nechiporuk. The topological principles of self-correction.Problemy Kibernet., (21):5– 102, 1969. Available in Russian at https://web.vu.lt/mif/s.jukna/boolean/Russians/ Nechiporuk-1969a.pdf, thanks to Stasys Jukna. 2, 9
work page 1969
-
[22]
L. Qian, E. Wang, B. Subercaseaux, and M. J. H. Heule. Unfolding boxes with local constraints. InAutomated Deduction – CADE 30: 30th International Conference on Automated Deduction, Stuttgart, Germany, July 28-31, 2025, Proceedings, page 736–754, Berlin, Heidelberg, 2025. Springer-Verlag.doi:10.1007/978-3-031-99984-0_38. 4
-
[23]
Proceedings of SAT Competition 2024 : Solver, Benchmark and Proof Checker Descriptions. 2024.arXiv:10138/584822. 1
work page 2024
-
[24]
B. Subercaseaux. Asymptotically smaller encodings for graph problems and scheduling, 2025. (ModRef workshop). URL:https://arxiv.org/abs/2506.14042,arXiv:2506.14042. 11
-
[25]
B. Subercaseaux and M. J. H. Heule. The Packing Chromatic Number of the Infinite Square Grid is 15. In S. Sankaranarayanan and N. Sharygina, editors,Tools and Algorithms for the Construction and Analysis of Systems - 29th International Conference, TACAS 2023, volume 13993 ofLecture Notes in Computer Science, pages 389–406. Springer, 2023. doi: 10.1007/978...
-
[26]
R. E. Tarjan. Depth-first search and linear graph algorithms.SIAM J. Comput., 1(2):146–160, 1972.doi:10.1137/0201010. 18 A Proofs from Section 3 Lemma 8.Given a satisfiable 2-CNF formula φ, we can compute a formula φ′ in linear time such that (a) φ′ is obtained from φ by replacing some variables by their negations wherever they appear, and (b) every claus...
-
[27]
for everyx∈Xandy∈Y, we have(x, y)∈E; and
-
[28]
for everyy∈Y∩Bandx 1, x2 ∈X, we havep(x 1, y) =p(x 2, y). Definition 11(Biclique Reduction).Let R = (B, A, E, p)be a PRN and( X, Y )a coherent biclique with edge set EX,Y . Then, thebiclique reductionof R with respect to( X, Y )is a new PRN R′ = (B, A′, E′, p′), whereA ′ =A⊔ {z}for some new vertexzand E′ := (E\E X,Y )∪ {(x, z)|x∈X} ∪ {(z, y)|y∈Y}, p′(u, v...
-
[29]
add the clause ( xi ∨ xj) toφ
-
[30]
add the clause ( xi ∨x j) toφ; or
- [31]
-
[32]
= 3(1/2−o(1))n2 ways to carry out this construction, each of which yields a distinct simple 2-CNF formula onnvariables. Proposition 3.There is a monotone 2-CNF formula with n variables whose smallest 2-CNF encoding has at least( 1 4 −o(1)) n2 lgn clauses. Proof. Let g(m) be the number of 2-CNF formulas with at most m clauses, and let h(n) := 2(n
-
[33]
By Lemma 2, lgg (m) ≤ (1 +o(1))m·lgm
be the number of monotone 2-CNF functions on n variables. By Lemma 2, lgg (m) ≤ (1 +o(1))m·lgm . If every monotone 2-CNF function on n variables can be encoded with at most m clauses, then g(m) ≥h (n), and consequently lgg (m) ≥lgh (n). If m≤ (r + o(1))n2/lgn for some constant r, then lgm≤(2 +o(1)) lgn, and thus (1/2−o(1))n 2 ≤lgh(n)≤lgg(m)≤(1 +o(1))m·lgm...
-
[34]
is well-concentrated around 2 lgn , we consider both a satisfiable regime, for which we set k := 1 + ⌊1.2 lgn⌋ , as well as an unsatisfiable regime for which we set k := ⌊30 lgn⌋ in order to keep the runtimes manageable (as k approaches the critical threshold, the instances become harder to solve). Since randomness is involved, all our results, both for c...
work page 1901
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.