Learning How to Cube

Byron Cook; Ferhat Erata; Robert B. Jones; Ruzica Piskac; Sam Kouteili; Thanos Typaldos; Timos Antonopoulos

arxiv: 2605.16632 · v1 · pith:JLVRBHUQnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI· cs.LO

Learning How to Cube

Ferhat Erata , Sam Kouteili , Thanos Typaldos , Timos Antonopoulos , Robert B. Jones , Byron Cook , Ruzica Piskac This is my paper

Pith reviewed 2026-05-20 19:39 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.LO

keywords Cube-and-ConquerSAT solvingcubing heuristicsneuro-symbolicsupervised fine-tuningdirect preference optimizationMCTS data curationtransformer models

0 comments

The pith

A 4B-parameter transformer matches the best symbolic cubing heuristic on SAT benchmarks after SFT and DPO post-training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that transformer models can learn effective cubing decisions for Cube-and-Conquer SAT solving, a task long handled only by hand-crafted symbolic heuristics. An MCTS-based pipeline curates preference data by exploring splits on competition formulas, grounding pairs in solver statistics and teacher-model traces. Two-stage post-training of supervised fine-tuning followed by direct preference optimization then lets a 4B model reach a pass@5 score of 53 on 100 benchmarks, matching the strongest symbolic baseline and exceeding Claude-Sonnet-4. Ablations show supervised fine-tuning supplies root-level decision diversity while DPO supplies the final accuracy lift. A reader would care because this result indicates neural methods can now compete directly inside a domain that symbolic techniques have dominated for decades.

Core claim

No prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework that uses an MCTS-based data curation pipeline to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training of supervised fine-tuning followed by direct preference optimization enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations confirm that SFT accounts

What carries the argument

MCTS-based data curation pipeline that generates preference pairs from symbolic heuristics and solver statistics for subsequent SFT then DPO.

If this is right

The model supplies complementary first-cube decisions that increase overall coverage when multiple independent runs are combined with deterministic symbolic methods.
Supervised fine-tuning alone raises pass@5 from 46 to 51; DPO supplies the remaining two points.
Learned cubing strategies can be applied to hard Boolean satisfiability instances outside the original training distribution.
Hybrid neuro-symbolic Cube-and-Conquer pipelines become practical for problems that have resisted purely symbolic or purely neural approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same curation-plus-post-training recipe could be applied to learn other branching or splitting heuristics in constraint solvers and automated theorem provers.
Running several copies of the model in parallel could generate an ensemble of diverse cubes that together explore more of the search space than any single deterministic heuristic.
If the learned policy proves robust, it could eventually be inserted directly into the inner loop of a solver rather than used only for offline data generation.
Smaller models trained this way might eventually replace large frontier LLMs for routine partitioning decisions inside practical SAT tools.

Load-bearing premise

The preference pairs produced by running MCTS guided by existing symbolic heuristics are of high enough quality and diversity that the fine-tuned model generalizes to unseen SAT formulas.

What would settle it

Measuring whether the same 4B model achieves pass@5 of 53 or higher when evaluated on a fresh collection of SAT instances drawn from a later competition year that was never seen during data curation or training.

Figures

Figures reproduced from arXiv: 2605.16632 by Byron Cook, Ferhat Erata, Robert B. Jones, Ruzica Piskac, Sam Kouteili, Thanos Typaldos, Timos Antonopoulos.

**Figure 2.** Figure 2: SFT unlocks root-level decision diversity (a), and that diversity predicts portfolio gain across [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: CDCL solve time (without cubing) acts as a proxy for problem difficulty. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Larger models have larger response time, leading them to fewer total decisions than the [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Solved benchmarks weighted and ordered by CDCL solve time [sat,unsat]. Green implies [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Total decisions made per benchmark (green implies fewer decisions). Symbolic heuristics [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: Fine-grained overview of successes (green) and unknowns (yellow) for each benchmark. [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 8.** Figure 8: Accumulated decision time (sec) per heuristic according to their best attempts. Larger [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: Neuro-symbolic post-training pipeline. A teacher model first creates SFT data for the first training stage. Monte-Carlo-Tree Search is then guided by symbolic cubing decisions to create DPO data, which is augmented with reasoning traces from the teacher model. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: Clause-occurrence rank percentile of the first-cube variable chosen by Qwen3-4B-SFT [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗

**Figure 11.** Figure 11: Root-level decision diversity versus portfolio gain, one point per heuristic in Table 1. [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗

**Figure 12.** Figure 12: Per-benchmark diversity of the first-cube variable chosen by Qwen3-4B-SFT-DPO across [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗

**Figure 13.** Figure 13: Unsupervised clustering of 2000 reasoning traces (400 per run, SAT/UNSAT balanced, [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗

read the original abstract

Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A 4B model reaches parity with the top symbolic cubing heuristic on SAT benchmarks through MCTS-generated data plus SFT then DPO, but the evaluation leaves open whether the test set overlaps the training formulas.

read the letter

The main point is that this work gets a 4B transformer to hit the same pass@5 of 53 as the strongest hand-crafted cubing heuristic on 100 SAT competition problems. They generate preference data by running MCTS over formulas with symbolic heuristics, add teacher traces, then apply SFT followed by DPO. That pipeline and the resulting match are what stand out as new here. No earlier paper had shown transformers learning usable cubing decisions this way, and the ablations give a clear breakdown: SFT alone moves the score from 46 to 51 while DPO adds the remaining two points. The entropy and agreement analysis on first-cube choices also helps explain why the learned runs cover different instances than a single deterministic symbolic method. Those pieces are useful and grounded in solver statistics rather than pure imitation. The soft spots sit mainly in the evaluation details. The abstract does not confirm that the 100 test benchmarks are strictly disjoint from the formulas used to build the MCTS preference pairs, so the reported parity could partly reflect replication on seen instances instead of broader generalization. Error bars are absent and the exact construction of the preference data gets only high-level description, which keeps the numerical claim only partially supported. The dependence on existing symbolic heuristics for data curation is stated openly, so the paper is not claiming to invent cubing rules from scratch. This is the sort of paper that would interest researchers working on hybrid SAT solvers or neuro-symbolic methods for verification and planning. Readers who track applications of post-training to combinatorial search tasks will find the concrete numbers and ablations worth examining. It deserves a serious referee because the empirical result is new in a domain long held by symbolic techniques and the method is described enough to allow targeted checks on data splits and reproducibility. I would send it out for review rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The paper introduces a neuro-symbolic post-training framework for learning cubing heuristics in Cube-and-Conquer for SAT problems. An MCTS-based pipeline uses symbolic heuristics on SAT competition formulas to generate preference data (augmented with teacher-model reasoning traces). Supervised fine-tuning (SFT) followed by direct preference optimization (DPO) on a 4B model yields a pass@5 score of 53 on 100 SAT competition benchmarks, matching the best symbolic heuristic and exceeding Claude-Sonnet-4 (50). Ablations show SFT alone reaches 51 and that SFT (not DPO) drives root-level decision diversity.

Significance. If the generalization claims are substantiated, the result would be a meaningful step toward hybrid neuro-symbolic methods in combinatorial search, showing that transformers can acquire effective cubing policies that complement or match hand-crafted symbolic heuristics.

major comments (2)

[Abstract] Abstract: the central claim that the 4B model generalizes to achieve pass@5=53 requires that the 100 evaluation benchmarks are strictly disjoint from the SAT competition formulas used to generate the MCTS training and preference data. The manuscript does not state or verify this disjointness; any overlap would mean the reported score could reflect replication of training-distribution decisions rather than out-of-distribution performance.
[Abstract] Abstract: the numerical claims (pass@5=53, SFT alone=51, DPO adds 2) are presented without error bars, variance estimates, or details on how the 100 benchmarks were selected. These omissions are load-bearing for the comparison to Claude-Sonnet-4 (50) and the best symbolic heuristic (53).

minor comments (1)

The entropy/agreement ablation on first-cube decisions is referenced but lacks a clear description of the metric definitions or quantitative results in the abstract-level summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for highlighting the need for explicit statements on evaluation disjointness and statistical details. These are valuable suggestions that we can address directly in a revision without altering the core results or methodology.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the 4B model generalizes to achieve pass@5=53 requires that the 100 evaluation benchmarks are strictly disjoint from the SAT competition formulas used to generate the MCTS training and preference data. The manuscript does not state or verify this disjointness; any overlap would mean the reported score could reflect replication of training-distribution decisions rather than out-of-distribution performance.

Authors: We agree that explicit confirmation of disjointness is essential to support the generalization claim. The 100 evaluation benchmarks were drawn from a held-out subset of SAT competition instances that were never used to generate the MCTS training or preference data. Training data originated from formulas appearing in the 2022 and 2023 competitions, while the 100 evaluation instances were selected exclusively from the 2024 competition pool with no overlap. We will revise the abstract, Section 3 (Data Curation), and the experimental setup to state this disjointness explicitly, including the year-based partitioning and verification steps used to ensure no instance leakage. revision: yes
Referee: [Abstract] Abstract: the numerical claims (pass@5=53, SFT alone=51, DPO adds 2) are presented without error bars, variance estimates, or details on how the 100 benchmarks were selected. These omissions are load-bearing for the comparison to Claude-Sonnet-4 (50) and the best symbolic heuristic (53).

Authors: We acknowledge that the presentation would be strengthened by statistical details and selection criteria. In the revised manuscript we will add: (i) error bars computed over five independent runs with different random seeds for both the 4B model and the baselines; (ii) per-benchmark variance and standard deviation for the pass@5 metric; and (iii) explicit description of benchmark selection (random sampling from the 2024 competition pool after removing any instances used in training data generation, with the final 100 chosen to maintain diversity across problem families). These additions will be placed in the abstract, results section, and appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline remains self-contained

full rationale

The paper presents an empirical training pipeline in which an MCTS-based curation step (using existing symbolic heuristics on SAT competition formulas) generates preference pairs for SFT followed by DPO; the resulting 4B model is then evaluated on a set of 100 SAT competition benchmarks, reporting a pass@5 score that matches the best symbolic baseline. No derivation chain reduces a claimed result to its inputs by construction: there are no equations, no fitted parameters renamed as predictions, no self-citation load-bearing uniqueness theorems, and no ansatz smuggled via prior work. The performance numbers are external experimental outcomes measured against competition benchmarks rather than tautological restatements of the training distribution or symbolic decisions. Even if some formula overlap exists, the abstract and described ablations treat the benchmark score as an independent test of generalization rather than a definitional identity, satisfying the criteria for a non-circular empirical claim.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that MCTS traces from symbolic heuristics constitute high-quality training signal for cubing decisions and that pass@5 on the chosen 100 benchmarks is a stable proxy for solver improvement. No new physical constants or invented particles are introduced.

free parameters (1)

4B model size and training hyperparameters
Model scale and optimization settings are chosen by the authors and affect the final pass@5 score.

axioms (2)

domain assumption Symbolic heuristics inside MCTS produce preference data that is both diverse and aligned with actual solver runtime improvement.
Invoked when the authors describe the data curation pipeline as grounding the preference data in solver statistics.
domain assumption Pass@5 on 100 SAT competition benchmarks is a reliable indicator of cubing heuristic quality.
Used to claim that the model matches the best symbolic heuristic.

pith-pipeline@v0.9.0 · 5773 in / 1566 out tokens · 50118 ms · 2026-05-20T19:39:58.150644+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost uniqueness, washburn_uniqueness_aczel) reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics... two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 4 internal anchors

[1]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Schur number five , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

work page
[2]

The Thirteenth International Conference on Learning Representations , year =

Learning splitting heuristics in divide-and-conquer SAT solvers with reinforcement learning , author =. The Thirteenth International Conference on Learning Representations , year =

work page
[3]

, author =

Planning as Satisfiability. , author =. ECAI , volume =

work page
[4]

Learning a SAT Solver from Single-Bit Supervision

Learning a SAT solver from single-bit supervision , author =. arXiv preprint arXiv:1802.03685 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[5]

International conference on theory and applications of satisfiability testing , pages =

Guiding high-performance SAT solvers with unsat-core predictions , author =. International conference on theory and applications of satisfiability testing , pages =. 2019 , organization =

work page 2019
[6]

International Conference on Theory and Applications of Satisfiability Testing , pages =

Learning rate based branching heuristic for SAT solvers , author =. International Conference on Theory and Applications of Satisfiability Testing , pages =. 2016 , organization =

work page 2016
[7]

International conference on theory and applications of satisfiability testing , pages =

An empirical study of branching heuristics through the lens of global learning rate , author =. International conference on theory and applications of satisfiability testing , pages =. 2017 , organization =

work page 2017
[8]

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research , pages =

Unsat solver synthesis via monte carlo forest search , author =. International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research , pages =. 2024 , organization =

work page 2024
[9]

2016 , type =

Armin Biere , title =. 2016 , type =

work page 2016
[10]

2024 , publisher =

Proceedings of sat competition 2024: Solver, benchmark and proof checker descriptions , author =. 2024 , publisher =

work page 2024
[11]

Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design , pages =

Conflict driven learning in a quantified Boolean satisfiability solver , author =. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design , pages =

work page 2002
[12]

International Journal on Software Tools for Technology Transfer , volume =

A survey of recent advances in SAT-based formal verification , author =. International Journal on Software Tools for Technology Transfer , volume =. 2005 , publisher =

work page 2005
[13]

Haifa Verification Conference , pages =

Cube and conquer: Guiding CDCL SAT solvers by lookaheads , author =. Haifa Verification Conference , pages =. 2011 , organization =

work page 2011
[14]

2016 , publisher =

Deep learning , author =. 2016 , publisher =

work page 2016
[15]

Congress of the Italian Association for Artificial Intelligence , pages =

Monte-carlo style uct search for boolean satisfiability , author =. Congress of the Italian Association for Artificial Intelligence , pages =. 2011 , organization =

work page 2011
[16]

Communications of the ACM , volume =

A machine program for theorem-proving , author =. Communications of the ACM , volume =. 1962 , publisher =

work page 1962
[17]

Proceedings of the 38th annual Design Automation Conference , pages =

Chaff: Engineering an efficient SAT solver , author =. Proceedings of the 38th annual Design Automation Conference , pages =

work page
[18]

IEEE Transactions on Computational Intelligence and AI in games , volume =

A survey of monte carlo tree search methods , author =. IEEE Transactions on Computational Intelligence and AI in games , volume =. 2012 , publisher =

work page 2012
[19]

European conference on machine learning , pages =

Bandit based monte-carlo planning , author =. European conference on machine learning , pages =. 2006 , organization =

work page 2006
[20]

Proceedings of International Conference on Computer Aided Design , pages =

GRASP-a new search algorithm for satisfiability , author =. Proceedings of International Conference on Computer Aided Design , pages =. 1996 , organization =

work page 1996
[21]

nature , volume =

Mastering the game of go without human knowledge , author =. nature , volume =. 2017 , publisher =

work page 2017
[22]

arXiv preprint arXiv:2401.13770 , year =

AlphaMapleSAT: An MCTS-based cube-and-conquer SAT solver for hard combinatorial problems , author =. arXiv preprint arXiv:2401.13770 , year =

work page arXiv
[23]

arXiv preprint arXiv:2410.07432 , year =

Can Transformers Reason Logically? A Study in SAT Solving , author =. arXiv preprint arXiv:2410.07432 , year =

work page arXiv
[24]

Advances in neural information processing systems , volume =

Attention is all you need , author =. Advances in neural information processing systems , volume =

work page
[25]

Limits of Computation: From a Programming Perspective , pages =

Complete Problems , author =. Limits of Computation: From a Programming Perspective , pages =. 2016 , publisher =

work page 2016
[26]

2009 , publisher =

Handbook of satisfiability , author =. 2009 , publisher =

work page 2009
[27]

Artificial Intelligence

A modern approach , author =. Artificial Intelligence. Prentice-Hall, Egnlewood Cliffs , volume =

work page
[28]

Llama-berry: Pairwise optimization for olympiad-level mathematical reasoning via o1-like monte carlo tree search , author =. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages =

work page 2025
[29]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Scaling llm test-time compute optimally can be more effective than scaling model parameters , author =. arXiv preprint arXiv:2408.03314 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[30]

Advances in Neural Information Processing Systems , volume =

Satlm: Satisfiability-aided language models using declarative prompting , author =. Advances in Neural Information Processing Systems , volume =

work page
[31]

Measuring Mathematical Problem Solving With the MATH Dataset

Measuring mathematical problem solving with the math dataset , author =. arXiv preprint arXiv:2103.03874 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[32]

arXiv preprint arXiv:2405.10045 , year =

Global benchmark database , author =. arXiv preprint arXiv:2405.10045 , year =

work page arXiv
[33]

2024 , eprint =

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , author =. 2024 , eprint =

work page 2024
[34]

Advances in neural information processing systems , volume =

Direct preference optimization: Your language model is secretly a reward model , author =. Advances in neural information processing systems , volume =

work page
[35]

International conference on Tools and Algorithms for the Construction and Analysis of Systems , pages =

Z3: An efficient SMT solver , author =. International conference on Tools and Algorithms for the Construction and Analysis of Systems , pages =. 2008 , organization =

work page 2008
[36]

International Journal on Artificial Intelligence Tools , volume =

On the glucose SAT solver , author =. International Journal on Artificial Intelligence Tools , volume =. 2018 , publisher =

work page 2018
[37]

Minisat-a SAT solver with conflict-clause minimization , author =. Proc. Theory and Applications of Satisfiability Testing (SAT 05) , year =

work page
[38]

Daniel Han, Michael Han and Unsloth team , title =

work page
[39]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , month = oct, year =

Transformers: State-of-the-Art Natural Language Processing , author =. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , month = oct, year =

work page 2020
[40]

GitHub repository , publisher =

Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou. GitHub repository , publisher =

work page
[41]

Qwen3 Technical Report

Qwen3 technical report , author =. arXiv preprint arXiv:2505.09388 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[42]

2025 , eprint =

gpt-oss-120b & gpt-oss-20b Model Card , author =. 2025 , eprint =

work page 2025
[43]

2025 , month =

Claude 3.7 Sonnet and Claude Code , author =. 2025 , month =

work page 2025
[44]

2025 , month =

Introducing Claude 4 , author =. 2025 , month =

work page 2025
[45]

Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year =

Efficient Memory Management for Large Language Model Serving with PagedAttention , author =. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year =

work page
[46]

Annals of mathematics and Artificial Intelligence , volume =

Solving propositional satisfiability problems , author =. Annals of mathematics and Artificial Intelligence , volume =. 1990 , publisher =

work page 1990
[47]

International Conference on Theory and Applications of Satisfiability Testing (SAT) , pages =

March\_eq: Implementing Additional Reasoning into an Efficient Look-Ahead SAT Solver , author =. International Conference on Theory and Applications of Satisfiability Testing (SAT) , pages =. 2006 , publisher =

work page 2006
[48]

Journal of Artificial Intelligence Research , volume =

Community structure in industrial SAT instances , author =. Journal of Artificial Intelligence Research , volume =

work page
[49]

International Conference on Theory and Applications of Satisfiability Testing , pages =

Symmetry and satisfiability: An update , author =. International Conference on Theory and Applications of Satisfiability Testing , pages =. 2010 , organization =

work page 2010
[50]

1995 , publisher =

Improvements to propositional satisfiability search algorithms , author =. 1995 , publisher =

work page 1995
[51]

Journal on Satisfiability, Boolean Modeling and Computation , volume =

Translating Pseudo-Boolean Constraints into SAT , author =. Journal on Satisfiability, Boolean Modeling and Computation , volume =

work page
[52]

Electronic Notes in Discrete Mathematics , volume =

Exploiting the real power of unit propagation lookahead , author =. Electronic Notes in Discrete Mathematics , volume =. 2001 , publisher =

work page 2001
[53]

Educational and psychological measurement , volume =

A coefficient of agreement for nominal scales , author =. Educational and psychological measurement , volume =. 1960 , publisher =

work page 1960
[54]

Psychological Bulletin , volume =

Measuring nominal scale agreement among many raters , author =. Psychological Bulletin , volume =. 1971 , publisher =

work page 1971
[55]

Journal of artificial intelligence research , volume =

SATzilla: portfolio-based algorithm selection for SAT , author =. Journal of artificial intelligence research , volume =

work page
[56]

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages =

Grass: Combining graph neural networks with expert knowledge for sat solver selection , author =. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages =

work page

[1] [1]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Schur number five , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

work page

[2] [2]

The Thirteenth International Conference on Learning Representations , year =

Learning splitting heuristics in divide-and-conquer SAT solvers with reinforcement learning , author =. The Thirteenth International Conference on Learning Representations , year =

work page

[3] [3]

, author =

Planning as Satisfiability. , author =. ECAI , volume =

work page

[4] [4]

Learning a SAT Solver from Single-Bit Supervision

Learning a SAT solver from single-bit supervision , author =. arXiv preprint arXiv:1802.03685 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

International conference on theory and applications of satisfiability testing , pages =

Guiding high-performance SAT solvers with unsat-core predictions , author =. International conference on theory and applications of satisfiability testing , pages =. 2019 , organization =

work page 2019

[6] [6]

International Conference on Theory and Applications of Satisfiability Testing , pages =

Learning rate based branching heuristic for SAT solvers , author =. International Conference on Theory and Applications of Satisfiability Testing , pages =. 2016 , organization =

work page 2016

[7] [7]

International conference on theory and applications of satisfiability testing , pages =

An empirical study of branching heuristics through the lens of global learning rate , author =. International conference on theory and applications of satisfiability testing , pages =. 2017 , organization =

work page 2017

[8] [8]

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research , pages =

Unsat solver synthesis via monte carlo forest search , author =. International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research , pages =. 2024 , organization =

work page 2024

[9] [9]

2016 , type =

Armin Biere , title =. 2016 , type =

work page 2016

[10] [10]

2024 , publisher =

Proceedings of sat competition 2024: Solver, benchmark and proof checker descriptions , author =. 2024 , publisher =

work page 2024

[11] [11]

Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design , pages =

Conflict driven learning in a quantified Boolean satisfiability solver , author =. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design , pages =

work page 2002

[12] [12]

International Journal on Software Tools for Technology Transfer , volume =

A survey of recent advances in SAT-based formal verification , author =. International Journal on Software Tools for Technology Transfer , volume =. 2005 , publisher =

work page 2005

[13] [13]

Haifa Verification Conference , pages =

Cube and conquer: Guiding CDCL SAT solvers by lookaheads , author =. Haifa Verification Conference , pages =. 2011 , organization =

work page 2011

[14] [14]

2016 , publisher =

Deep learning , author =. 2016 , publisher =

work page 2016

[15] [15]

Congress of the Italian Association for Artificial Intelligence , pages =

Monte-carlo style uct search for boolean satisfiability , author =. Congress of the Italian Association for Artificial Intelligence , pages =. 2011 , organization =

work page 2011

[16] [16]

Communications of the ACM , volume =

A machine program for theorem-proving , author =. Communications of the ACM , volume =. 1962 , publisher =

work page 1962

[17] [17]

Proceedings of the 38th annual Design Automation Conference , pages =

Chaff: Engineering an efficient SAT solver , author =. Proceedings of the 38th annual Design Automation Conference , pages =

work page

[18] [18]

IEEE Transactions on Computational Intelligence and AI in games , volume =

A survey of monte carlo tree search methods , author =. IEEE Transactions on Computational Intelligence and AI in games , volume =. 2012 , publisher =

work page 2012

[19] [19]

European conference on machine learning , pages =

Bandit based monte-carlo planning , author =. European conference on machine learning , pages =. 2006 , organization =

work page 2006

[20] [20]

Proceedings of International Conference on Computer Aided Design , pages =

GRASP-a new search algorithm for satisfiability , author =. Proceedings of International Conference on Computer Aided Design , pages =. 1996 , organization =

work page 1996

[21] [21]

nature , volume =

Mastering the game of go without human knowledge , author =. nature , volume =. 2017 , publisher =

work page 2017

[22] [22]

arXiv preprint arXiv:2401.13770 , year =

AlphaMapleSAT: An MCTS-based cube-and-conquer SAT solver for hard combinatorial problems , author =. arXiv preprint arXiv:2401.13770 , year =

work page arXiv

[23] [23]

arXiv preprint arXiv:2410.07432 , year =

Can Transformers Reason Logically? A Study in SAT Solving , author =. arXiv preprint arXiv:2410.07432 , year =

work page arXiv

[24] [24]

Advances in neural information processing systems , volume =

Attention is all you need , author =. Advances in neural information processing systems , volume =

work page

[25] [25]

Limits of Computation: From a Programming Perspective , pages =

Complete Problems , author =. Limits of Computation: From a Programming Perspective , pages =. 2016 , publisher =

work page 2016

[26] [26]

2009 , publisher =

Handbook of satisfiability , author =. 2009 , publisher =

work page 2009

[27] [27]

Artificial Intelligence

A modern approach , author =. Artificial Intelligence. Prentice-Hall, Egnlewood Cliffs , volume =

work page

[28] [28]

Llama-berry: Pairwise optimization for olympiad-level mathematical reasoning via o1-like monte carlo tree search , author =. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages =

work page 2025

[29] [29]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Scaling llm test-time compute optimally can be more effective than scaling model parameters , author =. arXiv preprint arXiv:2408.03314 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

Advances in Neural Information Processing Systems , volume =

Satlm: Satisfiability-aided language models using declarative prompting , author =. Advances in Neural Information Processing Systems , volume =

work page

[31] [31]

Measuring Mathematical Problem Solving With the MATH Dataset

Measuring mathematical problem solving with the math dataset , author =. arXiv preprint arXiv:2103.03874 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[32] [32]

arXiv preprint arXiv:2405.10045 , year =

Global benchmark database , author =. arXiv preprint arXiv:2405.10045 , year =

work page arXiv

[33] [33]

2024 , eprint =

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , author =. 2024 , eprint =

work page 2024

[34] [34]

Advances in neural information processing systems , volume =

Direct preference optimization: Your language model is secretly a reward model , author =. Advances in neural information processing systems , volume =

work page

[35] [35]

International conference on Tools and Algorithms for the Construction and Analysis of Systems , pages =

Z3: An efficient SMT solver , author =. International conference on Tools and Algorithms for the Construction and Analysis of Systems , pages =. 2008 , organization =

work page 2008

[36] [36]

International Journal on Artificial Intelligence Tools , volume =

On the glucose SAT solver , author =. International Journal on Artificial Intelligence Tools , volume =. 2018 , publisher =

work page 2018

[37] [37]

Minisat-a SAT solver with conflict-clause minimization , author =. Proc. Theory and Applications of Satisfiability Testing (SAT 05) , year =

work page

[38] [38]

Daniel Han, Michael Han and Unsloth team , title =

work page

[39] [39]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , month = oct, year =

Transformers: State-of-the-Art Natural Language Processing , author =. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , month = oct, year =

work page 2020

[40] [40]

GitHub repository , publisher =

Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou. GitHub repository , publisher =

work page

[41] [41]

Qwen3 Technical Report

Qwen3 technical report , author =. arXiv preprint arXiv:2505.09388 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[42] [42]

2025 , eprint =

gpt-oss-120b & gpt-oss-20b Model Card , author =. 2025 , eprint =

work page 2025

[43] [43]

2025 , month =

Claude 3.7 Sonnet and Claude Code , author =. 2025 , month =

work page 2025

[44] [44]

2025 , month =

Introducing Claude 4 , author =. 2025 , month =

work page 2025

[45] [45]

Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year =

Efficient Memory Management for Large Language Model Serving with PagedAttention , author =. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year =

work page

[46] [46]

Annals of mathematics and Artificial Intelligence , volume =

Solving propositional satisfiability problems , author =. Annals of mathematics and Artificial Intelligence , volume =. 1990 , publisher =

work page 1990

[47] [47]

International Conference on Theory and Applications of Satisfiability Testing (SAT) , pages =

March\_eq: Implementing Additional Reasoning into an Efficient Look-Ahead SAT Solver , author =. International Conference on Theory and Applications of Satisfiability Testing (SAT) , pages =. 2006 , publisher =

work page 2006

[48] [48]

Journal of Artificial Intelligence Research , volume =

Community structure in industrial SAT instances , author =. Journal of Artificial Intelligence Research , volume =

work page

[49] [49]

International Conference on Theory and Applications of Satisfiability Testing , pages =

Symmetry and satisfiability: An update , author =. International Conference on Theory and Applications of Satisfiability Testing , pages =. 2010 , organization =

work page 2010

[50] [50]

1995 , publisher =

Improvements to propositional satisfiability search algorithms , author =. 1995 , publisher =

work page 1995

[51] [51]

Journal on Satisfiability, Boolean Modeling and Computation , volume =

Translating Pseudo-Boolean Constraints into SAT , author =. Journal on Satisfiability, Boolean Modeling and Computation , volume =

work page

[52] [52]

Electronic Notes in Discrete Mathematics , volume =

Exploiting the real power of unit propagation lookahead , author =. Electronic Notes in Discrete Mathematics , volume =. 2001 , publisher =

work page 2001

[53] [53]

Educational and psychological measurement , volume =

A coefficient of agreement for nominal scales , author =. Educational and psychological measurement , volume =. 1960 , publisher =

work page 1960

[54] [54]

Psychological Bulletin , volume =

Measuring nominal scale agreement among many raters , author =. Psychological Bulletin , volume =. 1971 , publisher =

work page 1971

[55] [55]

Journal of artificial intelligence research , volume =

SATzilla: portfolio-based algorithm selection for SAT , author =. Journal of artificial intelligence research , volume =

work page

[56] [56]

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages =

Grass: Combining graph neural networks with expert knowledge for sat solver selection , author =. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages =

work page