Blocked Gibbs meets Diffusion Transformers: Unsupervised Learning for Constraint Optimization
Pith reviewed 2026-06-30 11:48 UTC · model grok-4.3
The pith
Blocked Gaussian denoising in diffusion transformers enables large targeted edits for constraint optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BloGDiT replaces standard joint Gaussian denoising with blocked Gaussian denoising in a Diffusion Transformer. It performs iterative block resampling and anneals the block size over time to allow large, targeted edits within blocks of variables. This lets the model address constraint optimization problems with general discrete variables and global reasoning needs, achieving performance that matches or exceeds existing methods on Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut.
What carries the argument
Blocked Gaussian denoising, which replaces joint denoising across all variables with resampling of variable blocks, combined with iterative application and annealing of block size.
If this is right
- BloGDiT extends diffusion models beyond binary variables to general discrete variables.
- It supports constraint structures that require global rather than local reasoning.
- The method matches or outperforms prior approaches on Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut.
- Blocked Gibbs-style diffusion supplies an effective inductive bias for Transformer models in constraint satisfaction and optimization.
Where Pith is reading between the lines
- The success on these four problems suggests the blocked approach could transfer to other combinatorial optimization tasks with similar structure.
- Using transformers instead of graph neural networks may allow handling larger or denser constraint graphs without the locality bias of message passing.
- Annealing the block size could be tuned jointly with the diffusion noise schedule to further improve convergence on hard instances.
Load-bearing premise
The main cause of poor performance in prior diffusion models is the mismatch between small incremental denoising and the need for large targeted variable changes, and that introducing blocked resampling will fix this without creating new issues around solution feasibility or optimality.
What would settle it
Running the same Diffusion Transformer architecture on the test problems but using standard joint Gaussian denoising instead of blocked denoising, and finding that it performs equally well, would show that the blocking step is not necessary.
Figures
read the original abstract
Diffusion models have shown promise in learning to solve constraint optimization problems. However, they are mostly restricted to problems with binary variables and rely on graph neural networks, hindering their application to a broader range of problems such as those with general discrete variables or constraint structures that necessitate global rather than local reasoning. We investigate the use of Diffusion Transformers to address the aforementioned limitations. A naive implementation performs poorly due to a fundamental mismatch between the standard diffusion process and constraint solving: while the former applies small, incremental denoising across all variables, the latter requires substantially altering specific subsets of variables to attain feasibility or optimality. Our method, Blocked Gibbs Diffusion Transformer (BloGDiT), is the first to address this limitation by replacing standard joint Gaussian denoising with blocked Gaussian denoising. BloGDiT uses iterative block resampling and anneals the block size over time to facilitate large, targeted edits within a block of variables. Across Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut, BloGDiT matches or outperforms existing methods, demonstrating that blocked Gibbs-style diffusion provides a highly effective inductive bias for Transformer-based constraint satisfaction and optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BloGDiT, a Diffusion Transformer variant for unsupervised constraint optimization. It argues that standard joint Gaussian denoising is mismatched to constraint solving because it performs only small incremental changes across all variables, whereas feasibility and optimality require large targeted edits to subsets of variables. BloGDiT replaces this with blocked Gaussian denoising, iterative block resampling, and annealing of block size over time. The method is evaluated on Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut, where it is reported to match or outperform prior approaches.
Significance. If the central empirical claims hold after proper controls, the work would demonstrate that a blocked-Gibbs inductive bias can be effectively combined with Transformer backbones for discrete combinatorial problems, extending diffusion models beyond binary variables and GNN architectures. The approach supplies a concrete mechanism (block resampling + annealing) that could be reusable in other generative models for constraint satisfaction.
major comments (3)
- [Abstract, §4] Abstract and §4 (Experiments): the claim that BloGDiT 'matches or outperforms existing methods' is presented without any quantitative tables, baselines, error bars, number of runs, or statistical tests. This prevents verification of the performance statements that are load-bearing for the paper's contribution.
- [§3] §3 (Method): the central argument that the incremental nature of joint denoising is the primary cause of prior diffusion models' underperformance is not isolated. No ablation is described that trains an otherwise identical Diffusion Transformer (same backbone, loss, discretization) with standard joint denoising and shows it fails for the stated reason; without this, the necessity of blocked resampling remains unestablished.
- [§3.2, §4] §3.2 and §4: the block-size annealing schedule and iterative resampling procedure are introduced as the key fix, yet no sensitivity analysis or comparison to fixed-block or non-annealed variants is reported. If these design choices are not load-bearing, the claimed inductive-bias advantage over standard DiT weakens.
minor comments (2)
- [§3] Notation for the blocked denoising distribution and the precise definition of block resampling should be formalized with equations rather than prose descriptions.
- [§2, §3] The manuscript should include a clear statement of the discrete-to-continuous mapping used for the general (non-binary) variables in the four problem domains.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights opportunities to strengthen the empirical support and ablations. We address each major comment below and will revise the manuscript to incorporate the requested elements.
read point-by-point responses
-
Referee: [Abstract, §4] Abstract and §4 (Experiments): the claim that BloGDiT 'matches or outperforms existing methods' is presented without any quantitative tables, baselines, error bars, number of runs, or statistical tests. This prevents verification of the performance statements that are load-bearing for the paper's contribution.
Authors: We agree that explicit quantitative tables with baselines, error bars, run counts, and statistical tests are necessary for verification. In the revision we will add a dedicated results table (or tables) for each task (Sudoku, Graph Coloring, MIS, MaxCut) reporting mean performance and standard deviation over multiple independent runs (minimum 5 seeds), direct comparisons to all cited baselines, and p-values or confidence intervals where appropriate. This will make the 'matches or outperforms' statements directly verifiable. revision: yes
-
Referee: [§3] §3 (Method): the central argument that the incremental nature of joint denoising is the primary cause of prior diffusion models' underperformance is not isolated. No ablation is described that trains an otherwise identical Diffusion Transformer (same backbone, loss, discretization) with standard joint denoising and shows it fails for the stated reason; without this, the necessity of blocked resampling remains unestablished.
Authors: We acknowledge that an explicit ablation isolating joint versus blocked denoising on an otherwise identical DiT backbone would more rigorously establish necessity. While the manuscript provides theoretical motivation and end-to-end performance gains, we will add this controlled ablation in the revised version, training a standard joint-denoising DiT with the same architecture, loss, and discretization schedule and reporting its performance degradation relative to BloGDiT on the same benchmarks. revision: yes
-
Referee: [§3.2, §4] §3.2 and §4: the block-size annealing schedule and iterative resampling procedure are introduced as the key fix, yet no sensitivity analysis or comparison to fixed-block or non-annealed variants is reported. If these design choices are not load-bearing, the claimed inductive-bias advantage over standard DiT weakens.
Authors: We agree that sensitivity analysis is required to confirm the load-bearing nature of annealing and iterative resampling. In the revision we will add experiments comparing BloGDiT against (i) fixed-block-size variants and (ii) non-annealed (constant block size) variants, reporting performance deltas on the four tasks to demonstrate that these components materially affect results. revision: yes
Circularity Check
No circularity: empirical method with no self-referential derivations or load-bearing self-citations
full rationale
The paper introduces BloGDiT as a practical replacement of joint Gaussian denoising with blocked resampling and annealing to address an inductive bias mismatch in diffusion models for constraint problems. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or description that would reduce the central claim to a definition or input by construction. Performance is asserted via benchmark comparisons (Sudoku, Graph Coloring, etc.), which are external and falsifiable. The derivation chain is therefore self-contained against external benchmarks with no detectable circular steps.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Learning what to defer for maximum in- dependent sets
Sungsoo Ahn, Younggyo Seo, and Jinwoo Shin. Learning what to defer for maximum in- dependent sets. InInternational conference on machine learning, pages 134–144. PMLR, 2020
2020
-
[2]
On the bottleneck of graph neural networks and its practical implications
Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. InInternational Conference on Learning Representations, 2021. URL https: //openreview.net/forum?id=i80OPhOCVH2
2021
-
[3]
Interleaved gibbs diffusion for constrained generation
Gautham Govind Anil, Sachin Yadav, Dheeraj Mysore Nagaraj, Karthikeyan Shanmugam, and Prateek Jain. Interleaved gibbs diffusion for constrained generation. InICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, 2025
2025
-
[4]
Block diffusion: Interpolating between autoregressive and diffusion language models
Marianne Arriola, Subham Sekhar Sahoo, Aaron Gokaslan, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Justin T Chiu, and V olodymyr Kuleshov. Block diffusion: Interpolating between autoregressive and diffusion language models. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=tyEyYT267x
2025
-
[5]
Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021
Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021
2021
-
[6]
Exploratory combi- natorial optimization with reinforcement learning
Thomas Barrett, William Clements, Jakob Foerster, and Alex Lvovsky. Exploratory combi- natorial optimization with reinforcement learning. InProceedings of the AAAI conference on artificial intelligence, volume 34, pages 3243–3250, 2020
2020
-
[7]
Thomas D Barrett, Christopher WF Parsonson, and Alexandre Laterre. Learning to solve combi- natorial graph partitioning problems via efficient exploration.arXiv preprint arXiv:2205.14105, 2022
-
[8]
Global Constraint Catalog, 2005
Nicolas Beldiceanu, Mats Carlsson, and Jean-Xavier Rampon. Global Constraint Catalog, 2005. URLhttps://hal.science/hal-00485396. Research Report SICS T2005-08
2005
-
[9]
What’s wrong with deep learning in tree search for combinatorial optimization
Maximilian Böther, Otto Kißig, Martin Taraz, Sarel Cohen, Karen Seidel, and Tobias Friedrich. What’s wrong with deep learning in tree search for combinatorial optimization. InInternational Conference on Learning Representations, 2022. URL https://openreview.net/forum? id=mk0HzdqY7i1
2022
-
[10]
End-to-end object detection with transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. InEuropean conference on computer vision, pages 213–229. Springer, 2020
2020
-
[11]
Analog bits: Generating discrete data using diffusion models with self-conditioning
Ting Chen, Ruixiang ZHANG, and Geoffrey Hinton. Analog bits: Generating discrete data using diffusion models with self-conditioning. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=3itjR9QxFw
2023
-
[12]
Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and mcmc
Yilun Du, Conor Durkan, Robin Strudel, Joshua B Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, and Will Sussman Grathwohl. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and mcmc. InInternational conference on machine learning, pages 8489–8510. PMLR, 2023
2023
-
[13]
Tenenbaum
Yilun Du, Jiayuan Mao, and Joshua B. Tenenbaum. Learning iterative reasoning through energy diffusion. InInternational Conference on Machine Learning (ICML), 2024
2024
-
[14]
Regularized langevin dynamics for combinatorial opti- mization
Shengyu Feng and Yiming Yang. Regularized langevin dynamics for combinatorial opti- mization. InForty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=bbJ0QCujU4
2025
-
[15]
Sampling-based approaches to calculating marginal densities.Journal of the American statistical association, 85(410):398–409, 1990
Alan E Gelfand and Adrian FM Smith. Sampling-based approaches to calculating marginal densities.Journal of the American statistical association, 85(410):398–409, 1990
1990
-
[16]
Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on pattern analysis and machine intelligence, (6): 721–741, 1984
Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on pattern analysis and machine intelligence, (6): 721–741, 1984. 12
1984
-
[17]
A tutorial on formulating and using qubo models.arXiv preprint arXiv:1811.11538, 2018
Fred Glover, Gary Kochenberger, and Yu Du. A tutorial on formulating and using qubo models. arXiv preprint arXiv:1811.11538, 2018
-
[18]
Exploring network structure, dynamics, and function using networkx
Aric A Hagberg, Daniel A Schult, and Pieter J Swart. Exploring network structure, dynamics, and function using networkx. InProceedings of the Python in Science Conference, pages 11–15. SciPy, 2008
2008
-
[19]
Monte carlo sampling methods using markov chains and their applications
W Keith Hastings. Monte carlo sampling methods using markov chains and their applications. Biometrika, 57(1):97, 1970
1970
-
[20]
Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020
2020
-
[21]
Blocking gibbs sampling in very large probabilistic expert systems.International Journal of Human-Computer Studies, 42(6):647–666, 1995
Claus S Jensen, Uffe Kjærulff, and Augustine Kong. Blocking gibbs sampling in very large probabilistic expert systems.International Journal of Human-Computer Studies, 42(6):647–666, 1995
1995
-
[22]
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models.arXiv preprint arXiv:2001.08361, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[23]
Kakade, and Sitan Chen
Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham M. Kakade, and Sitan Chen. Train for the worst, plan for the best: Understanding token ordering in masked diffusions. InForty-second International Conference on Machine Learning, 2025. URL https://openreview.net/ forum?id=DjJmre5IkP
2025
-
[24]
Tackling the abstraction and reasoning corpus with vision transformers: the importance of 2d representation, positions, and objects.Transactions on Machine Learning Research, 2025
Wenhao Li, Yudong Xu, Scott Sanner, and Elias Boutros Khalil. Tackling the abstraction and reasoning corpus with vision transformers: the importance of 2d representation, positions, and objects.Transactions on Machine Learning Research, 2025
2025
-
[25]
Combinatorial optimization with graph convolu- tional networks and guided tree search.Advances in neural information processing systems, 31, 2018
Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolu- tional networks and guided tree search.Advances in neural information processing systems, 31, 2018
2018
-
[26]
Feasibility jump: an lp-free lagrangian mip heuristic
Bjørnar Luteberget and Giorgio Sartor. Feasibility jump: an lp-free lagrangian mip heuristic. Mathematical Programming Computation, 15(2):365–388, 2023
2023
-
[27]
Benchmarking the max-cut problem on the simulated bifurcation machine
Yoshiki Matsuda. Benchmarking the max-cut problem on the simulated bifurcation machine. Benchmarking the MAX-CUT problem on the Simulated Bifurcation Machine, 2019
2019
-
[28]
SDEdit: Guided image synthesis and editing with stochastic differential equations
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022. URL https://openreview. net/forum?id=aBsCjcPu_tE
2022
-
[29]
A simple qubo formulation of sudoku
Sascha Mücke. A simple qubo formulation of sudoku. InProceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1958–1962, 2024
1958
-
[30]
Graph neural networks exponentially lose expressive power for node classification
Kenta Oono and Taiji Suzuki. Graph neural networks exponentially lose expressive power for node classification. InInternational Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1ldO2EFPr
2020
-
[31]
Recurrent relational networks.Advances in neural information processing systems, 31, 2018
Rasmus Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks.Advances in neural information processing systems, 31, 2018
2018
-
[32]
Scalable diffusion models with transformers
William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023
2023
-
[33]
Laurent Perron and Frédéric Didier. CP-SAT. URL https://developers.google.com/ optimization/cp/cp_solver/
-
[34]
Large neighborhood search
David Pisinger and Stefan Ropke. Large neighborhood search. InHandbook of metaheuristics, pages 99–127. Springer, 2018. 13
2018
-
[35]
A filtering algorithm for constraints of difference in csps
Jean-Charles Régin. A filtering algorithm for constraints of difference in csps. InAAAI, volume 94, pages 362–367, 1994
1994
-
[36]
Elsevier, 2006
Francesca Rossi, Peter Van Beek, and Toby Walsh.Handbook of constraint programming. Elsevier, 2006
2006
-
[37]
Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024
Subham Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024
2024
-
[38]
A diffusion model framework for unsupervised neural combinatorial optimization
Sebastian Sanokowski, Sepp Hochreiter, and Sebastian Lehner. A diffusion model framework for unsupervised neural combinatorial optimization. InForty-first International Conference on Machine Learning, 2024. URLhttps://openreview.net/forum?id=AFfXlKFHXJ
2024
-
[39]
Scalable discrete diffusion samplers: Combinatorial optimization and statistical physics
Sebastian Sanokowski, Wilhelm Franz Berghammer, Haoyu Peter Wang, Martin Ennemoser, Sepp Hochreiter, and Sebastian Lehner. Scalable discrete diffusion samplers: Combinatorial optimization and statistical physics. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=peNgxpbdxB
2025
-
[40]
Using constraint programming and local search methods to solve vehicle routing problems
Paul Shaw. Using constraint programming and local search methods to solve vehicle routing problems. InInternational conference on principles and practice of constraint programming, pages 417–431. Springer, 1998
1998
-
[41]
Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, and Lennart Svensson. Mcmc-correction of score-based diffusion models for model composition.arXiv preprint arXiv:2307.14012, 2023
-
[42]
Deep unsuper- vised learning using nonequilibrium thermodynamics
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015
2015
-
[43]
The minizinc challenge 2008–2013.AI magazine, 35(2):55–60, 2014
Peter J Stuckey, Thibaut Feydy, Andreas Schutt, Guido Tack, and Julien Fischer. The minizinc challenge 2008–2013.AI magazine, 35(2):55–60, 2014
2008
-
[44]
Re- visiting sampling for combinatorial optimization
Haoran Sun, Katayoon Goshvadi, Azade Nova, Dale Schuurmans, and Hanjun Dai. Re- visiting sampling for combinatorial optimization. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceed- ings of the 40th International Conference on Machine Learning, volume 202 ofProceed- ings of Machine Le...
2023
-
[45]
Difusco: Graph-based diffusion solvers for combinatorial optimization.Advances in neural information processing systems, 36:3706–3731, 2023
Zhiqing Sun and Yiming Yang. Difusco: Graph-based diffusion solvers for combinatorial optimization.Advances in neural information processing systems, 36:3706–3731, 2023
2023
-
[46]
Graph neural networks for maximum constraint satisfaction.Frontiers in artificial intelligence, 3:580607, 2021
Jan Toenshoff, Martin Ritzert, Hinrikus Wolf, and Martin Grohe. Graph neural networks for maximum constraint satisfaction.Frontiers in artificial intelligence, 3:580607, 2021
2021
-
[47]
Jan Tönshoff, Berke Kisin, Jakob Lindner, and Martin Grohe. One model, any csp: graph neural networks as fast global search heuristics for constraint satisfaction. InProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI ’23, 2023. ISBN 978-1-956792-03-4. doi: 10.24963/ijcai.2023/476. URL https://doi.org/10.24963...
-
[48]
Attention is all you need.Advances in neural information processing systems, 30, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017
2017
-
[49]
Remasking discrete diffusion models with inference-time scaling
Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, and V olodymyr Kuleshov. Remasking discrete diffusion models with inference-time scaling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum? id=IJryQAOy0p. 14
2025
-
[50]
Satnet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver
Po-Wei Wang, Priya Donti, Bryan Wilder, and Zico Kolter. Satnet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. InInternational Conference on Machine Learning, pages 6545–6554. PMLR, 2019
2019
-
[51]
Energy-based diffusion generator for efficient sampling of boltzmann distributions.Neural Networks, page 108126, 2025
Yan Wang, Ling Guo, Hao Wu, and Tao Zhou. Energy-based diffusion generator for efficient sampling of boltzmann distributions.Neural Networks, page 108126, 2025
2025
-
[52]
John Wiley & Sons, 2020
Laurence A Wolsey.Integer programming. John Wiley & Sons, 2020
2020
-
[53]
A simple model to generate hard satisfiable instances
Ke Xu, Frederic Boussemart, Fred Hemery, and Christophe Lecoutre. A simple model to generate hard satisfiable instances. InProceedings of the 19th international joint conference on Artificial intelligence, pages 337–342, 2005
2005
-
[54]
Self-supervised transformers as iterative solution improvers for constraint satisfaction
Yudong Xu, Wenhao Li, Scott Sanner, and Elias Boutros Khalil. Self-supervised transformers as iterative solution improvers for constraint satisfaction. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 69432–69450. PMLR, 13–19 Jul 2025
2025
-
[55]
Large neighborhood search meets iterative neural constraint heuristics
Yudong Xu, Wenhao Li, Scott Sanner, and Elias Boutros Khalil. Large neighborhood search meets iterative neural constraint heuristics. InThe 23rd International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, 2026
2026
-
[56]
Learning to solve constraint satisfaction problems with recurrent transformer
Zhun Yang, Adam Ishay, and Joohyung Lee. Learning to solve constraint satisfaction problems with recurrent transformer. InThe Eleventh International Conference on Learning Representa- tions, 2023. URLhttps://openreview.net/forum?id=udNhDCr2KQe
2023
-
[57]
Beyond autoregression: Discrete diffusion for complex reasoning and planning
Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Beyond autoregression: Discrete diffusion for complex reasoning and planning. In The Thirteenth International Conference on Learning Representations, 2025. URL https: //openreview.net/forum?id=NRYgUzSPZz
2025
-
[58]
The gset datasethttps://web.stanford.edu/~yyye/yyye/Gset/, 2003
Yinyu Ye. The gset datasethttps://web.stanford.edu/~yyye/yyye/Gset/, 2003
2003
-
[59]
Let the flows tell: Solving graph combinatorial problems with gflownets.Advances in neural information processing systems, 36:11952–11969, 2023
Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron C Courville, Yoshua Bengio, and Ling Pan. Let the flows tell: Solving graph combinatorial problems with gflownets.Advances in neural information processing systems, 36:11952–11969, 2023
2023
-
[60]
TX t=1 logq θ(Xt−1 |X t, mt) # | {z } (2) Reverse entropy / reconstruction (11) −E X0:T ,m1:T ∼qθ
Hang Zhao, Kexiong Yu, Yuhang Huang, Renjiao Yi, Chenyang Zhu, and Kai Xu. Disco: Efficient diffusion solver for large-scale combinatorial optimization problems.arXiv preprint arXiv:2406.19705, 2024. 15 A Loss derivation To derive the objective detailed in Equation (7), we first expand the reverse KL divergence over the joint distribution: KL qθ(X0:T , m1...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.