Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-15 04:56 UTC · model grok-4.3
The pith
A trajectory-balance objective stops diffusion language models from locking onto narrow denoising paths during post-training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Standard reward-driven post-training for diffusion language models produces trajectory locking: it over-concentrates probability mass on narrow sets of denoising paths and thereby reduces coverage of alternative correct solutions. TraFL corrects this by training the policy to match a reward-tilted target distribution anchored to a frozen reference model. The objective is realized with a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization. On mathematical reasoning and code generation tasks, TraFL is the only evaluated post-training method that raises performance above the base model in every benchmark-length setting, with gains that persist as the sampling budget increases.
What carries the argument
TraFL, a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored to a frozen reference model, implemented via a diffusion-compatible sequence-level surrogate and learned prompt-dependent normalization.
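The review does not reproduce the objective in equation form. As a rough illustration only, a GFlowNet-style trajectory-balance residual for a reward-tilted, reference-anchored target might look like the sketch below; the function name, the scalar `beta`, and collapsing the learned prompt-dependent normalization into a single `log_z` value are our assumptions, not the paper's.

```python
def trajectory_balance_loss(log_pi: float, log_ref: float,
                            reward: float, log_z: float,
                            beta: float = 1.0) -> float:
    """Squared trajectory-balance residual for one sampled sequence.

    The reward-tilted target is p*(y|x) ∝ p_ref(y|x) * exp(beta * r(x, y)).
    At the optimum, log Z(x) + log pi(y|x) = log p_ref(y|x) + beta * r(x, y)
    for every sequence y, so the residual is driven to zero on all paths
    rather than concentrating mass on the single highest-reward path.
    """
    residual = log_z + log_pi - log_ref - beta * reward
    return residual ** 2

# A policy that already matches the tilted target gives zero loss.
print(trajectory_balance_loss(log_pi=-4.0, log_ref=-5.0, reward=1.0, log_z=0.0))  # → 0.0
```

In the paper the normalization is learned per prompt; here it is a free scalar so the sketch stays self-contained.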
If this is right
- Diffusion language models can receive post-training that raises reasoning accuracy while preserving coverage of multiple valid solution paths.
- Performance advantages from TraFL remain or grow when inference draws more samples, unlike mode-seeking alternatives.
- Improvements transfer to held-out mathematical and code benchmarks without retraining.
- TraFL is the only tested post-training approach that beats the base model uniformly across all length settings and sampling budgets examined.
Where Pith is reading between the lines
- The same balancing principle could be tested on other non-autoregressive generators that suffer from path concentration.
- A reference-model anchor may allow TraFL to be combined with existing alignment pipelines without extra hyper-parameter search.
- Practitioners could apply the method to tasks that reward multiple distinct outputs rather than a single best answer.
- Scaling the learned normalization to longer contexts or larger models would test whether the surrogate remains faithful.
Load-bearing premise
The sequence-level surrogate and prompt-dependent normalization accurately approximate the trajectory-balance objective without creating new collapse modes or needing task-specific tuning.
What would settle it
If TraFL performance falls below the base model on a held-out benchmark once the sampling budget exceeds a modest threshold while other methods continue to improve, the claim that gains persist with increased sampling would be falsified.
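Claims about performance under a growing sampling budget are conventionally measured with pass@k. A standard unbiased estimator (from Chen et al.'s code-generation evaluation, which this paper cites) is:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: given n samples of which c are correct, the
    probability that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# A mode-locked model that repeats one wrong path gains nothing from budget:
print(pass_at_k(n=100, c=0, k=10))  # → 0.0
```

The falsification test above amounts to checking whether TraFL's pass@k curve dips below the base model's as k grows.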
original abstract
Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking: sampled reward-driven updates over-concentrate probability mass onto a narrow set of denoising paths, reducing coverage of alternative correct solutions under repeated sampling. To address this, we propose TraFL (Trajectory Flow baLancing), a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored to a frozen reference model. We make this practical for diffusion language models with a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization. Across mathematical reasoning and code generation benchmarks, TraFL is the only evaluated post-training method that improves over the base model in every benchmark-length setting, with gains that persist as the sampling budget increases. The improvements transfer to held-out evaluations: TraFL stays above the base model on Minerva Math and is the strongest method on every LiveCodeBench difficulty split.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies trajectory locking as a failure mode in reward-maximizing post-training of diffusion language models, where updates over-concentrate on narrow denoising paths. It proposes TraFL, a trajectory-balance objective anchored to a frozen reference model, implemented via a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization. Experiments on math reasoning and code generation benchmarks show TraFL as the only evaluated method that improves over the base model in every benchmark-length setting, with gains persisting at higher sampling budgets and transferring to held-out evaluations like Minerva Math and LiveCodeBench splits.
Significance. If the surrogate faithfully implements trajectory balancing without introducing new collapse modes, the result would be significant for diffusion LM post-training: it provides a non-mode-seeking alternative that improves coverage and robustness to sampling budget, grounded by the frozen reference. The consistent outperformance across settings and transfer to held-out tasks would strengthen the case for trajectory-balance objectives over standard RLHF-style methods in this architecture.
major comments (2)
- [§3.2] §3.2 (sequence-level surrogate): the central claim that the diffusion-compatible surrogate plus learned normalization implements the trajectory-balance objective without new collapse modes lacks direct validation. No diagnostics (e.g., KL divergence, coverage metrics, or path-probability comparisons) are reported between the surrogate-induced distribution and the true reward-tilted target, especially for long sequences or out-of-distribution prompts; this makes it unclear whether observed robustness to sampling budget arises from genuine balancing or from the normalization term acting as regularization.
- [Table 3, §5.1] Table 3 and §5.1 (benchmark results): the headline result that TraFL alone improves over the base model in every setting depends on the surrogate claim, yet the paper reports no statistical significance tests, exact baseline implementations, or ablation removing the prompt-dependent normalization. Without these, it is difficult to isolate whether gains are load-bearing for trajectory balance or artifacts of the learned normalization.
minor comments (2)
- [§3.3] Notation for the learned normalization factor is introduced without an explicit equation number; adding one would clarify its dependence on the prompt.
- [Figure 2] Figure 2 caption does not specify the exact sampling budgets used for the coverage curves, making it hard to reproduce the persistence claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We address each major comment below and outline revisions to strengthen the validation of the surrogate and the statistical presentation of results.
point-by-point responses
-
Referee: [§3.2] §3.2 (sequence-level surrogate): the central claim that the diffusion-compatible surrogate plus learned normalization implements the trajectory-balance objective without new collapse modes lacks direct validation. No diagnostics (e.g., KL divergence, coverage metrics, or path-probability comparisons) are reported between the surrogate-induced distribution and the true reward-tilted target, especially for long sequences or out-of-distribution prompts; this makes it unclear whether observed robustness to sampling budget arises from genuine balancing or from the normalization term acting as regularization.
Authors: We acknowledge the value of direct diagnostics. Exact KL divergence over all denoising paths is computationally intractable for long sequences. We instead provide indirect support via consistent gains in coverage and robustness to sampling budget, which would be unlikely under new collapse modes. In revision we will add coverage metrics (unique correct solutions under repeated sampling) and path-probability comparisons on representative prompt subsets, including OOD cases, to better separate the contributions of the surrogate and normalization. revision: partial
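The coverage metric the authors promise (unique correct solutions under repeated sampling) admits a simple sketch; the canonicalization step is our assumption about how near-duplicate solutions would be collapsed.

```python
def solution_coverage(samples, is_correct, canonicalize=str.strip):
    """Count distinct correct solutions among repeated samples for one
    prompt. Trajectory locking shows up as this count plateauing early
    even as the raw number of samples grows."""
    return len({canonicalize(s) for s in samples if is_correct(s)})

samples = ["x = a + b", "x = b + a", "x = a + b ", "x = a - b"]
print(solution_coverage(samples, is_correct=lambda s: "-" not in s))  # → 2
```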
-
Referee: [Table 3, §5.1] Table 3 and §5.1 (benchmark results): the headline result that TraFL alone improves over the base model in every setting depends on the surrogate claim, yet the paper reports no statistical significance tests, exact baseline implementations, or ablation removing the prompt-dependent normalization. Without these, it is difficult to isolate whether gains are load-bearing for trajectory balance or artifacts of the learned normalization.
Authors: We agree these elements would improve clarity. We will add statistical significance tests (bootstrap confidence intervals and paired comparisons) to Table 3. Baseline implementations are detailed in the appendix; we will reference them explicitly in §5.1. The ablation removing prompt-dependent normalization already appears in Appendix B.3 and shows degraded performance and higher variance; we will highlight these results in the main text. These additions will better isolate the trajectory-balance contribution while preserving the headline observation that TraFL is the only method improving over the base model across all settings. revision: yes
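The promised paired bootstrap over per-problem scores could be sketched as follows; the resampling granularity (per-problem score differences) and the 95% interval are our choices, not details from the paper.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_boot=2000, seed=0):
    """95% bootstrap confidence interval for the mean per-problem score
    difference between two methods evaluated on the same problems."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(diffs) for _ in diffs]
        means.append(sum(resample) / len(resample))
    means.sort()
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]
```

If the interval excludes zero, the improvement over the base model is unlikely to be resampling noise.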
Circularity Check
Derivation anchored externally with no reduction to self-fitted inputs
full rationale
The trajectory-balance objective is defined relative to a frozen reference model, and the diffusion-compatible surrogate plus learned normalization is presented as a practical implementation rather than a redefinition of the target. No equations or claims in the provided description reduce the central prediction to a fitted parameter or self-citation chain by construction. The benchmark improvements are reported as empirical outcomes without evident circularity in the derivation steps.
Axiom & Free-Parameter Ledger
free parameters (1)
- prompt-dependent normalization
axioms (1)
- domain assumption: The reward-tilted target distribution can be approximated via the proposed diffusion-compatible sequence-level surrogate loss.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
unclear — Relation between the paper passage and the cited Recognition theorem.
TraFL trains the policy toward a reward-tilted target distribution anchored to a frozen reference model... diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Natural gradient works efficiently in learning
Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural computation, 10(2):251–276, 1998
work page 1998
-
[2]
Program Synthesis with Large Language Models
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et al. Program synthesis with large language models.arXiv preprint arXiv:2108.07732, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[3]
Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003
work page 2003
-
[4]
GFlowNet foundations
Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, and Emmanuel Bengio. GFlowNet foundations. Journal of Machine Learning Research, 24(210):1–55, 2023
work page 2023
-
[5]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...
work page 2021
-
[6]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[7]
W. J. Conover.Practical Nonparametric Statistics. John Wiley & Sons, New York, 3 edition, 1999
work page 1999
-
[8]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, ...
work page 2025
-
[9]
LoRA: Low-rank adaptation of large language models
Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022
work page 2022
-
[10]
Livecodebench: Holistic and contamination free evaluation of large language models for code
Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. Livecodebench: Holistic and contamination free evaluation of large language models for code. In The Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[11]
Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages
Vishnu Teja Kunde, Fatemeh Doudi, Mahdi Farahbakhsh, Dileep Kalathil, Krishna Narayanan, and Jean-Francois Chamberland. Reinforcement learning for diffusion llms with entropy-guided step selection and stepwise advantages.arXiv preprint arXiv:2603.12554, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[12]
Solving quantitative reasoning problems with language models, 2022
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra. Solving quantitative reasoning problems with language models, 2022
work page 2022
-
[13]
Let’s verify step by step, 2023
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step, 2023
work page 2023
-
[14]
Understanding r1-zero-like training: A critical perspective, 2025
Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Understanding r1-zero-like training: A critical perspective, 2025
work page 2025
-
[15]
Zanlin Ni, Shenzhi Wang, Yang Yue, Tianyu Yu, Weilin Zhao, Yeguo Hua, Tianyi Chen, Jun Song, Cheng Yu, Bo Zheng, and Gao Huang. The flexibility trap: Why arbitrary order limits reasoning potential in diffusion language models.arXiv preprint arXiv:2601.15165, 2026
-
[16]
Large Language Diffusion Models
Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models.arXiv preprint arXiv:2502.09992, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[17]
Principled RL for diffusion LLMs emerges from a sequence-level perspective
Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, and Chongxuan Li. Principled RL for diffusion LLMs emerges from a sequence-level perspective. In The Fourteenth International Conference on Learning Representations, 2026
work page 2026
-
[18]
Code llama: Open foundation models for code, 2024
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nico...
work page 2024
-
[19]
Xindi Tang, Radu Dolga, Sangwon Yoon, and Ilija Bogunovic. wd1: Weighted policy optimization for reasoning in diffusion language models. arXiv preprint arXiv:2507.08838, 2025
-
[20]
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Cheng Wang, Paria Rashidinejad, Dong Su, Shixin Jiang, Siyuan Wang, Shuo Zhao, Cheng Zhou, Sheng Shen Shen, Fan Chen, Tommi Jaakkola, et al. Spg: Sandwiched policy gradient for masked diffusion language models.arXiv preprint arXiv:2510.09541, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[21]
Yifei Wang, Lei Yang, Bowen Li, Yuhang Tian, Kai Shen, and Mengdi Wang. Revolutionizing reinforcement learning framework for diffusion large language models. arXiv preprint arXiv:2509.06949, 2025
-
[22]
Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, and Enze Xie. Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding, 2025
work page 2025
-
[23]
Kodcode: A diverse, challenging, and verifiable synthetic dataset for coding
Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, and Radha Poovendran. Kodcode: A diverse, challenging, and verifiable synthetic dataset for coding. arXiv, 2025
work page 2025
-
[24]
Dream 7B: Diffusion Large Language Models
Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models.arXiv preprint arXiv:2508.15487, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[25]
Acecoder: Acing coder RL via automated test-case synthesis
Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, and Wenhu Chen. Acecoder: Acing coder RL via automated test-case synthesis. ArXiv, abs/2207.01780, 2025
-
[26]
d1: Scaling reasoning in diffusion large language models via reinforcement learning
Siyan Zhao, Devaansh Gupta, Qinqing Zheng, and Aditya Grover. d1: Scaling reasoning in diffusion large language models via reinforcement learning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[27]
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, Jun Hu, Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, et al. Llada 1.5: Variance-reduced preference optimization for large language diffusion models.arXiv preprint arXiv:2505.19223, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[28]
FlowRL: Matching reward distributions for LLM reasoning
Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, and Zhouhan Lin. FlowRL: Matching reward distributions for LLM reasoning. In The ...
work page 2026
-
[29]
Enhancing reasoning for diffusion LLMs via distribution matching policy optimization
Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, and Yongxin Chen. Enhancing reasoning for diffusion LLMs via distribution matching policy optimization, 2026
work page 2026