Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

Liantao Ma; Tianlong Wang; Weibin Liao; Xin Gao; Xinyu Ma; Yang Lin; Yasha Wang; Yuhang Wang

arXiv: 2606.28589 · v1 · pith:75YK5MPR · submitted 2026-06-26 · 💻 cs.AI


Tianlong Wang, Yuhang Wang, Weibin Liao, Xin Gao, Xinyu Ma, Yang Lin, Yasha Wang, Liantao Ma

Pith reviewed 2026-06-30 00:33 UTC · model grok-4.3

classification 💻 cs.AI

keywords LLM reasoning · representation editing · dynamic steering · truth disentanglement · MATH benchmark · lookahead entropy · Fisher-LDA · reasoning trajectories


The pith

DynaSteer steers LLM reasoning trajectories toward truth by clustering patterns and projecting purified vectors at early high-entropy forks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the geometry of truth inside unfolding LLM reasoning chains and shows that truth is encoded at the sentence level yet remains entangled with latent patterns. It identifies that effective steering obeys an uncertainty principle and decay effect, so interventions must target early high-entropy forks, and that naive steering vectors introduce noise that can harm correct paths. From these observations the authors build DynaSteer, which clusters reasoning manifolds, applies Fisher-LDA to isolate a cleaner truth direction, and uses lookahead entropy to decide when to edit and when to roll back. Experiments report higher accuracy on MATH benchmarks and generalization to out-of-domain coding tasks.

Core claim

Truth is encoded at the sentence level and entangled with latent reasoning patterns. Effective intervention follows an Uncertainty Principle and a Decay Effect, requiring localization to early, high-entropy forks. Naive steering vectors suffer from noise and risk collateral damage. DynaSteer therefore employs pattern clustering to disentangle reasoning manifolds, Fisher-LDA to project purified truth, and dynamic lookahead entropy monitoring to steer and roll back trajectories only when necessary.
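To make the control flow of this claim concrete, here is a minimal Python sketch of entropy-gated steering with rollback. It assumes a sentence-level decoding loop and two hypothetical hooks: `propose` (generate the next sentence, optionally with the steering vector applied) and `lookahead_entropy` (score the uncertainty of what follows a candidate sentence). The threshold `tau`, the `steer_window` cut-off standing in for the Decay Effect, the `max_steers` budget, and the `<eos>` marker are illustrative assumptions, not DynaSteer's actual interface.

```python
from typing import Callable, List


def generate_with_rollback(
    propose: Callable[[List[str], bool], str],
    lookahead_entropy: Callable[[List[str], str], float],
    tau: float,
    max_sentences: int,
    steer_window: int,
    max_steers: int = 3,
) -> List[str]:
    """Sentence-level decoding with entropy-gated steering and rollback.

    propose(prefix, steer) -> next sentence; steer=True means the (hypothetical)
        Truth vector is applied while generating it.
    lookahead_entropy(prefix, sentence) -> uncertainty of what would follow
        `sentence` (hypothetical scoring hook).
    """
    trajectory: List[str] = []
    steers_used = 0
    for _ in range(max_sentences):
        candidate = propose(trajectory, False)
        early = len(trajectory) < steer_window  # Decay Effect: only steer early
        uncertain = lookahead_entropy(trajectory, candidate) > tau  # high-entropy fork
        if early and uncertain and steers_used < max_steers:
            # Roll back: discard the unsteered candidate and regenerate it with steering.
            candidate = propose(trajectory, True)
            steers_used += 1
        trajectory.append(candidate)
        if candidate.endswith("<eos>"):  # hypothetical end-of-answer marker
            break
    return trajectory
```

Gating on both position and entropy mirrors insight (2): steering is attempted only at early, high-entropy forks, and every other sentence is kept as generated, which is how the sketch leaves on-track trajectories untouched.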

What carries the argument

DynaSteer, a dynamic representation editing framework that disentangles reasoning manifolds via pattern clustering and projects purified truth using Fisher-LDA, guided by lookahead entropy monitoring for selective intervention.
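The contrast between a naive mean-difference vector and a Fisher-LDA direction can be sketched in a few lines of NumPy. This is the textbook two-class Fisher discriminant, $w \propto S_w^{-1}(\mu_T - \mu_F)$, applied to hypothetical arrays of Truth and Fallacy activations; the `ridge` term is an assumption added to keep $S_w$ invertible, and DynaSteer's exact preprocessing is not stated here.

```python
import numpy as np


def mean_difference_direction(truth: np.ndarray, fallacy: np.ndarray) -> np.ndarray:
    """Unit vector from the Fallacy mean to the Truth mean (the naive steering vector)."""
    d = truth.mean(axis=0) - fallacy.mean(axis=0)
    return d / np.linalg.norm(d)


def fisher_lda_direction(truth: np.ndarray, fallacy: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Two-class Fisher LDA: w is proportional to S_w^{-1} (mu_T - mu_F).

    truth, fallacy: arrays of shape (n_samples, d).
    S_w is the pooled within-class scatter; `ridge` keeps it invertible
    when the activation dimension exceeds the number of samples.
    """
    mu_t, mu_f = truth.mean(axis=0), fallacy.mean(axis=0)
    s_w = (truth - mu_t).T @ (truth - mu_t) + (fallacy - mu_f).T @ (fallacy - mu_f)
    s_w = s_w + ridge * np.eye(s_w.shape[0])
    w = np.linalg.solve(s_w, mu_t - mu_f)
    return w / np.linalg.norm(w)
```

When the within-class scatter is dominated by a high-variance nuisance axis, the mean-difference vector inherits that axis while the Fisher direction down-weights it, which is one way to read the "noise" in insight (3).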

If this is right

Higher accuracy on MATH benchmark tasks through selective steering at uncertain points.
Generalization to out-of-domain coding tasks without task-specific retraining.
Avoidance of collateral damage by steering and rolling back only when entropy monitoring flags an uncertain fork, leaving trajectories that remain on track untouched.
Shift from simply encouraging longer thinking to actively directing trajectories toward truth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same entropy-monitoring logic could be applied to detect likely errors in deployed LLM systems before they complete a response.
Insights on sentence-level truth encoding may inform how other control methods, such as activation patching, locate truthful directions.
The decay effect suggests that steering windows shrink as reasoning chains lengthen, which could guide scheduling of interventions in longer tasks.

Load-bearing premise

Truth can be reliably disentangled from latent reasoning patterns via Fisher-LDA projection without collateral damage to correct trajectories.

What would settle it

An experiment in which the DynaSteer interventions at the identified early high-entropy forks produce no accuracy gain or reduce accuracy on MATH problems relative to the unsteered baseline.

Figures

Figures reproduced from arXiv: 2606.28589 by Liantao Ma, Tianlong Wang, Weibin Liao, Xin Gao, Xinyu Ma, Yang Lin, Yasha Wang, Yuhang Wang.

**Figure 1.** Illustration of the key differences between (a) existing reasoning intervention approaches, (b) static representation editing, and (c) our dynamic representation editing during reasoning trajectories. 1. Introduction. Large Language Models (LLMs) have demonstrated remarkable proficiency in complex reasoning tasks (Huang & Chang, 2023; Plaat et al., 2024; Wang et al., 2025b), largely driven by the Chain-of-…

**Figure 2.** Truth discrimination accuracy across individual attention heads of the large language model. Lower rows correspond to shallower layers of the LLM. Expanding token-level representations to the sentence level and incorporating pattern clustering substantially improve the accuracy of the linear probes. RepE assumes that high-level semantic concepts are linearly encoded in the activation space. Specifically, …
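The probing setup behind Figure 2 can be approximated as follows: mean-pool one attention head's token activations over each sentence span, then fit a logistic-regression probe separating Truth from Fallacy sentences. Mean pooling, the plain gradient-descent trainer, and its hyperparameters are assumptions for illustration; a per-head accuracy map like the figure's would come from repeating this for every head and scoring on held-out sentences.

```python
import numpy as np


def sentence_pool(token_acts: np.ndarray, spans) -> np.ndarray:
    """Mean-pool one head's token activations (seq_len, d) over each (start, end) span."""
    return np.stack([token_acts[s:e].mean(axis=0) for s, e in spans])


def train_probe(x: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 500):
    """Logistic-regression probe (Truth = 1, Fallacy = 0) fit by full-batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(Truth)
        grad = p - y                             # d(log-loss)/d(logit)
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(x: np.ndarray, y: np.ndarray, w: np.ndarray, b: float) -> float:
    """Fraction of sentences whose predicted label matches y."""
    return float((((x @ w + b) > 0).astype(int) == y).mean())
```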

**Figure 3.** The relationship between the Recovery Rate of erroneous reasoning trajectories under representation editing and Entropy Quantiles across different Temporal Segments. for intervention. We analyze this along two orthogonal axes: the spatial locus (identifying critical decision points) and the temporal locus (determining the effective window of opportunity). Spatial Locus: Entropy-Based Selection. Reasoning i…
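Before a sentence can be placed in an entropy quantile, its uncertainty has to be reduced to one number. A minimal sketch, assuming sentence entropy is the mean Shannon entropy (in nats) of the next-token distributions inside the sentence; the paper's exact aggregation is not given in the caption.

```python
import numpy as np


def token_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of the next-token distribution at each position.

    logits: array of shape (seq_len, vocab_size).
    """
    z = logits - logits.max(axis=-1, keepdims=True)          # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    return -(np.exp(log_p) * log_p).sum(axis=-1)


def sentence_entropy(logits: np.ndarray, spans) -> np.ndarray:
    """Mean token entropy per sentence; spans are (start, end) token index pairs."""
    h = token_entropy(logits)
    return np.array([h[s:e].mean() for s, e in spans])


def entropy_quantile_bins(values: np.ndarray, n_bins: int = 4) -> np.ndarray:
    """Assign each sentence to an entropy quantile bucket 0..n_bins-1."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, values, side="right")
```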

**Figure 4.** Distributional comparison of Truth and Fallacy representations under Mean-Difference and Fisher-LDA projections.

**Figure 5.** Intervention strength of representation editing on Truth and Fallacy under Mean-Difference and Fisher-LDA. indicator $R(s)$ is defined as $R(s) = \mathbb{I}[\exists\, n \in \{1, \dots, N\} : y_n = 1]$ (5), where $\mathbb{I}[\cdot]$ is the indicator function. The Recovery Rate for a group of segments is then calculated as the expected value of $R(s)$. As illustrated in…
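Equation (5) and the Recovery Rate translate directly into code. In this sketch each segment is represented by a hypothetical list of binary outcomes $y_1, \dots, y_N$, one per continuation generated from that segment.

```python
from typing import Sequence


def recovery_indicator(outcomes: Sequence[int]) -> int:
    """Eq. (5): R(s) = 1 if any of the N continuations from segment s is correct (y_n = 1)."""
    return int(any(y == 1 for y in outcomes))


def recovery_rate(segments: Sequence[Sequence[int]]) -> float:
    """Empirical expectation of R(s) over a group of segments."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(recovery_indicator(ys) for ys in segments) / len(segments)
```

Grouping segments by temporal segment and entropy quantile before calling `recovery_rate` would give the kind of grid that Figure 3 plots.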

**Figure 6.** Overview of the DynaSteer framework. (a) illustrates the data construction phase, where DynaSteer samples high-entropy sentences and uses consistency to distinguish between Truth and Fallacy. (b) presents the use of probing techniques to localize Truth-relevant attention heads within the LLM. In (c), DynaSteer utilizes pattern clustering for manifold disentanglement, followed by Fisher-LDA to derive pure T…

**Figure 7.** t-SNE Visualization of Pattern Clustering. F.2. Comparison with Causal Intervention Baselines. To further validate the effectiveness of our probing-based truth localization strategy, we compare DynaSteer against two prominent causal intervention methods, ROME (Meng et al., 2022) and MEMIT (Meng et al.). These methods explicitly target specific attention heads based on causal tracing to edit associations wit…

**Figure 8.** Sensitivity Analysis of Hyper-Parameter. Cluster number in pattern clustering, $M$. We hypothesize that the geometry of truth is entangled with latent reasoning patterns. By varying the cluster number $M \in \{1, 3, 5, 8\}$, we observe the effect of manifold disentanglement. • Global vs. Local Geometry: The baseline $M = 1$ (global mean-difference) yields an accuracy of 72.8%. Increasing $M$ to 5 results in a peak perf…
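The role of $M$ can be illustrated with a small NumPy sketch: cluster activations into $M$ pattern groups, fit one Fisher-LDA Truth direction per group, and at inference use the direction of the nearest cluster. Clustering directly on the activations, Lloyd's k-means with farthest-point seeding, and nearest-centroid routing are all assumptions; the caption does not say what features DynaSteer clusters on or how new sentences are assigned. With $M = 1$ this collapses to one global LDA direction, whereas the caption's $M = 1$ baseline is a global mean-difference, so the two are not identical.

```python
import numpy as np


def kmeans(x: np.ndarray, m: int, iters: int = 50, seed: int = 0):
    """Lloyd's k-means with farthest-point seeding; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, m):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d2.argmax()])  # next seed: point farthest from current seeds
    centroids = np.array(centroids, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(m):
            members = x[labels == k]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids, labels


def clustered_truth_directions(h: np.ndarray, is_truth: np.ndarray, m: int, ridge: float = 1e-3):
    """Fit one Fisher-LDA Truth direction per pattern cluster.

    is_truth: boolean array; a cluster lacking either class gets a zero vector.
    """
    centroids, labels = kmeans(h, m)
    directions = np.zeros((m, h.shape[1]))
    for k in range(m):
        t = h[(labels == k) & is_truth]
        f = h[(labels == k) & ~is_truth]
        if len(t) == 0 or len(f) == 0:
            continue
        mu_t, mu_f = t.mean(axis=0), f.mean(axis=0)
        s_w = (t - mu_t).T @ (t - mu_t) + (f - mu_f).T @ (f - mu_f)
        w = np.linalg.solve(s_w + ridge * np.eye(h.shape[1]), mu_t - mu_f)
        directions[k] = w / np.linalg.norm(w)
    return centroids, directions


def select_direction(h_new: np.ndarray, centroids: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Route a new activation to the Truth direction of its nearest pattern cluster."""
    k = int(((centroids - h_new) ** 2).sum(axis=1).argmin())
    return directions[k]
```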

**Figure 9.** Case Study on Trigonometry. The figure contrasts the model's original tendency to perform approximate calculation (Red) with the steered trajectory (Green). DynaSteer successfully guides the model to retrieve the relevant symbolic identity, avoiding calculation errors and directly deriving the integer solution. G.1.2. NUMBER BASE CONVERSION. Analysis of the Original Trajectory (Fallacy): When converting from…

**Figure 10.** Case Study on Number Base Conversion. The model originally misaligns the 3-bit grouping boundaries (Red), producing an incorrect octal representation. DynaSteer steers the model to first pad the binary string to a multiple of 3 digits before grouping, yielding the correct conversion. G.1.3. LOGARITHMIC EQUATION DOMAIN ANALYSIS. Analysis of the Original Trajectory (Fallacy): Faced with finding conditions for…
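The corrected procedure in Figure 10 is a standard algorithm: left-pad the binary string with zeros to a multiple of 3 digits, then map each 3-bit group to one octal digit. A short Python version, for illustration only:

```python
def binary_to_octal(bits: str) -> str:
    """Convert a binary string to octal: left-pad to a multiple of 3 digits,
    then map each 3-bit group to one octal digit."""
    if not bits or any(c not in "01" for c in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    pad = (-len(bits)) % 3                 # zeros needed to reach a multiple of 3
    padded = "0" * pad + bits
    digits = [str(int(padded[i:i + 3], 2)) for i in range(0, len(padded), 3)]
    return "".join(digits).lstrip("0") or "0"
```

Grouping from the left without padding is exactly the misalignment in the Red trajectory: `1011` split as `101|1` would read as `51`, whereas padding to `001011` gives the correct `13`.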

**Figure 11.** Case Study on Logarithmic Equations. The model originally reduces "exactly one solution" to "discriminant equals zero" (Red), a common reasoning shortcut that ignores how domain restrictions on logarithms can convert a two-root quadratic into a one-solution scenario. DynaSteer steers the model toward a domain-aware case analysis (Green), recovering the full count. G.1.4. PIECEWISE FUNCTION SURJECTIVITY. An…

**Figure 12.** Case Study on Piecewise Functions. The model originally overlooks the domain restriction of the linear piece ($x < a$) and incorrectly concludes it covers all of ℝ (Red). DynaSteer steers the model to correctly compute the bounded range of each piece and derive the no-gap inequality $a^3 \le a^2 + 2a$, identifying $a = 2$ as the critical threshold. G.1.5. RECURSIVE SEQUENCE PERIODICITY. Analysis of the Original Tr…
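The no-gap inequality in Figure 12 factors cleanly, which is what makes $a = 2$ the threshold. Assuming the problem restricts attention to $a > 0$ (the truncated caption does not say so explicitly):

$$a^3 \le a^2 + 2a \iff a(a-2)(a+1) \le 0 \iff a \le -1 \ \text{or}\ 0 \le a \le 2,$$

so for $a > 0$ the condition is $0 < a \le 2$.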

**Figure 13.** Case Study on Recursive Sequences. The model attempts brute-force computation but accumulates arithmetic errors at $x_9$ (Red), derailing period detection. DynaSteer steers the model to identify the anti-symmetry $x_{n+5} = -x_n$ and period-10 structure, enabling direct index reduction and an exact answer. G.1.6. COMBINATORIAL PROBABILITY. Analysis of the Original Trajectory (Fallacy): In calculating the probability…
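The steered reasoning in Figure 13 rests on a one-line derivation: applying the anti-symmetry twice gives period 10.

$$x_{n+10} = -x_{n+5} = -(-x_n) = x_n.$$

Hence, assuming indexing starts at $x_1$, any $x_k$ equals $x_r$ with $r \in \{1, \dots, 10\}$ and $r \equiv k \pmod{10}$, and if $r > 5$ one more step gives $x_r = -x_{r-5}$. The specific index the problem asks for is not given in the caption.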

**Figure 14.** Case Study on Combinatorial Probability. The model correctly identifies that Bob must return the transferred color, but overlooks the duplicate ball in Bob's bag, computing $1/6$ instead of $2/6$ (Red). DynaSteer steers the model to recognize the duplicate and arrive at the correct probability $1/3$. G.2. Multi-Steering Case Studies. In more complex problems, a single intervention may not be sufficient to fully…

**Figure 15.** Multi-Steering Case Study: Polynomial Factorization. The model commits two cascading errors: (1) it enumerates only single-product terms for each degree, ignoring cross-terms from the three non-leading coefficients of $f(x)$ (Red #1), which collapses the quotient $h(x)$ to $x^6 + a_3x^3 + a_0$; (2) believing $a_3$ and $a_0$ are free, it assigns arbitrary values instead of solving the determined linear system (Red #2). …

**Figure 16.** Multi-Steering Case Study: GCD/LCM Counting. The model makes two compounding errors in combinatorial counting: (1) it applies the formula $2^{k-1}$ without clarifying whether it yields ordered or unordered pairs (Red #1); (2) it then redundantly divides by 2 a second time, halving the already-correct count from 8 to 4 (Red #2). DynaSteer steers at both junctures, first to establish the explicit $2^4 \to 8$ derivat…
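The counting fact behind Figure 16 is easy to check by brute force. If $g \mid L$, $L \ne g$, and $L/g$ has $k$ distinct prime factors, there are $2^k$ ordered pairs $(a, b)$ with $\gcd(a, b) = g$ and $\operatorname{lcm}(a, b) = L$, hence $2^{k-1}$ unordered ones; dividing by 2 a second time is the Red #2 error. The values of $g$ and $L$ in the usage check are hypothetical, since the caption does not give the problem's numbers.

```python
from math import gcd


def count_pairs(g: int, l: int):
    """Brute-force count of pairs (a, b) with gcd(a, b) = g and lcm(a, b) = l.

    Returns (ordered_count, unordered_count).
    """
    divisors = [d for d in range(1, l + 1) if l % d == 0]  # a and b must divide l
    ordered = [(a, b) for a in divisors for b in divisors
               if gcd(a, b) == g and a * b // gcd(a, b) == l]
    unordered = {tuple(sorted(p)) for p in ordered}
    return len(ordered), len(unordered)
```

For example, $g = 1$ and $L = 210 = 2 \cdot 3 \cdot 5 \cdot 7$ give $k = 4$: 16 ordered pairs and 8 unordered ones, matching $2^4 \to 8$.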

**Figure 17.** Multi-Steering Case Study: Triangle Trigonometry. The model encounters two distinct reasoning failures: (1) it computes $\tan A$ as a symbolic expression in $a, b, c$ without recognizing that the area constraint determines a unique numeric value (Red #1); (2) when guided to the single-variable equation $\sin A = 4(1 - \cos A)$, it re-introduces side-length variables instead of applying half-angle identities, produ…
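The half-angle step that Figure 17 steers toward works out as follows, using $\sin A = 2\sin\frac{A}{2}\cos\frac{A}{2}$ and $1 - \cos A = 2\sin^2\frac{A}{2}$, and dividing by $\sin\frac{A}{2} \ne 0$ (valid for a triangle angle $0 < A < \pi$):

$$\sin A = 4(1 - \cos A) \;\Longrightarrow\; 2\sin\tfrac{A}{2}\cos\tfrac{A}{2} = 8\sin^2\tfrac{A}{2} \;\Longrightarrow\; \tan\tfrac{A}{2} = \tfrac{1}{4},$$

$$\tan A = \frac{2\tan\frac{A}{2}}{1 - \tan^2\frac{A}{2}} = \frac{1/2}{15/16} = \frac{8}{15}.$$

This value is derived here from the stated equation alone; the caption is truncated before the model's final answer.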


Current approaches to enhance Large Language Model (LLM) reasoning, such as Chain-of-Thought and "Wait" prompts, primarily encourage models to think more, yet often fail to guide them toward Truth. While Representation Editing (RepE) offers an intrinsic means of control, its application to dynamic reasoning trajectories remains underexplored. In this work, we bridge this gap by investigating the geometry of truth within unfolding reasoning chains. We uncover three critical insights: (1) Truth is encoded at the sentence level and is entangled with latent reasoning patterns; (2) Effective intervention follows an Uncertainty Principle and a Decay Effect, requiring localization to early, high-entropy forks; (3) Naive steering vectors suffer from noise, risking collateral damage to correct trajectories. Based on these findings, we propose DynaSteer, a dynamic RepE framework. DynaSteer employs pattern clustering to disentangle reasoning manifolds and utilizes Fisher-LDA to project purified truth. By dynamically monitoring lookahead entropy, it selectively steers and rolls back trajectories only when necessary. Comprehensive experimental results on several MATH benchmarks verify the effectiveness of DynaSteer, and experiments on out-of-domain coding tasks further confirm its generalization ability. Our code is publicly available at https://github.com/tianlwang/DynaSteer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

DynaSteer combines pattern clustering, Fisher-LDA projection, and lookahead entropy for selective steering and rollback in LLM reasoning, but the geometric assumptions about truth disentanglement remain unverified in the provided details.


The paper's main contribution is DynaSteer, a dynamic version of representation editing that clusters latent patterns, projects a purified truth direction with Fisher-LDA, and uses lookahead entropy to trigger steering only at early high-entropy forks, with rollback to avoid damage. This is new relative to static RepE or basic CoT prompting, and the three insights about sentence-level encoding, uncertainty/decay effects, and noise in naive vectors give a coherent rationale for the selective mechanism.

It does a reasonable job framing the problem as one of trajectory control rather than just more thinking, and releasing the code helps. The out-of-domain coding generalization claim is a plus if the numbers hold.

The soft spots are exactly where the stress-test note points: the central claims rest on Fisher-LDA cleanly separating truth from other manifolds after clustering, and on early high-entropy points being the only safe intervention spots. Without ablations on projection quality, timing sensitivity, or checks for collateral damage on correct paths, those assumptions stay untested. The abstract-only view makes it hard to judge effect sizes or robustness, so the reported MATH gains could be fragile.

This is for people already working on internal LLM steering and control techniques. It is coherent enough on its own terms to deserve a serious referee rather than a desk reject, though the review will need to focus on whether the geometry actually delivers what is claimed.

Referee Report

2 major / 1 minor

Summary. The paper claims that by investigating the geometry of truth in unfolding LLM reasoning chains, three insights are uncovered—(1) truth is encoded at the sentence level and entangled with latent reasoning patterns, (2) effective intervention obeys an Uncertainty Principle and Decay Effect localized to early high-entropy forks, and (3) naive steering vectors introduce noise risking collateral damage—and these motivate DynaSteer, a dynamic RepE framework that applies pattern clustering, Fisher-LDA projection for purified truth vectors, and lookahead entropy monitoring for selective steering with rollback. The method is reported to improve reasoning accuracy on MATH benchmarks and to generalize to out-of-domain coding tasks.

Significance. If the geometric assumptions and empirical gains hold, the work would advance intrinsic control of LLM reasoning trajectories beyond static prompting or CoT methods by providing a dynamic, geometry-aware editing approach. The public release of code at https://github.com/tianlwang/DynaSteer is a positive contribution that enables direct reproducibility of the reported results.

major comments (2)

[Abstract / method overview] The central performance claims rest on Fisher-LDA successfully disentangling sentence-level truth from other reasoning manifolds after pattern clustering (insight 1 and method description). No quantitative check—such as cosine similarity of the projected vectors to ground-truth answers, class-separability metrics, or an ablation removing the LDA step—is described to confirm that the projection avoids mixing truth with high-variance correct patterns or introducing collateral damage.
[Abstract / insight 2] Insight 2 (Uncertainty Principle and Decay Effect) asserts that intervention is both necessary and safe only at early high-entropy forks. The manuscript provides no ablation on intervention timing, no comparison of early vs. late rollback outcomes, and no evidence that later low-entropy corrections are never required; if this timing assumption fails, the selective-rollback mechanism could either miss fixes or cause unnecessary damage.

minor comments (1)

The abstract states the three insights as “uncovered” but does not indicate where in the manuscript the supporting geometric observations or visualizations are presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify important areas where additional evidence would strengthen the claims regarding the Fisher-LDA projection and intervention timing. We address each point below and commit to revisions that incorporate the requested analyses.


Referee: [Abstract / method overview] The central performance claims rest on Fisher-LDA successfully disentangling sentence-level truth from other reasoning manifolds after pattern clustering (insight 1 and method description). No quantitative check—such as cosine similarity of the projected vectors to ground-truth answers, class-separability metrics, or an ablation removing the LDA step—is described to confirm that the projection avoids mixing truth with high-variance correct patterns or introducing collateral damage.

Authors: We agree that direct quantitative validation of the LDA projection would provide stronger support for Insight 1. The reported gains on MATH and out-of-domain coding offer indirect evidence, but we will add explicit checks in the revision: (1) cosine similarity between projected truth vectors and ground-truth answer embeddings, (2) class-separability metrics (e.g., between-class vs. within-class variance ratios) pre- and post-projection, and (3) an ablation comparing full DynaSteer against a version without the LDA step. These additions will directly address concerns about mixing with high-variance patterns or collateral damage. revision: yes
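One of the separability metrics promised here, the between-class to within-class variance ratio of a 1-D projection (the Fisher criterion), takes only a few lines of NumPy. The function name and the use of population variances are assumptions; computing the ratio for both the mean-difference and the Fisher-LDA directions on the same held-out activations would be one concrete form of the pre- and post-projection check.

```python
import numpy as np


def fisher_ratio(truth_proj: np.ndarray, fallacy_proj: np.ndarray) -> float:
    """Between-class over within-class variance of 1-D projections:
    J = (m_T - m_F)^2 / (s_T^2 + s_F^2), using population variances."""
    m_t, m_f = truth_proj.mean(), fallacy_proj.mean()
    return float((m_t - m_f) ** 2 / (truth_proj.var() + fallacy_proj.var()))
```

Usage: project activations onto a candidate direction `w` first, e.g. `fisher_ratio(truth @ w, fallacy @ w)`; a higher ratio means the direction separates Truth from Fallacy with less overlap.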
Referee: [Abstract / insight 2] Insight 2 (Uncertainty Principle and Decay Effect) asserts that intervention is both necessary and safe only at early high-entropy forks. The manuscript provides no ablation on intervention timing, no comparison of early vs. late rollback outcomes, and no evidence that later low-entropy corrections are never required; if this timing assumption fails, the selective-rollback mechanism could either miss fixes or cause unnecessary damage.

Authors: We acknowledge the absence of explicit timing ablations in the current manuscript. The Uncertainty Principle and Decay Effect are motivated by our geometric observations of entropy and pattern separation along trajectories. In the revision we will include a new ablation that varies intervention timing (early high-entropy forks vs. later low-entropy points), reports rollback success rates, accuracy deltas, and any collateral effects. This will empirically test whether late corrections are required and confirm the safety of selective early steering. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical framework validated externally


The paper presents three geometric insights derived from representation analysis, then constructs DynaSteer by applying standard tools (pattern clustering, Fisher-LDA projection, entropy monitoring) to those observations. Performance is reported via direct experiments on MATH benchmarks and out-of-domain coding tasks rather than any closed derivation or fitted parameter renamed as prediction. No self-citations appear in the abstract or described method, Fisher-LDA is invoked as an off-the-shelf technique, and the central claims do not reduce by the paper's own equations to its inputs. The work is therefore self-contained as an engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The framework rests on domain assumptions about the geometry of truth in LLM activations and the timing of effective interventions; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (3)

domain assumption Truth is encoded at the sentence level and entangled with latent reasoning patterns
Stated as insight (1) in the abstract.
domain assumption Effective intervention follows an Uncertainty Principle and a Decay Effect, requiring localization to early, high-entropy forks
Stated as insight (2) in the abstract.
domain assumption Naive steering vectors suffer from noise and risk collateral damage
Stated as insight (3) in the abstract.

pith-pipeline@v0.9.1-grok · 5782 in / 1446 out tokens · 44587 ms · 2026-06-30T00:33:12.736598+00:00 · methodology


Reference graph

Works this paper leans on

49 extracted references · 25 canonical work pages · 13 internal anchors

[1] Towards reasoning in large language models: A survey. Findings of the Association for Computational Linguistics: ACL 2023.
[2] Reasoning with large language models, a survey. arXiv preprint arXiv:2407.11511.
[3] TPO: Aligning large language models with multi-branch & multi-step preference trees. arXiv preprint arXiv:2410.12854.
[4] Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks. Findings of the Association for Computational Linguistics: EMNLP 2025.
[5] Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems.
[6] Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems.
[7] Wait-learning: Leveraging wait time for second language education. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems.
[8] Wait, Wait, Wait... Why Do Reasoning Models Loop? arXiv preprint arXiv:2512.12895.
[9] Understanding aha moments: from external observations to internal mechanisms. arXiv preprint arXiv:2504.02956.
[10] Safekey: Amplifying aha-moment insights for safety reasoning. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
[11] Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[12] Adaptive activation steering: A tuning-free LLM truthfulness improvement method for diverse hallucinations categories. Proceedings of the ACM on Web Conference 2025.
[13] Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems.
[14] DRESSing up LLM: Efficient stylized question-answering via style subspace editing. arXiv preprint arXiv:2501.14371.
[15] Towards safe concept transfer of multi-modal diffusion via causal representation editing. Advances in Neural Information Processing Systems.
[16] Balancing Stylization and Truth via Disentangled Representation Steering. arXiv preprint arXiv:2508.04530.
[17] Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing. arXiv preprint arXiv:2510.01243.
[18] Thought Anchors: Which LLM Reasoning Steps Matter? arXiv preprint arXiv:2506.19143.
[19] Entropy law: The story behind data compression and LLM performance. arXiv preprint arXiv:2407.06645.
[20] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning. arXiv preprint arXiv:2506.01939.
[21] Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective. arXiv preprint arXiv:2510.10150.
[22] Qwen Technical Report. arXiv preprint arXiv:2309.16609.
[23] LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971.
[24] Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374.
[25] Program Synthesis with Large Language Models. arXiv preprint arXiv:2108.07732.
[26] GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers. arXiv preprint arXiv:2402.19255.
[27] Measuring Mathematical Problem Solving With the MATH Dataset. arXiv preprint arXiv:2103.03874.
[28] HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv preprint arXiv:1910.03771.
[29] William Rudman and Carsten Eickhoff. The Twelfth International Conference on Learning Representations, 2024.
[30] Representation Engineering: A Top-Down Approach to AI Transparency. arXiv preprint arXiv:2310.01405.
[31] The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets. arXiv preprint arXiv:2310.06824.
[32] Let's verify step by step. The Twelfth International Conference on Learning Representations.
[33] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
[34] Efficient Memory Management for Large Language Model Serving with PagedAttention. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles.
[35] Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is Your Code Generated by Chat…, 2023.
[36] The theory of sound, Volume One. 2013.
[37] Designing and interpreting probes with control tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
[38] Truthx: Alleviating hallucinations by editing large language models in truthful space. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[39] TruthFlow: Truthful LLM Generation via Representation Flow Correction.
[40] Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems.
[41] Mass-Editing Memory in a Transformer.
[42] Magical: Medical lay language generation via semantic invariance and layperson-tailored adaptation. Advances in Neural Information Processing Systems.
[43] Learnat: Learning NL2SQL with AST-guided task decomposition for large language models. arXiv preprint arXiv:2504.02327.
[44] HyFunc: Accelerating LLM-based Function Calls for Agentic AI through Hybrid-Model Cascade and Dynamic Templating. Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1.
[45] ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning. arXiv preprint arXiv:2510.10071.
[46] ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs. arXiv preprint arXiv:2508.13514.
[47] 3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
[48] Toward better EHR reasoning in LLMs: Reinforcement learning with expert attention guidance. Proceedings of the AAAI Conference on Artificial Intelligence.
[49] GraphWalker: Graph-Guided In-Context Learning for Clinical Reasoning on Electronic Health Records. arXiv preprint arXiv:2604.06684.
