A comprehensive survey on trustworthiness in reasoning with large language models

Wang, C · 2025 · arXiv 2509.03871

5 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries

cs.LG · 2026-05-17 · conditional · novelty 6.0

Swapping the reasoning trace prefill on unlearned weights can replicate or reverse the parser-split bypass gap, showing that the gap alone does not identify or rule out weight-level memorization.

Pause or Fabricate? Training Language Models for Grounded Reasoning

cs.CL · 2026-04-21 · conditional · novelty 6.0

GRIL uses stage-specific RL rewards to train LLMs to detect missing premises, pause proactively, and resume grounded reasoning after clarification, yielding up to 45% better premise detection and 30% higher task success on math datasets with insufficient premises.

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

cs.CR · 2026-02-12 · conditional · novelty 6.0

TRACE-RPS drops LLM attribute inference accuracy from around 50% to below 5% via fine-grained anonymization plus a two-stage rejection optimization.

Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

cs.CR · 2026-04-06 · unverdicted · novelty 5.0

A 16-factor structured prompt framework strengthens CoT reasoning in LLMs for security analysis, yielding up to 40% reasoning gains in smaller models and stable accuracy improvements validated by human raters with Cohen's κ > 0.80.

citing papers explorer

Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries · cs.LG · 2026-05-17 · conditional · none · ref 14
Pause or Fabricate? Training Language Models for Grounded Reasoning · cs.CL · 2026-04-21 · conditional · none · ref 40
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space · cs.LG · 2026-04-15 · unverdicted · none · ref 59
Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs · cs.CR · 2026-02-12 · conditional · none · ref 15
Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework · cs.CR · 2026-04-06 · unverdicted · none · ref 25
