Internal consistency and self-feedback in large language models: A survey. arXiv preprint arXiv:2407.14507

Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Yi Wang, Zhonghao Wang, Feiyu Xiong, Zhiyu Li · 2024 · arXiv 2407.14507

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

Formalizes Reasoning Portability (RP) and proposes RDB-CL, which modulates per-sample KL regularization in RLVR for MLLM continual learning, achieving a +12.0% gain in Last accuracy over the vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.

FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification

cs.AR · 2026-03-06 · unverdicted · novelty 7.0

FVRuleLearner introduces an Operator Reasoning Tree to learn operator-specific rules that improve natural-language to SystemVerilog assertion generation, raising syntax correctness by 3.95% and functional correctness by 31.17% over baselines.

Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

cs.AI · 2025-07-22 · unverdicted · novelty 7.0

Deliberative Searcher integrates retrieval search, multi-step verification, and RL training with a soft reliability constraint to improve alignment between LLM confidence and correctness in open-domain QA.

Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Introduces Wasserstein equilibrium decoding that improves accuracy and convergence speed for small VLMs on medical VQA benchmarks by using semantic consensus instead of lexical order.

Large Language Models Decide Early and Explain Later

cs.CL · 2026-04-24 · unverdicted · novelty 6.0

LLMs settle on their answer after a minority of CoT tokens and produce an average of 760 more tokens as post-decision explanation, enabling early stopping that saves 500 tokens per query at a 2% accuracy cost.

Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding

cs.CL · 2025-07-15 · unverdicted · novelty 5.0

Temperature and persona variations shape consensus speed in LLM multi-agent coding but produce no robust accuracy gains over single agents on human-annotated tutoring transcripts.

citing papers explorer

Reasoning Portability: Guiding Continual Learning for MLLMs in the RLVR Era · cs.LG · 2026-05-17 · unverdicted · none · ref 26
Formalizes Reasoning Portability (RP) and proposes RDB-CL, which modulates per-sample KL regularization in RLVR for MLLM continual learning, achieving a +12.0% gain in Last accuracy over the vanilla RLVR baseline by preserving reusable reasoning on high-RP samples.

FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification · cs.AR · 2026-03-06 · unverdicted · none · ref 43
FVRuleLearner introduces an Operator Reasoning Tree to learn operator-specific rules that improve natural-language to SystemVerilog assertion generation, raising syntax correctness by 3.95% and functional correctness by 31.17% over baselines.

Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints · cs.AI · 2025-07-22 · unverdicted · none · ref 1
Deliberative Searcher integrates retrieval search, multi-step verification, and RL training with a soft reliability constraint to improve alignment between LLM confidence and correctness in open-domain QA.

Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering · cs.CV · 2026-05-18 · unverdicted · none · ref 17
Introduces Wasserstein equilibrium decoding that improves accuracy and convergence speed for small VLMs on medical VQA benchmarks by using semantic consensus instead of lexical order.

Large Language Models Decide Early and Explain Later · cs.CL · 2026-04-24 · unverdicted · none · ref 5
LLMs settle on their answer after a minority of CoT tokens and produce an average of 760 more tokens as post-decision explanation, enabling early stopping that saves 500 tokens per query at a 2% accuracy cost.

Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding · cs.CL · 2025-07-15 · unverdicted · none · ref 40
Temperature and persona variations shape consensus speed in LLM multi-agent coding but produce no robust accuracy gains over single agents on human-annotated tutoring transcripts.
