pith. sign in

Can chatgpt defend its belief in truth? evaluating llm reasoning via debate

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 3

citation-polarity summary

roles

background 3

polarities

background 2 unclear 1

representative citing papers

User-Assistant Bias in LLMs

cs.CL · 2025-08-16 · unverdicted · novelty 7.0

LLMs show strong user bias in role-tagged contexts that is amplified by preference alignment and can be reduced or controlled through targeted fine-tuning and DPO.

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

TeamTR is a trust-region framework for multi-agent LLM fine-tuning that resamples trajectories after each update to convert quadratic compounding occupancy shift into linear scaling and yields per-update improvement lower bounds.

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

cs.CL · 2024-12-25 · unverdicted · novelty 6.0

HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unverdicted · novelty 2.0

The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.

citing papers explorer

Showing 8 of 8 citing papers.

  • User-Assistant Bias in LLMs cs.CL · 2025-08-16 · unverdicted · none · ref 17

    LLMs show strong user bias in role-tagged contexts that is amplified by preference alignment and can be reduced or controlled through targeted fine-tuning and DPO.

  • TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination cs.LG · 2026-05-01 · unverdicted · none · ref 39

    TeamTR is a trust-region framework for multi-agent LLM fine-tuning that resamples trajectories after each update to convert quadratic compounding occupancy shift into linear scaling and yields per-update improvement lower bounds.

  • To trust or not to trust: Attention-based Trust Management for LLM Multi-Agent Systems cs.CR · 2025-06-03 · unverdicted · none · ref 24

    Introduces six-dimension trustworthiness definition and attention-based A-Trust score with a TMS to improve LLM-MAS robustness against malicious or unreliable messages.

  • HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs cs.CL · 2024-12-25 · unverdicted · none · ref 78

    HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.

  • Large Language Models Cannot Self-Correct Reasoning Yet cs.CL · 2023-10-03 · unverdicted · none · ref 19

    LLMs cannot reliably self-correct reasoning mistakes using only their internal capabilities and often degrade in performance without external feedback.

  • A Survey on LLM-as-a-Judge cs.CL · 2024-11-23 · unverdicted · none · ref 163

    A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

  • LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods cs.CL · 2024-12-07 · accept · none · ref 241

    A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

  • Reinforcement Learning from Human Feedback cs.LG · 2025-04-16 · unverdicted · none · ref 293

    The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.