pith. sign in

hub

Controlled decoding from language models

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

hub tools

representative citing papers

TRAM: Test-Time Risk Adaptation with Mixture of Agents

cs.LG · 2024-08-16 · unverdicted · novelty 7.0

TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.

Selective Safety Steering via Value-Filtered Decoding

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Value-filtered decoding steers LLM outputs for safety at decoding time using a value criterion with an explicit bound on false interventions controlled by one threshold hyperparameter.

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

cs.AI · 2024-08-13 · unverdicted · novelty 6.0

Agent Q integrates MCTS-guided search, self-critique, and off-policy DPO to train LLM agents that outperform behavior cloning and reinforced fine-tuning baselines in WebShop and achieve up to 95.4% success in real-world booking scenarios.

citing papers explorer

Showing 11 of 11 citing papers.