
arxiv: 2502.02584 · v1 · pith:WCHC7N4Jnew · submitted 2025-02-04 · 💻 cs.LG · cs.AI

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

keywords: language, qlass, agents, stepwise, agent, guidance, inference, model

Language agents have become a promising solution to complex interactive tasks. One of the key ingredients to the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, due to the lack of annotations of intermediate interactions, most existing works use an outcome reward model to optimize policies across entire trajectories. This may lead to sub-optimal policies and hinder the overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), to automatically generate annotations by estimating Q-values in a stepwise manner for open language agents. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value, resulting in significant performance improvement during model inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency in handling limited supervision. We also empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis. We will release our code and data.
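The stepwise Q-value idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact procedure, so this is an assumption-laden illustration rather than the authors' implementation: it assumes outcome rewards are only known at the leaves of a reasoning tree, backs them up with a Bellman-style rule (Q = reward + gamma * best child Q), and then picks the next step greedily by Q-value. The names `Node`, `backup_q`, and `q_guided_step` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Node:
    """One step in a reasoning tree (hypothetical structure for illustration)."""
    action: str
    reward: float = 0.0  # assumed: outcome reward, nonzero only at terminal leaves
    children: List["Node"] = field(default_factory=list)
    q: float = 0.0       # filled in by backup_q


def backup_q(node: Node, gamma: float = 1.0) -> float:
    """Bellman-style backup: Q(node) = reward + gamma * max over children of Q(child)."""
    best_child = max((backup_q(c, gamma) for c in node.children), default=0.0)
    node.q = node.reward + gamma * best_child
    return node.q


def q_guided_step(candidates: Sequence[str], q_fn: Callable[[str], float]) -> str:
    """Pick the candidate action with the highest estimated Q-value."""
    return max(candidates, key=q_fn)


# Toy tree: action "a" ends immediately with a small reward,
# action "b" leads to a later step with a larger reward.
root = Node("root", children=[
    Node("a", reward=0.2),
    Node("b", children=[Node("b1", reward=1.0), Node("b2", reward=0.0)]),
])
backup_q(root, gamma=0.9)

# Stepwise annotations for the first decision, then a greedy Q-guided choice.
q_lookup = {child.action: child.q for child in root.children}
chosen = q_guided_step(list(q_lookup), q_lookup.__getitem__)
```

In this toy tree the outcome reward alone would not distinguish the first-step actions until the trajectory ends; the backed-up Q-values give each intermediate step its own score, which is the kind of intermediate guidance the abstract describes.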



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

    cs.AI 2026-05 unverdicted novelty 6.0

    COMPASS uses MCTS-guided cognitive exploration and introspective step-wise alignment to improve safety-utility trade-off in LLM search agents with less training data.

  2. GUI agent: Guided Exploration of User-Sensitive Screens

    cs.AI 2026-06 unverdicted novelty 4.0

    An explorer agent is developed to identify user-sensitive queries in GUI environments by systematic exploration starting from a single demonstrated task.

  3. From System 1 to System 2: A Survey of Reasoning Large Language Models

    cs.AI 2025-02 accept novelty 3.0

    The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.