Title resolution pending

Li, Hao; Yang, Xue; Wang, Zhaokai; Zhu, Xizhou; Zhou, Jie; Qiao, Yu · 2024 · arXiv 2312.09238

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

QVal is a new evaluation framework that directly measures dense supervision quality via Q-alignment to a reference policy, showing simple prompting baselines outperform 21 other methods across environments and models.

MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models

cs.SE · 2026-04-09 · unverdicted · novelty 6.0

MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.

citing papers explorer

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
cs.LG · 2026-06-30 · unverdicted · none · ref 23
MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models
cs.SE · 2026-04-09 · unverdicted · none · ref 11
