2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.
Citing papers explorer:
- QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
  QVal is a new evaluation framework that directly measures dense supervision quality via Q-alignment to a reference policy, showing that simple prompting baselines outperform 21 other methods across environments and models.
- MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models
  MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.