MAG-V: A multi-agent framework for synthetic data generation and verification
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
  COVERT generates verifiable synthetic tool-use environments for RL through validated trajectory synthesis and oracle-preserving augmentations, improving tool-use accuracy on BFCL v3 and ACEBench while remaining complementary to SFT.
- A-MBER: Affective Memory Benchmark for Emotion Recognition
  A-MBER is a new benchmark that evaluates how well AI models use interaction history to recognize and explain a user's present affective state across judgment, retrieval, and explanation tasks.