A multimodal VLM-TO-IRL framework with GAIL learns professional-specific reward functions from telemetry and tactical commentary to rank esports players by stylistic alignment.
Adams, Tyler Cody, and Peter A
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
support 1representative citing papers
Empirical analysis of over 100 sequential RL training pipelines across 250+ OOD environments finds salient features drive generalization and early goals persist, with latent policy gradients simulating latent variable evolution to predict OOD behavior from training history.
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.
citing papers explorer
-
Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
-
AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions
The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.