The LARY benchmark finds that general visual foundation models outperform specialized latent action models, and that latent visual spaces align better with physical actions than pixel spaces do.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1)
Citation-polarity summary: (still indexing)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)

Representative citing papers
WLDS applies large models with factual and logical calibration to produce diverse text-and-image deductions of emergency scenarios, going beyond what traditional fixed simulations can generate.
Citing papers explorer

- LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
  The LARY benchmark finds that general visual foundation models outperform specialized latent action models, and that latent visual spaces align better with physical actions than pixel spaces do.
- What Will Happen Next: Large Models-Driven Deduction for Emergency Instances
  WLDS applies large models with factual and logical calibration to produce diverse text-and-image deductions of emergency scenarios, going beyond what traditional fixed simulations can generate.