GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth; the videos outperform neural video generators in physical and semantic validity and enable new probes of video encoders.
Advances in Neural Information Processing Systems 36, 46595–46623 (2023)
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
S^2tory uses narratological theory and a Narrative Expert Agent to identify plot nuclei in movie scripts for high-fidelity summarization at 3.5x compression, with strong zero-shot generalization to books.
SurgCheck benchmark reveals that vision-language models for surgical VQA often depend on linguistic shortcuts rather than visual reasoning, shown by consistent performance drops on less-biased questions.
ANVIL automates analogy-based instructional animations for computer science by chaining LLM analogy generation, screenplay structuring, manim code production with repair, and mixed human-automated evaluations.
SMART uses marginal benefit-cost analysis to dynamically build efficient speculative trees, achieving 15-20% additional speedup in LLM and MLLM inference.
citing papers explorer
- GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models
  GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth; the videos outperform neural video generators in physical and semantic validity and enable new probes of video encoders.
- S^2tory: Story Spine Distillation for Movie Script Summarization
  S^2tory uses narratological theory and a Narrative Expert Agent to identify plot nuclei in movie scripts for high-fidelity summarization at 3.5x compression, with strong zero-shot generalization to books.
- SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA?
  SurgCheck benchmark reveals that vision-language models for surgical VQA often depend on linguistic shortcuts rather than visual reasoning, shown by consistent performance drops on less-biased questions.
- ANVIL: Analogies and Videos for Lecturers
  ANVIL automates analogy-based instructional animations for computer science by chaining LLM analogy generation, screenplay structuring, manim code production with repair, and mixed human-automated evaluations.
- SMART: When is it Actually Worth Expanding a Speculative Tree?
  SMART uses marginal benefit-cost analysis to dynamically build efficient speculative trees, achieving 15-20% additional speedup in LLM and MLLM inference.