Mora: Enabling generalist video generation via a multi-agent framework
6 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
years: 2026 (6)
verdicts: UNVERDICTED (6)
roles: background (1)
polarities: background (1)
citing papers explorer
- Aurora: Unified Video Editing with a Tool-Using Agent
  Aurora introduces a VLM-based agent that converts raw user video edit requests into structured conditioning inputs for a unified diffusion transformer, improving performance on underspecified tasks via a new benchmark.
- DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing
  DIRECT uses a three-level multi-agent framework to solve video mashup creation as a multimodal coherency problem, outperforming baselines on a new benchmark.
- MAVEN A Multi-Agent Framework for Multicultural Text-to-Video Generation
  MAVEN is a multi-agent prompt refinement framework that improves cultural fidelity in text-to-video generation, demonstrated on a new benchmark of 243 prompts and 972 videos across Chinese, American, and Romanian cultures.
- SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
  SWIFT introduces a semantic injection cache with head-wise updates and an adaptive dynamic window plus segment anchors to achieve efficient multi-prompt long video generation at 22.6 FPS while preserving quality in causal diffusion models.
- Detecting AI-Generated Videos with Spiking Neural Networks
  MAST with spiking neural networks achieves 93.14% mean accuracy detecting AI-generated videos from 10 unseen generators by exploiting smoother pixel residuals and compact semantic trajectories.
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
  CiteAudit supplies a human-validated benchmark and multi-agent verification system that outperforms existing LLMs and commercial tools at detecting hallucinated scientific references.