Boosting human-object interaction detection with text-to-image diffusion model
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.CV 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.
citing papers explorer
- Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation
  A pair-centric set-prediction model unifies present HOI detection and multi-horizon anticipation in video by modeling future interactions as residual transitions from current pair states, backed by a temporally corrected benchmark.
- Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
  Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.