Grounding dino: Marrying dino with grounded pre-training for open-set object detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents

cs.CV · 2025-12-19 · unverdicted · novelty 7.0

LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.

From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 6.0

AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.

Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning

cs.CV · 2026-05-05 · unverdicted · novelty 4.0

A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.

citing papers explorer

Showing 3 of 3 citing papers.

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents cs.CV · 2025-12-19 · unverdicted · none · ref 27
LangDriveCTRL decomposes driving videos into 3D scene graphs and uses an agentic pipeline with specialized multi-modal agents to perform language-controlled object and behavior edits, achieving nearly 2x higher instruction alignment than prior state-of-the-art methods.
From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation cs.RO · 2026-05-12 · unverdicted · none · ref 35
AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.
Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning cs.CV · 2026-05-05 · unverdicted · none · ref 20
A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer