GRIT: General Robust Image Task Benchmark · 2022 · arXiv 2204.13653

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DroneFINE: Domain-Aware Parameter-Efficient Fine-Tuning of Vision-Language Detectors for Drone Images

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

DroneFINE is a domain-aware PEFT approach for VLM-based drone detectors using foreground-aware multi-path adaptation and text-conditioned background suppression, outperforming standard PEFT and matching full fine-tuning on VisDrone and UAVDT with fewer trainable parameters.

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

Qwen-RobotWorld is a language-conditioned video world model using Double-Stream MMDiT, an 8.6M-frame embodied corpus, and progressive curriculum training that ranks first on EWMBench and DreamGen Bench.

citing papers explorer

DroneFINE: Domain-Aware Parameter-Efficient Fine-Tuning of Vision-Language Detectors for Drone Images

cs.CV · 2026-07-01 · unverdicted · none · ref 10

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

cs.CV · 2026-06-15 · unverdicted · none · ref 152
