Mixed citations

In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp

Zhai, X · 2023 · DOI 10.1109/iccv51070

Mixed citation behavior. Most common role is background (38%).

15 Pith papers citing it

Background 38% of classified citations

citation-role summary

background: 3 · dataset: 3 · method: 2

citation-polarity summary

background: 3 · use dataset: 3 · use method: 2

representative citing papers

ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue

cs.RO · 2026-05-02 · unverdicted · novelty 7.0

ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.

Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.

BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

cs.CV · 2025-06-03 · unverdicted · novelty 7.0

BEVCALIB performs LiDAR-camera calibration from raw data by fusing camera and LiDAR bird's-eye view features with a novel feature selector and reports state-of-the-art accuracy on KITTI and NuScenes.

SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.

Creative Robot Tool Use by Counterfactual Reasoning

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

Robots discover causal tool features through VLM suggestions and physics-based counterfactual perturbations in simulation, then transfer manipulation skills via conditioned keypoint matching.

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

cs.MM · 2026-05-01 · unverdicted · novelty 6.0

CustomDancer achieves state-of-the-art text-to-dance retrieval with 10.23% Recall@1 on the new TD-Data dataset by aligning text, music, and motion features through a CLIP-based framework.

ICPR 2026 Competition on Low-Resolution License Plate Recognition

cs.CV · 2026-04-24 · accept · novelty 6.0

The ICPR 2026 LRLPR competition on real low-quality license plate images drew 99 valid submissions, with the winning team reaching 82.13% recognition rate and four teams exceeding 80%.

GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

GOLD-BEV learns dense BEV semantic maps including dynamic agents from ego-centric sensors by using synchronized aerial imagery for training supervision and pseudo-label generation.

Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.

ERIS: Enhancing Privacy and Scalability in Federated Learning via Federated Shard Aggregation

cs.LG · 2026-02-09 · unverdicted · novelty 6.0

ERIS partitions client updates into shards aggregated across multiple client-side nodes to reduce communication bottlenecks, limit information exposure, and preserve FedAvg-level utility while improving resistance to inference attacks.

ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

cs.CV · 2025-05-26 · unverdicted · novelty 6.0

ViTaPEs uses two-stage positional encodings in a multimodal transformer to learn task-agnostic visuotactile representations that outperform baselines on recognition tasks, show zero-shot generalization, and improve robotic grasp success prediction.

Rethinking the Good Enough Embedding for Easy Few-Shot Learning

cs.CV · 2026-05-13 · conditional · novelty 5.0

Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.

Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end methods on nuScenes and Bench2Drive.

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

cs.CV · 2025-08-28 · unverdicted · novelty 3.0

A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.

citing papers explorer

Showing 15 of 15 citing papers.

ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue

cs.RO · 2026-05-02 · unverdicted · none · ref 36

ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On

cs.CV · 2026-04-30 · unverdicted · none · ref 32

A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.

Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data

cs.CV · 2026-04-21 · unverdicted · none · ref 55

DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.

BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

cs.CV · 2025-06-03 · unverdicted · none · ref 24

BEVCALIB performs LiDAR-camera calibration from raw data by fusing camera and LiDAR bird's-eye view features with a novel feature selector and reports state-of-the-art accuracy on KITTI and NuScenes.

SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

cs.CV · 2026-05-17 · unverdicted · none · ref 26 · 2 links

SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.

Creative Robot Tool Use by Counterfactual Reasoning

cs.RO · 2026-05-06 · unverdicted · none · ref 59

Robots discover causal tool features through VLM suggestions and physics-based counterfactual perturbations in simulation, then transfer manipulation skills via conditioned keypoint matching.

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

cs.MM · 2026-05-01 · unverdicted · none · ref 1

CustomDancer achieves state-of-the-art text-to-dance retrieval with 10.23% Recall@1 on the new TD-Data dataset by aligning text, music, and motion features through a CLIP-based framework.

ICPR 2026 Competition on Low-Resolution License Plate Recognition

cs.CV · 2026-04-24 · accept · none · ref 25

The ICPR 2026 LRLPR competition on real low-quality license plate images drew 99 valid submissions, with the winning team reaching 82.13% recognition rate and four teams exceeding 80%.

GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes

cs.CV · 2026-04-21 · unverdicted · none · ref 32

GOLD-BEV learns dense BEV semantic maps including dynamic agents from ego-centric sensors by using synchronized aerial imagery for training supervision and pseudo-label generation.

Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

cs.CV · 2026-04-03 · unverdicted · none · ref 6

A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.

ERIS: Enhancing Privacy and Scalability in Federated Learning via Federated Shard Aggregation

cs.LG · 2026-02-09 · unverdicted · none · ref 11

ERIS partitions client updates into shards aggregated across multiple client-side nodes to reduce communication bottlenecks, limit information exposure, and preserve FedAvg-level utility while improving resistance to inference attacks.

ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

cs.CV · 2025-05-26 · unverdicted · none · ref 7

ViTaPEs uses two-stage positional encodings in a multimodal transformer to learn task-agnostic visuotactile representations that outperform baselines on recognition tasks, show zero-shot generalization, and improve robotic grasp success prediction.

Rethinking the Good Enough Embedding for Easy Few-Shot Learning

cs.CV · 2026-05-13 · conditional · none · ref 4

Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.

Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning

cs.CV · 2026-04-07 · unverdicted · none · ref 15

GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end methods on nuScenes and Bench2Drive.

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

cs.CV · 2025-08-28 · unverdicted · none · ref 57

A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.
