hub Mixed citations

Segment anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al · 2023

Mixed citation behavior. Most common role is background (44%).

20 Pith papers citing it

Background 44% of classified citations

browse 20 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 4 baseline 2 dataset 2 method 1

citation-polarity summary

background 4 baseline 2 use dataset 2 use method 1

representative citing papers

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.

RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis

eess.IV · 2025-07-07 · unverdicted · novelty 8.0

Introduces RAM-W600, the first public multi-task dataset of wrist conventional radiographs with instance segmentation annotations and Sharp/van der Heijde bone erosion scores for rheumatoid arthritis research.

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

Presents CUActSpot benchmark and renderer-LLM data synthesis that lets a 4B model outperform larger open-source models on complex computer interactions.

TOC-Bench: A Temporal Object Consistency Benchmark for Video Large Language Models

cs.CV · 2026-05-11 · conditional · novelty 7.0 · 2 refs

TOC-Bench is a new diagnostic benchmark that reveals major weaknesses in temporal object consistency for Video-LLMs, including event counting, ordering, identity reasoning, and hallucination avoidance.

AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

cs.CV · 2025-06-10 · unverdicted · novelty 7.0

AVA-Bench evaluates vision foundation models by disentangling 14 atomic visual abilities with aligned training-test distributions to reveal precise ability fingerprints.

Motion Cues from Image-based Point Tracking for LiDAR Scene Flow Estimation

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

TrackCue uses dense image-space trajectories from point tracking and ego-motion compensation to improve static-dynamic classification and supervision for LiDAR scene flow estimation.

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · novelty 6.0

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation with no minority examples in training.

ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

ViewSAM achieves state-of-the-art weakly supervised performance on cross-view referring multi-object tracking by refining SAM tracklets via affinity-guided re-prompting and modeling view-induced variations as learnable conditions on SAM2.

UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces

cs.NE · 2026-04-30 · unverdicted · novelty 6.0

UniBCI is a unified pretrained model for invasive neural spike data that uses CST tokenization, IAA attention, and self-supervised masked reconstruction to achieve SOTA downstream performance with better generalization and efficiency.

Precise Aggressive Aerial Maneuvers with Sensorimotor Policies

cs.RO · 2026-04-07 · unverdicted · novelty 6.0

Reinforcement learning sensorimotor policies enable quadrotors to traverse narrow gaps at extreme tilts with 5 cm clearance using only vision and proprioception, including reactive traversal of moving gaps.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

ImgEdit: A Unified Image Editing Dataset and Benchmark

cs.CV · 2025-05-26 · conditional · novelty 6.0

ImgEdit supplies 1.2 million curated edit pairs and a three-part benchmark that let a VLM-based model outperform prior open-source editors on adherence, quality, and detail preservation.

RadGenome-Anatomy: A Large-Scale Anatomy-Labeled Chest Radiograph Dataset via Physically Grounded Volumetric Projection

cs.CV · 2026-05-17 · unverdicted · novelty 5.0

RadGenome-Anatomy is a large-scale chest radiograph dataset with anatomy labels obtained by projecting 3D CT masks into 2D radiographic space for 210 structures in 25,692 studies.

HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization

cs.CV · 2026-05-13 · unverdicted · novelty 5.0 · 2 refs

HarmoGS adds semantic consistency-guided masking and dual-view orthogonal gradient harmonization to 3D Gaussian Splatting to reduce artifacts from distractors and cross-view illumination inconsistencies.

Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

FusionProxy is a distilled diffusion-based fusion module that adds thermal awareness to RGB vision systems in real time as an independent plug-and-play component.

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

cs.CV · 2025-07-03 · unverdicted · novelty 5.0

MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.

LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models

cs.CV · 2025-05-21 · unverdicted · novelty 5.0

LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

cs.CV · 2025-05-14 · conditional · novelty 5.0

BLIP3-o uses a diffusion transformer to generate CLIP image features and a sequential pretraining strategy to build open models that perform strongly on both image understanding and generation benchmarks.

Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering

cs.CV · 2025-06-02

citing papers explorer

Showing 20 of 20 citing papers.

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views cs.CV · 2026-05-08 · unverdicted · none · ref 32
GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.
RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis eess.IV · 2025-07-07 · unverdicted · none · ref 36
Introduces RAM-W600, the first public multi-task dataset of wrist conventional radiographs with instance segmentation annotations and Sharp/van der Heijde bone erosion scores for rheumatoid arthritis research.
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark cs.CV · 2026-05-12 · unverdicted · none · ref 40
Presents CUActSpot benchmark and renderer-LLM data synthesis that lets a 4B model outperform larger open-source models on complex computer interactions.
TOC-Bench: A Temporal Object Consistency Benchmark for Video Large Language Models cs.CV · 2026-05-11 · conditional · none · ref 17 · 2 links
TOC-Bench is a new diagnostic benchmark that reveals major weaknesses in temporal object consistency for Video-LLMs, including event counting, ordering, identity reasoning, and hallucination avoidance.
AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models cs.CV · 2025-06-10 · unverdicted · none · ref 44
AVA-Bench evaluates vision foundation models by disentangling 14 atomic visual abilities with aligned training-test distributions to reveal precise ability fingerprints.
Motion Cues from Image-based Point Tracking for LiDAR Scene Flow Estimation cs.CV · 2026-05-16 · unverdicted · none · ref 51
TrackCue uses dense image-space trajectories from point tracking and ego-motion compensation to improve static-dynamic classification and supervision for LiDAR scene flow estimation.
No One Knows the State of the Art in Geospatial Foundation Models cs.CV · 2026-05-12 · accept · none · ref 39
An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.
Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs cs.CV · 2026-05-11 · unverdicted · none · ref 19
Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation with no minority examples in training.
ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking cs.CV · 2026-05-04 · unverdicted · none · ref 16
ViewSAM achieves state-of-the-art weakly supervised performance on cross-view referring multi-object tracking by refining SAM tracklets via affinity-guided re-prompting and modeling view-induced variations as learnable conditions on SAM2.
UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces cs.NE · 2026-04-30 · unverdicted · none · ref 44
UniBCI is a unified pretrained model for invasive neural spike data that uses CST tokenization, IAA attention, and self-supervised masked reconstruction to achieve SOTA downstream performance with better generalization and efficiency.
Precise Aggressive Aerial Maneuvers with Sensorimotor Policies cs.RO · 2026-04-07 · unverdicted · none · ref 51
Reinforcement learning sensorimotor policies enable quadrotors to traverse narrow gaps at extreme tilts with 5 cm clearance using only vision and proprioception, including reactive traversal of moving gaps.
Video models are zero-shot learners and reasoners cs.LG · 2025-09-24 · unverdicted · none · ref 11
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
ImgEdit: A Unified Image Editing Dataset and Benchmark cs.CV · 2025-05-26 · conditional · none · ref 31
ImgEdit supplies 1.2 million curated edit pairs and a three-part benchmark that let a VLM-based model outperform prior open-source editors on adherence, quality, and detail preservation.
RadGenome-Anatomy: A Large-Scale Anatomy-Labeled Chest Radiograph Dataset via Physically Grounded Volumetric Projection cs.CV · 2026-05-17 · unverdicted · none · ref 22
RadGenome-Anatomy is a large-scale chest radiograph dataset with anatomy labels obtained by projecting 3D CT masks into 2D radiographic space for 210 structures in 25,692 studies.
HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization cs.CV · 2026-05-13 · unverdicted · none · ref 11 · 2 links
HarmoGS adds semantic consistency-guided masking and dual-view orthogonal gradient harmonization to 3D Gaussian Splatting to reduce artifacts from distractors and cross-view illumination inconsistencies.
Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models cs.CV · 2026-05-07 · unverdicted · none · ref 8
FusionProxy is a distilled diffusion-based fusion module that adds thermal awareness to RGB vision systems in real time as an independent plug-and-play component.
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details cs.CV · 2025-07-03 · unverdicted · none · ref 29
MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models cs.CV · 2025-05-21 · unverdicted · none · ref 39
LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset cs.CV · 2025-05-14 · conditional · none · ref 13
BLIP3-o uses a diffusion transformer to generate CLIP image features and a sequential pretraining strategy to build open models that perform strongly on both image understanding and generation benchmarks.
Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering cs.CV · 2025-06-02 · unreviewed · ref 2

Segment anything

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer