arXiv preprint arXiv:2306.12156 (2023) 31

Zhao, X · 2023 · arXiv 2306.12156

37 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

GaussLite: Online Task-Conditioned 3D Gaussian Splatting for Real-Time Robotic Mapping

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

GaussLite conditions 3D Gaussian Splatting seeding density, gradient flow, and scaling on task relevance masks derived from LLM-parsed natural language and open-vocabulary detection, yielding +2.72 dB ROI PSNR gains on Replica and +2.23 dB on real hardware at fixed budget.

OpenSGA: Efficient 3D Scene Graph Alignment in the Open World

cs.CV · 2026-05-11 · conditional · novelty 7.0

OpenSGA fuses vision-language, textual, and geometric features via a distance-gated attention encoder and minimum-cost-flow allocator to outperform prior methods on both frame-to-scan and subscan-to-subscan 3D scene graph alignment, backed by a new 700k-sample ScanNet-SG dataset.

LAGO: Language-Guided Adaptive Object-Region Focus for Zero-Shot Visual-Text Alignment

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

LAGO achieves state-of-the-art zero-shot performance with fewer image regions by using class-agnostic object discovery followed by confidence-controlled language-guided refinement and dual-channel aggregation.

Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection

cs.CV · 2026-04-13 · conditional · novelty 7.0

Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.

Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks

cs.CV · 2026-04-13 · accept · novelty 7.0

Boxes2Pixels distills noisy SAM pseudo-masks into a compact DINOv2-based student with auxiliary localization and one-sided self-correction, delivering +6.97 anomaly mIoU and +9.71 binary IoU gains over baselines on wind turbine data with 80% fewer parameters.

OmniOVCD: Streamlining Open-Vocabulary Change Detection with SAM 3

cs.CV · 2026-01-20 · conditional · novelty 7.0

OmniOVCD uses SAM 3's decoupled outputs and an SFID strategy to achieve state-of-the-art IoU scores of 67.2, 66.5, 24.5, and 27.1 on four OVCD benchmarks, surpassing prior methods.

Segmenting, Fast and Slow: Real-Time Open-Vocabulary Video Instance Segmentation with Dual-Path Processing

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

SegFS is a dual-path architecture that uses sparse keyframe open-vocabulary predictions to condition a fast feature-space network for efficient temporal instance segmentation in videos.

MV-GEL: Language-Driven Multi-View Geometric Entity Localization on Meshes

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

MV-GEL localizes fine-grained geometric entities on 3D meshes from natural language by ranking informative views with GELviews, applying VLM segmentation, and lifting masks via geometry-aware ray casting, reporting up to 1.7X face IoU and 4.5X edge F1 gains over baselines.

Rethinking Foundation Model Collaboration: Enhancing Specialized Models through Proxy Task Reasoning

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

FAT decomposes structured prediction into specialist hypothesis generation and foundation-model proxy reasoning, yielding consistent gains over baselines on detection, trajectory, and segmentation tasks.

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

A collaborative Planning-Perception agent framework using MLLMs constructs a holistic cognitive map through iterative viewpoint supplementation and achieves reported SOTA gains on six 3D benchmarks.

ReA-OVCD: Reliability-Aware Open-Vocabulary Change Detection via Semantic and Spatial Refinement

cs.CV · 2026-06-18 · unverdicted · novelty 6.0

ReA-OVCD is a training-free reliability-aware method for open-vocabulary change detection that uses semantic change reasoning and boundary-aware refinement to reduce artifacts and improve accuracy on remote sensing datasets.

Unification of Closed-Open Industrial Detection Scenarios: New Large-Scale Benchmarks, Challenges and Baselines

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

Presents MMIOC-1M benchmark with 1M+ samples across 14 super-categories and RTVPNet with domain projection, sparse sampling, and bidirectional interaction, claiming SOTA on MMIOC-1M, LVIS, and COCO.

Meridian: Metric-Semantic Primitive Matching for Cross-View Geo-Localization Beyond Urban Environments

cs.RO · 2026-06-04 · unverdicted · novelty 6.0

Meridian matches metric-semantic primitives across aerial and ground views for training-free global localization in diverse natural environments, reporting 2.4 m average trajectory error over 19 km.

Self-Supervised Online Robot-Agnostic Traversability Estimation for Open-World Environments

cs.RO · 2026-05-27 · unverdicted · novelty 6.0

COTRATE is an online self-supervised framework that uses proprioceptive terrain assessment to supervise visual traversability estimation with alignment loss and diversity-aware replay for continual robot-agnostic learning.

Enabling Extensible Embodied Capabilities with Tools

cs.RO · 2026-05-26 · unverdicted · novelty 6.0

Introduces Embodied Tool Protocol and tool externalization to improve embodied AI performance on perception and cognition tasks, with measured gains but limits on execution capabilities.

InstructSAM: Segment Any Instance with Any Instructions

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

InstructSAM uses learnable queries in a VLM to condition SAM3 for single-pass multi-instance segmentation from arbitrary instructions, with a new Inst2Seg benchmark.

RepSAM: Bridging Foundation Models to Robotic Vision via Representation-Guided Adaptation

cs.RO · 2026-05-25 · unverdicted · novelty 6.0

RepSAM applies CKA-guided rank allocation in PEFT plus multi-modal fusion to adapt SAM, reaching 97.9% of full fine-tuning mIoU with 158x fewer parameters on robotic benchmarks.

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

P2DNav proposes a three-part hierarchical framework (panorama-to-downview reasoning, sliding-window dialogue memory, and reflective reorientation) that reports large success-rate gains on the R2R-CE zero-shot VLN benchmark.

SparseSAM: Structured Sparsification of Activations in Segment Anything Models

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

SparseSAM achieves 2x faster inference and 2.8x memory reduction in SAM with only 0.004 mIoU loss at 0.4 density via Stripe-Sort Attention and Residual-Consistency MLP.

StateScribe: Towards Accessible Change Awareness Across Real-World Revisits

cs.HC · 2026-04-26 · unverdicted · novelty 6.0

StateScribe uses a dual-layer memory architecture for episodic scenes and object-centric changes to deliver live and historical descriptions, achieving 83.1% F1 accuracy across revisits in evaluations and user studies with BLV participants.

GRAIL: Autonomous Concept Grounding for Neuro-Symbolic Reinforcement Learning

cs.AI · 2026-04-18 · unverdicted · novelty 6.0

GRAIL autonomously grounds relational concepts in NeSy-RL by using LLM weak supervision followed by interaction-based refinement, matching or exceeding manually defined concepts on Atari games.

H-SPAM: Hierarchical Superpixel Anything Model

cs.CV · 2026-04-13 · conditional · novelty 6.0

H-SPAM produces accurate, regular, and perfectly nested hierarchical superpixels that outperform prior hierarchical methods and match recent non-hierarchical state-of-the-art.

Simulation-Driven Evolutionary Motion Parameterization for Contact-Rich Granular Scooping with a Soft Conical Robotic Hand

cs.RO · 2026-04-07 · unverdicted · novelty 6.0

A deformable soft conical hand is modeled in physics simulation and its scooping trajectories are optimized via evolutionary search, enabling effective contact-rich granular tasks validated in both simulation and physical robot experiments.

AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning

cs.CV · 2025-09-30 · unverdicted · novelty 6.0

AIM-CoT enhances interleaved multimodal chain-of-thought reasoning by adding context-enhanced attention generation, active visual probing via information foraging, and dynamic attention-shift triggering.

citing papers explorer

Showing 15 of 15 citing papers after filters.

GaussLite: Online Task-Conditioned 3D Gaussian Splatting for Real-Time Robotic Mapping

cs.CV · 2026-06-29 · unverdicted · none · ref 43

GaussLite conditions 3D Gaussian Splatting seeding density, gradient flow, and scaling on task relevance masks derived from LLM-parsed natural language and open-vocabulary detection, yielding +2.72 dB ROI PSNR gains on Replica and +2.23 dB on real hardware at fixed budget.

LAGO: Language-Guided Adaptive Object-Region Focus for Zero-Shot Visual-Text Alignment

cs.CV · 2026-05-04 · unverdicted · none · ref 40

LAGO achieves state-of-the-art zero-shot performance with fewer image regions by using class-agnostic object discovery followed by confidence-controlled language-guided refinement and dual-channel aggregation.

Segmenting, Fast and Slow: Real-Time Open-Vocabulary Video Instance Segmentation with Dual-Path Processing

cs.CV · 2026-06-30 · unverdicted · none · ref 52

SegFS is a dual-path architecture that uses sparse keyframe open-vocabulary predictions to condition a fast feature-space network for efficient temporal instance segmentation in videos.

MV-GEL: Language-Driven Multi-View Geometric Entity Localization on Meshes

cs.CV · 2026-06-30 · unverdicted · none · ref 53

MV-GEL localizes fine-grained geometric entities on 3D meshes from natural language by ranking informative views with GELviews, applying VLM segmentation, and lifting masks via geometry-aware ray casting, reporting up to 1.7X face IoU and 4.5X edge F1 gains over baselines.

Rethinking Foundation Model Collaboration: Enhancing Specialized Models through Proxy Task Reasoning

cs.CV · 2026-06-30 · unverdicted · none · ref 31

FAT decomposes structured prediction into specialist hypothesis generation and foundation-model proxy reasoning, yielding consistent gains over baselines on detection, trajectory, and segmentation tasks.

Agentic Collaborative Cognition for Zero-Shot 3D Understanding

cs.CV · 2026-06-23 · unverdicted · none · ref 60

A collaborative Planning-Perception agent framework using MLLMs constructs a holistic cognitive map through iterative viewpoint supplementation and achieves reported SOTA gains on six 3D benchmarks.

ReA-OVCD: Reliability-Aware Open-Vocabulary Change Detection via Semantic and Spatial Refinement

cs.CV · 2026-06-18 · unverdicted · none · ref 17

ReA-OVCD is a training-free reliability-aware method for open-vocabulary change detection that uses semantic change reasoning and boundary-aware refinement to reduce artifacts and improve accuracy on remote sensing datasets.

InstructSAM: Segment Any Instance with Any Instructions

cs.CV · 2026-05-25 · unverdicted · none · ref 20

InstructSAM uses learnable queries in a VLM to condition SAM3 for single-pass multi-instance segmentation from arbitrary instructions, with a new Inst2Seg benchmark.

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

cs.CV · 2026-05-19 · unverdicted · none · ref 30

P2DNav proposes a three-part hierarchical framework (panorama-to-downview reasoning, sliding-window dialogue memory, and reflective reorientation) that reports large success-rate gains on the R2R-CE zero-shot VLN benchmark.

SparseSAM: Structured Sparsification of Activations in Segment Anything Models

cs.CV · 2026-05-17 · unverdicted · none · ref 37

SparseSAM achieves 2x faster inference and 2.8x memory reduction in SAM with only 0.004 mIoU loss at 0.4 density via Stripe-Sort Attention and Residual-Consistency MLP.

AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning

cs.CV · 2025-09-30 · unverdicted · none · ref 18

AIM-CoT enhances interleaved multimodal chain-of-thought reasoning by adding context-enhanced attention generation, active visual probing via information foraging, and dynamic attention-shift triggering.

ESAM++: Efficient Online 3D Perception on the Edge

cs.CV · 2026-05-28 · unverdicted · none · ref 44

ESAM++ introduces a 3D Sparse Feature Pyramid Network for efficient online 3D scene perception on edge devices, claiming competitive accuracy with up to 3x faster inference and 2x smaller model size than ESAM on four benchmarks.

RoadGIE: Towards A Global-Scale Aerial Benchmark for Generalizable Interactive Road Extraction

cs.CV · 2026-05-26 · unverdicted · none · ref 51

Introduces the largest global aerial road segmentation dataset and RoadGIE, an interactive model using topology-aware prompts that reports SOTA accuracy and connectivity on the new benchmark with a 3.7M parameter network.

Weight Group-wise Post-Training Quantization for Medical Foundation Model

cs.CV · 2026-04-09 · unverdicted · none · ref 29

Permutation-COMQ is a new post-training quantization algorithm that reorders weights within layers and uses only dot-product and rounding steps to deliver the highest reported accuracy for 2-, 4-, and 8-bit medical foundation models.

Semantic-Fast-SAM: Efficient Semantic Segmenter

cs.CV · 2026-04-22 · unverdicted · none · ref 7

Semantic-Fast-SAM matches prior SAM-based semantic segmentation accuracy on Cityscapes and ADE20K while running about 20 times faster by combining FastSAM with SSA labeling and CLIP for open-vocabulary cases.