Mixed citations

Title resolution pending

K.-I. Goh, C.-M. Ghim, B. Kahng, D. Kim · 2003 · Physical Review Letters · DOI 10.1103/physrevlett.91.189804

Mixed citation behavior. Most common role is background (60%).

40 Pith papers citing it

22 external citations · Crossref

Background 60% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 6 · method 2 · baseline 1 · dataset 1

citation-polarity summary

background 6 · use method 2 · baseline 1 · use dataset 1

representative citing papers

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation

cs.CV · 2026-04-13 · unverdicted · novelty 8.0

The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.

HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

cs.CV · 2026-04-10 · accept · novelty 8.0

HM-Bench is the first benchmark for MLLMs on hyperspectral images, showing models struggle with complex spatial-spectral reasoning and perform better with visual PCA images than textual reports.

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection

cs.CV · 2026-04-09 · unverdicted · novelty 8.0

The first benchmark for AI-generated scientific figure detection shows existing detectors fail in zero-shot transfer, overfit to specific generators, and break under common image corruptions.

Same Image, Different Meanings: Toward Retrieval of Context-Dependent Meanings

cs.IR · 2026-05-13 · unverdicted · novelty 7.0

Image meanings grow more context-dependent with semantic abstraction, requiring narrative grounding for accurate retrieval at higher levels.

SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

SpatialGrammar provides a grid-based DSL and compiler that lets LLMs generate collision-free 3D indoor scenes more reliably than raw-coordinate or code-based approaches.

MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation

cs.CV · 2026-04-26 · unverdicted · novelty 7.0 · 2 refs

MuSS is a new movie-sourced dataset and benchmark that enables AI models to generate multi-shot videos with improved narrative coherence and subject identity preservation.

CFSR: Geometry-Conditioned Shadow Removal via Physical Disentanglement

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

CFSR reframes shadow removal as a physics-constrained process using geometric and semantic priors from depth, DINO, CLIP, and frequency decoupling to achieve claimed state-of-the-art results.

Comparison Drives Preference: Reference-Aware Modeling for AI-Generated Video Quality Assessment

cs.CV · 2026-04-18 · unverdicted · novelty 7.0

RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection

cs.CV · 2026-04-13 · conditional · novelty 7.0

Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.

VidAudio-Bench: Benchmarking V2A and VT2A Generation across Four Audio Categories

cs.SD · 2026-04-12 · unverdicted · novelty 7.0

VidAudio-Bench benchmarks V2A and VT2A models across four audio categories, revealing poor speech/singing performance and a tension between visual alignment and text instruction following.

DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

DIRECT uses a three-level multi-agent framework to solve video mashup creation as a multimodal coherency problem, outperforming baselines on a new benchmark.

When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models

cs.CV · 2026-03-29 · unverdicted · novelty 7.0

A wrinkle-field perturbation method creates photorealistic non-rigid image changes that degrade state-of-the-art VLMs on image captioning and VQA more effectively than prior baselines.

Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

A feature supervision approach using SigLIP 2 extracts multi-granularity vision-aligned text representations to supervise MM-DiT image branches, pushing the Pareto frontier for portrait generation across alignment, realism, and aesthetics.

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

OCCAM discovers open-set visual concepts, estimates causal contributions via object-level interventions on black-box vision models, and induces a global concept ontology from aggregated dataset evidence.

Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems

cs.AI · 2026-05-03 · unverdicted · novelty 6.0 · 3 refs

FLP uses multi-persona foresight simulation to detect infections via response diversity and applies local purification to reduce maximum cumulative infection rates in multi-agent systems from over 95% to below 5.47%.

FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching

cs.RO · 2026-04-27 · unverdicted · novelty 6.0

FreqCache uses frequency domain properties to adaptively select, refresh, and budget token caches in VLN models, delivering 1.59x speedup with negligible overhead.

Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Q-Gate dynamically routes keyframe selection in long videos via query-modulated gating across visual grounding, global matching, and contextual alignment experts to improve MLLM performance.

ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

ArtifactWorld restores artifacts in 3D Gaussian Splatting by training a video diffusion backbone on 107.5K paired clips with an isomorphic predictor for artifact heatmaps and an Artifact-Aware Triplet Fusion mechanism to achieve better sparse-view novel synthesis.

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.

ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

cs.IR · 2026-04-08 · unverdicted · novelty 6.0

ReAlign improves visual document retrieval by training retrievers to match query-induced rankings with rankings derived from VLM-generated, region-focused descriptions of relevant page content.

CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

CoME-VL fuses contrastive and self-supervised vision encoders via entropy-guided multi-layer aggregation and RoPE cross-attention to improve vision-language model performance on benchmarks.

STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.

The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

cs.HC · 2026-01-14 · conditional · novelty 6.0

LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.

citing papers explorer

Showing 40 of 40 citing papers.

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation cs.CV · 2026-04-13 · unverdicted · none · ref 21
The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.
HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing cs.CV · 2026-04-10 · accept · none · ref 41
HM-Bench is the first benchmark for MLLMs on hyperspectral images, showing models struggle with complex spatial-spectral reasoning and perform better with visual PCA images than textual reports.
SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection cs.CV · 2026-04-09 · unverdicted · none · ref 34
The first benchmark for AI-generated scientific figure detection shows existing detectors fail in zero-shot transfer, overfit to specific generators, and break under common image corruptions.
Same Image, Different Meanings: Toward Retrieval of Context-Dependent Meanings cs.IR · 2026-05-13 · unverdicted · none · ref 14
Image meanings grow more context-dependent with semantic abstraction, requiring narrative grounding for accurate retrieval at higher levels.
SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation cs.AI · 2026-04-30 · unverdicted · none · ref 22
SpatialGrammar provides a grid-based DSL and compiler that lets LLMs generate collision-free 3D indoor scenes more reliably than raw-coordinate or code-based approaches.
MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation cs.CV · 2026-04-26 · unverdicted · none · ref 31 · 2 links
MuSS is a new movie-sourced dataset and benchmark that enables AI models to generate multi-shot videos with improved narrative coherence and subject identity preservation.
CFSR: Geometry-Conditioned Shadow Removal via Physical Disentanglement cs.CV · 2026-04-20 · unverdicted · none · ref 39
CFSR reframes shadow removal as a physics-constrained process using geometric and semantic priors from depth, DINO, CLIP, and frequency decoupling to achieve claimed state-of-the-art results.
Comparison Drives Preference: Reference-Aware Modeling for AI-Generated Video Quality Assessment cs.CV · 2026-04-18 · unverdicted · none · ref 31
RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs cs.CV · 2026-04-17 · unverdicted · none · ref 40
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection cs.CV · 2026-04-13 · conditional · none · ref 57
Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.
VidAudio-Bench: Benchmarking V2A and VT2A Generation across Four Audio Categories cs.SD · 2026-04-12 · unverdicted · none · ref 38
VidAudio-Bench benchmarks V2A and VT2A models across four audio categories, revealing poor speech/singing performance and a tension between visual alignment and text instruction following.
DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing cs.CV · 2026-04-06 · unverdicted · none · ref 25
DIRECT uses a three-level multi-agent framework to solve video mashup creation as a multimodal coherency problem, outperforming baselines on a new benchmark.
When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models cs.CV · 2026-03-29 · unverdicted · none · ref 27
A wrinkle-field perturbation method creates photorealistic non-rigid image changes that degrade state-of-the-art VLMs on image captioning and VQA more effectively than prior baselines.
Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics cs.CV · 2026-05-20 · unverdicted · none · ref 19
A feature supervision approach using SigLIP 2 extracts multi-granularity vision-aligned text representations to supervise MM-DiT image branches, pushing the Pareto frontier for portrait generation across alignment, realism, and aesthetics.
OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models cs.AI · 2026-05-18 · unverdicted · none · ref 34
OCCAM discovers open-set visual concepts, estimates causal contributions via object-level interventions on black-box vision models, and induces a global concept ontology from aggregated dataset evidence.
Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems cs.AI · 2026-05-03 · unverdicted · none · ref 32 · 3 links
FLP uses multi-persona foresight simulation to detect infections via response diversity and applies local purification to reduce maximum cumulative infection rates in multi-agent systems from over 95% to below 5.47%.
FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching cs.RO · 2026-04-27 · unverdicted · none · ref 20
FreqCache uses frequency domain properties to adaptively select, refresh, and budget token caches in VLN models, delivering 1.59x speedup with negligible overhead.
Where to Focus: Query-Modulated Multimodal Keyframe Selection for Long Video Understanding cs.CV · 2026-04-19 · unverdicted · none · ref 23
Q-Gate dynamically routes keyframe selection in long videos via query-modulated gating across visual grounding, global matching, and contextual alignment experts to improve MLLM performance.
ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models cs.CV · 2026-04-14 · unverdicted · none · ref 17
ArtifactWorld restores artifacts in 3D Gaussian Splatting by training a video diffusion backbone on 107.5K paired clips with an isomorphic predictor for artifact heatmaps and an Artifact-Aware Triplet Fusion mechanism to achieve better sparse-view novel synthesis.
VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis cs.CV · 2026-04-08 · unverdicted · none · ref 26
VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.
ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment cs.IR · 2026-04-08 · unverdicted · none · ref 64
ReAlign improves visual document retrieval by training retrievers to match query-induced rankings with rankings derived from VLM-generated, region-focused descriptions of relevant page content.
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning cs.CV · 2026-04-03 · unverdicted · none · ref 51
CoME-VL fuses contrastive and self-supervised vision encoders via entropy-guided multi-layer aggregation and RoPE cross-attention to improve vision-language model performance on benchmarks.
STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models cs.CV · 2026-04-03 · unverdicted · none · ref 37
STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.
The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor cs.HC · 2026-01-14 · conditional · none · ref 80
LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.
ScaleDoc: Scaling LLM-based Predicates over Large Document Collections cs.DB · 2025-09-16 · unverdicted · none · ref 34
ScaleDoc achieves over 2x end-to-end speedup and up to 85% fewer LLM invocations for semantic predicates on large document collections via offline LLM representations, contrastive-trained proxy filtering, and adaptive cascades.
A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset cs.CR · 2025-06-20 · unverdicted · none · ref 112
An empirical audit of one web-scraped ML training dataset reveals persistent PII after sanitization, which the authors combine with legal analysis to highlight privacy risks and advocate redefining 'publicly available' data for AI training.
MINT: Multi-Vector Search Index Tuning cs.DB · 2025-04-28 · unverdicted · none · ref 58
MINT defines multi-vector search index tuning and provides algorithms that achieve 2.1X to 8.3X latency speedup over baselines under storage and recall constraints.
Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models cs.CV · 2026-05-21 · unverdicted · none · ref 98
DEX is a modular network using dynamically activated experts and a group-EMA director to learn emergent modular representations for multi-modality medical vision foundation models, evaluated on a new 4M-image benchmark across 10 modalities and 26 downstream tasks.
DealMaTe: Multi-Dimensional Material Transfer via Diffusion Transformer cs.GR · 2026-05-15 · unverdicted · none · ref 45
DealMaTe proposes a simplified diffusion framework for material transfer that injects multi-dimensional 3D conditions via Multi-Dim 3D Shader LoRA and Shader Causal Mutual Attention with KV caching.
Observe Less, Understand More: Cost-aware Cross-scale Observation for Remote Sensing Understanding cs.CV · 2026-04-13 · unverdicted · none · ref 36
A unified cost-aware formulation couples fine-grained high-resolution sampling decisions with cross-patch representation prediction to achieve superior performance-cost trade-offs on remote sensing recognition and retrieval tasks using a new 10M-image benchmark.
Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection cs.CV · 2026-04-13 · unverdicted · none · ref 36
A text-guided fusion method for RGB-IR object detection aligns modalities via semantic bridging and incorporates both consensus and discrepancy cues through dynamic recalibration.
On Semiotic-Grounded Interpretive Evaluation of Generative Art cs.CV · 2026-04-09 · unverdicted · none · ref 71
SemJudge uses a Hierarchical Semiosis Graph based on Peircean theory to evaluate deeper artistic meaning in generative art and aligns better with human judgments than prior metrics.
WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval cs.CV · 2026-04-07 · unverdicted · none · ref 49
WRF4CIR uses weight-regularized fine-tuning with adversarial perturbations to mitigate overfitting in composed image retrieval and narrows the generalization gap on benchmarks.
EcoAssist: Embedding Sustainability into AI-Assisted Frontend Development cs.HC · 2026-04-06 · unverdicted · none · ref 83
EcoAssist embeds energy estimation and optimization into AI-assisted frontend coding, reducing website energy use by 13-16% in benchmarks while preserving developer productivity.
Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation cs.IR · 2025-11-24 · unverdicted · none · ref 36
HaNoRec dynamically weights harder preference samples and applies Gaussian perturbations to output distributions to improve multimodal LLM performance on sequential recommendation tasks.
Zoom In, Reason Out: Efficient Far-field Anomaly Detection in Expressway Surveillance Videos via Focused VLM Reasoning Guided by Bayesian Inference cs.CV · 2026-04-26 · unverdicted · none · ref 28
VIBES uses Bayesian inference to trigger focused VLM reasoning on localized far-field regions in expressway videos, improving anomaly detection accuracy and efficiency.
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning cs.LG · 2026-05-22 · unreviewed · ref 71
Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs cs.AI · 2026-05-18 · unreviewed · ref 38
Eulerian Motion Guidance: Robust Image Animation via Bidirectional Geometric Consistency cs.CV · 2026-05-07 · unreviewed · ref 23 · 2 links
Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings cs.CV · 2026-04-24 · unreviewed · ref 39
