hub

European conference on computer vision , pages=

Microsoft coco: Common objects in context , author= · 2014

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

browse 16 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

Training on automatically generated hard negative captions improves vision-language models' zero-shot detection of fine-grained image-text mismatches and robustness to noisy inputs.

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

cs.CR · 2026-04-21 · unverdicted · novelty 7.0

ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra

MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.

VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

VPD-100K is a large-scale fine-grained visual privacy dataset with 100k images and 33 classes, accompanied by a frequency-domain attention module that improves detection on image and video benchmarks.

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Head Similarity extends identity recognition to structured whole-head similarity by capturing intra-identity appearance variations via hierarchical supervision on a weakly-labeled video benchmark.

Geometric Decoupling: Diagnosing the Structural Instability of Latent

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

Latent diffusion models exhibit geometric decoupling where curvature in out-of-distribution generation is misallocated to unstable semantic boundaries instead of image details, identifying geometric hotspots as the structural cause of editing instability.

STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

STAR-IOD applies scale-decoupled topology alignment and K-Means-based pseudo-label refinement to reduce catastrophic forgetting in remote sensing incremental object detection, reporting 1.7% and 2.1% mAP gains on new DIOR-IOD and DOTA-IOD datasets.

Stable and Near-Reversible Diffusion ODE Solvers for Image Editing

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.

Information theoretic underpinning of self-supervised learning by clustering

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.

APEX: Assumption-free Projection-based Embedding eXamination Metric for Image Quality Assessment

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

APEX is an assumption-free image quality metric using Sliced Wasserstein Distance on CLIP and DINOv2 embeddings that claims superior robustness to degradations and cross-dataset stability.

Unifying Deep Stochastic Processes for Image Enhancement

cs.CV · 2026-05-02 · unverdicted · novelty 5.0

Stochastic image enhancement methods are shown to be variants of a shared SDE differing in drift, diffusion, terminal distributions and boundary conditions, with controlled experiments revealing no single dominant family and a new modular library released.

LLM Benchmark Datasets Should Be Contamination-Resistant

cs.LG · 2026-05-19 · unverdicted · novelty 4.0

Authors call for contamination-resistant LLM benchmarks that exploit Transformer training-inference asymmetry and require new mathematical methods for cross-architecture interoperability.

Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers

cs.CV · 2026-05-14

VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation

cs.CL · 2026-05-03

citing papers explorer

Showing 16 of 16 citing papers.

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing cs.CL · 2026-05-07 · unverdicted · none · ref 88
VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.
HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities cs.CL · 2026-05-06 · unverdicted · none · ref 2
Training on automatically generated hard negative captions improves vision-language models' zero-shot detection of fine-grained image-text mismatches and robustness to noisy inputs.
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety cs.CR · 2026-04-21 · unverdicted · none · ref 102
ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training cs.CV · 2026-05-18 · unverdicted · none · ref 62
SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra
MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation cs.CV · 2026-05-12 · unverdicted · none · ref 16
MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.
VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection cs.CV · 2026-05-11 · unverdicted · none · ref 84
VPD-100K is a large-scale fine-grained visual privacy dataset with 100k images and 33 classes, accompanied by a frequency-domain attention module that improves detection on image and video benchmarks.
Head Similarity: Modeling Structured Whole-Head Appearance Beyond Face Recognition cs.CV · 2026-05-08 · unverdicted · none · ref 69
Head Similarity extends identity recognition to structured whole-head similarity by capturing intra-identity appearance variations via hierarchical supervision on a weakly-labeled video benchmark.
Geometric Decoupling: Diagnosing the Structural Instability of Latent cs.CV · 2026-04-20 · unverdicted · none · ref 50
Latent diffusion models exhibit geometric decoupling where curvature in out-of-distribution generation is misallocated to unstable semantic boundaries instead of image details, identifying geometric hotspots as the structural cause of editing instability.
STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection cs.CV · 2026-05-20 · unverdicted · none · ref 162
STAR-IOD applies scale-decoupled topology alignment and K-Means-based pseudo-label refinement to reduce catastrophic forgetting in remote sensing incremental object detection, reporting 1.7% and 2.1% mAP gains on new DIOR-IOD and DOTA-IOD datasets.
Stable and Near-Reversible Diffusion ODE Solvers for Image Editing cs.CV · 2026-05-12 · unverdicted · none · ref 23
Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.
Information theoretic underpinning of self-supervised learning by clustering cs.LG · 2026-05-12 · unverdicted · none · ref 31
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
APEX: Assumption-free Projection-based Embedding eXamination Metric for Image Quality Assessment cs.CV · 2026-05-08 · unverdicted · none · ref 11
APEX is an assumption-free image quality metric using Sliced Wasserstein Distance on CLIP and DINOv2 embeddings that claims superior robustness to degradations and cross-dataset stability.
Unifying Deep Stochastic Processes for Image Enhancement cs.CV · 2026-05-02 · unverdicted · none · ref 10
Stochastic image enhancement methods are shown to be variants of a shared SDE differing in drift, diffusion, terminal distributions and boundary conditions, with controlled experiments revealing no single dominant family and a new modular library released.
LLM Benchmark Datasets Should Be Contamination-Resistant cs.LG · 2026-05-19 · unverdicted · none · ref 99
Authors call for contamination-resistant LLM benchmarks that exploit Transformer training-inference asymmetry and require new mathematical methods for cross-architecture interoperability.
Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers cs.CV · 2026-05-14 · unreviewed · ref 41
VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation cs.CL · 2026-05-03 · unreviewed · ref 104

European conference on computer vision , pages=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer