hub

Finecir: Explicit parsing of fine- grained modification semantics for composed image re- trieval.https://arxiv.org/abs/2503.21309

FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval · 2025 · arXiv 2503.21309

21 Pith papers cite this work. Polarity classification is still indexing.

21 Pith papers citing it

read on arXiv browse 21 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 baseline 1 method 1

citation-polarity summary

baseline 1 unclear 1 use method 1

representative citing papers

FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data

cs.CV · 2026-04-11 · unverdicted · novelty 8.0

FashionMV introduces product-level multi-view CIR, a 127K-product dataset built via automated LMM pipeline, and a 0.8B ProCIR model that beats larger baselines on three fashion benchmarks.

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

cs.CL · 2026-06-09 · unverdicted · novelty 6.0

First end-to-end RAG on mobile NPU delivers 18.1x faster prefilling, 4x lower latency and energy than CPU on Snapdragon X Elite with equivalent quality.

COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

COMBINER proposes a new architecture for composed image retrieval using adaptive semantic disentanglement, unified prototype-based composition, and dual attribute-based relation modeling to address visually similar but attribute-unrelated samples.

Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

cs.CL · 2026-05-29 · unverdicted · novelty 6.0 · 2 refs

TOPD improves on-policy distillation for LLM reasoning by using near-future guidance to identify divergent states, raising average accuracy from 47.8% to 52.2% on math benchmarks including AIME24 and AIME25.

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation

cs.CL · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

Rock Tokens in on-policy distillation persist at high loss, account for up to 18% of outputs, absorb large gradient norms, but add negligible value to reasoning performance.

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.

ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

ReTrack calibrates directional bias in composed video features using semantic disentanglement and bidirectional evidence alignment to improve retrieval performance on CVR and CIR tasks.

AgentIAD: Agentic Industrial Anomaly Detection via Adaptive Memory Augmentation

cs.CV · 2025-12-15 · unverdicted · novelty 6.0

AgentIAD introduces an agentic VLM with Perceptive Zoomer, Web Searcher, and Comparative Retriever tools plus two-stage SFT-then-RL training, achieving 5.92% higher classification accuracy than prior SOTA on the MMAD benchmark.

R^3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking

cs.CV · 2026-05-31 · unverdicted · novelty 5.0

R^3 is a zero-shot pipeline that generates reasoning traces to augment composed video queries, fuses scores via agreement-gated residual, and re-ranks candidates for the CoVR-R challenge.

MHMamba: Multi-Head Mamba for 3D Brain Tumor Segmentation

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

MHMamba combines a U-Net with multi-head Mamba, channel calibration, and adaptive skip fusion to improve 3D brain tumor segmentation accuracy and small-lesion sensitivity on BraTS datasets while retaining linear complexity.

RankVR: Low-Rank Structure Perception and Value Recalibration for Robust Composed Image Retrieval

cs.CV · 2026-06-10 · unverdicted · novelty 4.0

RankVR introduces GSCP and ASVC modules to improve CIR robustness by decoupling clean samples via low-rank structure and dynamically scoring triplet value in noisy datasets.

IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval

cs.CV · 2026-06-06 · unverdicted · novelty 4.0

IMAGINE uses adaptive schema-imagery via dynamic multimodal prototypes to incorporate implicit semantics into composed video retrieval, claiming SOTA results on CVR and CIR benchmarks.

Semantic Granularity Navigation in Image Editing

cs.CV · 2026-05-20 · unverdicted · novelty 4.0

NaviEdit is a training-free inference-time controller that decouples edit progress from model scale traversal in diffusion-based image editing via self-consistency, reporting average gains across editors and backbones.

Mitigating Hallucination on Hallucination in RAG via Ensemble Voting

cs.CL · 2026-03-28 · unverdicted · novelty 4.0

VOTE-RAG applies retrieval voting across diverse queries and response voting across independent generations to mitigate hallucination-on-hallucination in RAG, matching or exceeding complex baselines on six benchmarks with a parallelizable design.

Hermes: A Multi-Scale Spatial-Temporal Hypergraph Network for Stock Time Series Forecasting

cs.LG · 2025-09-28 · unverdicted · novelty 4.0

Hermes is a multi-scale spatial-temporal hypergraph network that improves stock forecasting accuracy by capturing inter-industry lead-lag dependencies and fusing information across scales.

On the Controllability-Fidelity Frontier in Diffusion Editing

cs.GR · 2026-06-05 · unverdicted · novelty 3.0

A study deriving mathematical formulations and bounds for diffusion editing objectives while empirically comparing methods on fidelity and control metrics and discussing ethical issues.

EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge

cs.CV · 2026-05-23 · unverdicted · novelty 3.0

EgoAdapt improves VQA on the HD-EPIC egocentric benchmark via category-conditioned routing, calibrated option scoring, and test-time consistency adaptation.

EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026

cs.CV · 2026-05-23 · unverdicted · novelty 3.0

EgoAction uses decoupled verb-noun temporal detectors on VideoMAE features and Dynamic Weighted Fusion of boundaries based on classification confidences for the EPIC-KITCHENS action detection challenge.

OmniEgo-R$^2$: A Routed Reasoning Framework for the 1st Cross-Domain EgoCross Challenge at CVPR 2026

cs.CV · 2026-05-23 · unverdicted · novelty 3.0

OmniEgo-R² is a competition system that combines domain-specific VL models with temporal normalization, capability routing, and answer calibration to reach 66.35-66.77% accuracy on the EgoCross challenge.

TempRet: Temporal Enhancement and Two-Stage Reranking for CVPR 2026 EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge

cs.CV · 2026-05-23 · unverdicted · novelty 3.0

TempRet enhances a CLIP dual-encoder with temporal modeling and two-stage reranking to report 67.97% mAP and 82.92% nDCG on the EK-100 MIR benchmark.

citing papers explorer

Showing 2 of 2 citing papers after filters.

AgentIAD: Agentic Industrial Anomaly Detection via Adaptive Memory Augmentation cs.CV · 2025-12-15 · unverdicted · none · ref 27
AgentIAD introduces an agentic VLM with Perceptive Zoomer, Web Searcher, and Comparative Retriever tools plus two-stage SFT-then-RL training, achieving 5.92% higher classification accuracy than prior SOTA on the MMAD benchmark.
Hermes: A Multi-Scale Spatial-Temporal Hypergraph Network for Stock Time Series Forecasting cs.LG · 2025-09-28 · unverdicted · none · ref 31
Hermes is a multi-scale spatial-temporal hypergraph network that improves stock forecasting accuracy by capturing inter-industry lead-lag dependencies and fusing information across scales.

Finecir: Explicit parsing of fine- grained modification semantics for composed image re- trieval.https://arxiv.org/abs/2503.21309

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer