When large vision-language models meet person re-identification
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A multi-view semantic reformulation and feature compensation method using LLMs and VLMs improves text-to-image person retrieval accuracy without training and reaches SOTA on three datasets.
citing papers explorer
- Beyond Visual Cues: Semantic-Driven Token Filtering and Expert Routing for Anytime Person ReID
  STFER uses LVLM-generated identity-consistent semantic text to drive visual token filtering and expert routing for improved any-time person re-identification under clothing changes and modality shifts.
- Towards Robust Text-to-Image Person Retrieval: Multi-View Reformulation for Semantic Compensation
  A multi-view semantic reformulation and feature compensation method using LLMs and VLMs improves text-to-image person retrieval accuracy without training and reaches SOTA on three datasets.