arxiv: 2605.07766 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

Head Similarity: Modeling Structured Whole-Head Appearance Beyond Face Recognition

Yingfeng Wang , Yuxuan Xiao , Shengcai Liao

Authors on Pith no claims yet

Pith reviewed 2026-05-11 02:51 UTC · model grok-4.3

classification 💻 cs.CV

keywords head similaritywhole-head appearanceface recognitionappearance variationidentity consistencyhierarchical supervisionvideo benchmarkweakly-supervised

0 comments

The pith

Head Similarity extends face recognition to model structured whole-head appearance variations including hairstyle and styling changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that standard face recognition models, by forcing all images of a person into one invariant representation, lose information about changeable appearance features like hair or accessories. This limits their usefulness in scenarios where identity must be consistent despite such changes or when faces are not visible. The authors propose Head Similarity as a way to explicitly model these variations through hierarchical ordering of similarities at both identity and appearance levels. They support this with a new benchmark built from video data using weak supervision on appearance states and a training framework that combines identity and appearance objectives.

Core claim

Head Similarity is a formulation that extends identity-centric recognition to structured whole-head similarity modeling by capturing intra-identity appearance variation and enforcing hierarchical similarity ordering across identity and appearance states, demonstrated feasible via a framework using hierarchical supervision and identity-aware distillation on a video-derived benchmark.

What carries the argument

The Head Similarity formulation, which explicitly captures intra-identity appearance variation and enforces hierarchical similarity ordering across identity and appearance states.

If this is right

Meaningful similarity comparisons remain possible even under occlusion or rear-view conditions where facial cues are absent.
Conventional face recognition models are shown to fail at capturing appearance-dependent similarity.
Applications requiring identity consistency beyond strict biometric recognition can use whole-head cues.
A large-scale benchmark from long-form videos enables training for diverse poses and temporal changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such models could improve person re-identification in videos with frequent appearance changes.
Embedding spaces might need to represent multiple appearance states per identity rather than single points.
Future work could test generalization to real-world surveillance footage without video-based weak labels.

Load-bearing premise

A large-scale benchmark from long-form videos with weakly-supervised appearance states sufficiently captures diverse poses, occlusions, and temporal changes to train effective models.

What would settle it

A standard face recognition model trained on the same benchmark achieves comparable accuracy on tasks measuring appearance-dependent similarity and hierarchical ordering as the proposed Head Similarity framework.

Figures

Figures reproduced from arXiv: 2605.07766 by Shengcai Liao, Yingfeng Wang, Yuxuan Xiao.

**Figure 1.** Figure 1: Failure cases of AdaFace on whole-head similarity. The goal is not to verify legal identity, but to preserve the perception that the sequence depicts the same person. Conventional face recognition is designed for a different objective. Modern systems learn identity-invariant embeddings with margin-based metric learning losses Deng et al. [2019], Kim et al. [2022], deliberately suppressing intra-identity… view at source ↗

**Figure 2.** Figure 2: Conceptual comparison between identity-centric face recognition and our proposed Head [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of the hierarchical similarity structure [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Overall training framework for Head Similarity. A dual-CLS Vision Transformer backbone [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Pipeline of the Head Similarity dataset construction. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: ROC curves under aligned and wholehead inputs. We analyze the effect of adapting face-recognition models from aligned-face inputs to unaligned whole-head images. As shown in [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: ROC curves comparison on the HeadSim-Head dataset. As reported in [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: Top-3 retrieval results on the HeadSim-Head test set for AdaFace and our method under [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 9.** Figure 9: ROC curves on HeadSim-Head for different configurations. Dual-CLS consistently outperforms other variants. To analyze the conflict between identity invariance and appearance-sensitive similarity, we evaluate different architectural variants and loss assignments, as summarized in [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

read the original abstract

Many vision applications require identity consistency beyond strict biometric recognition, especially under non-frontal views or when facial cues are missing. However, conventional face recognition models enforce intra-identity invariance, collapsing appearance variations such as hairstyle or styling changes into a single representation, limiting their use in appearance-sensitive scenarios. To address this limitation, we introduce Head Similarity, a new formulation that extends identity-centric recognition to structured whole-head similarity modeling. Our approach explicitly captures intra-identity appearance variation and enforces hierarchical similarity ordering across identity and appearance states, enabling meaningful comparison even under occlusion or rear-view conditions. We construct a large-scale benchmark from long-form videos with weakly-supervised appearance states, covering diverse poses, occlusions, and temporal changes. As a first step, we develop a simple yet effective framework that jointly models identity discrimination and appearance-sensitive similarity through hierarchical supervision and identity-aware distillation. Experiments show that conventional face recognition models fail to capture appearance-dependent similarity, while our approach demonstrates the feasibility of structured whole-head similarity modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines Head Similarity as a structured extension beyond face recognition but its claims rest on unvalidated weak labels from video.

read the letter

The main takeaway is that this work identifies a real limitation in standard face recognition—its tendency to collapse hairstyle, styling, and occlusion variations into one identity vector—and proposes Head Similarity to model whole-head appearance with explicit hierarchical ordering across identity and appearance states. They release a benchmark built from long-form videos and a simple framework using hierarchical supervision plus identity-aware distillation to enforce that structure. This is a clean, practical step for applications like surveillance or video editing where non-frontal or partial views matter. The formulation itself is new enough and the motivation is grounded in existing face-rec shortcomings. The soft spot is the reliance on weakly-supervised appearance-state labels. The abstract gives no validation of those labels, no noise analysis, and no inter-annotator checks, so it is unclear whether the reported gaps versus baselines come from the modeling or from annotation artifacts under pose and occlusion. Experiments are described only at a high level with no metrics, ablations, or error bars visible here. This paper is for computer-vision researchers who need identity consistency beyond strict biometrics in video settings. A reader working on robust head modeling or appearance-aware tracking would find the benchmark and hierarchy idea worth examining. It deserves peer review because the problem is well-posed and the contribution is distinct, even if the data side needs tightening to be convincing.

Referee Report

2 major / 2 minor

Summary. The paper introduces Head Similarity as a new formulation extending identity-centric face recognition to structured whole-head similarity modeling that explicitly captures intra-identity appearance variations (e.g., hairstyle, styling, occlusion). It constructs a large-scale benchmark from long-form videos using weakly-supervised appearance state labels and proposes a simple framework combining hierarchical supervision with identity-aware distillation. Experiments are presented to show that conventional face recognition collapses appearance variation while the proposed approach demonstrates feasibility of appearance-dependent similarity under diverse poses and views.

Significance. If the central claims hold after addressing validation gaps, the work could meaningfully advance computer vision applications needing nuanced identity consistency beyond biometrics, such as video-based re-identification or non-frontal analysis. The benchmark and hierarchical supervision idea provide a concrete starting point for future research on appearance-sensitive modeling. Credit is due for framing the problem clearly and releasing a new data resource, though the significance is tempered by the absence of label-quality diagnostics that would allow readers to trust the reported gaps versus baselines.

major comments (2)

[§3] §3 (Benchmark Construction): The weakly-supervised appearance state labels extracted from long-form videos are load-bearing for the hierarchical similarity ordering and all downstream claims, yet the section provides no quantitative validation (e.g., label accuracy vs. manual annotation, inter-state consistency under pose variation, or noise-robustness checks). Without such evidence, it remains possible that performance differences versus face-recognition baselines arise from label artifacts rather than the modeling approach.
[§5] §5 (Experiments): The claim that conventional face recognition models fail to capture appearance-dependent similarity while the proposed method succeeds is central, but the reported results lack concrete metrics, error bars, statistical significance tests, or ablation isolating the contribution of hierarchical supervision versus identity-aware distillation. This makes it difficult to evaluate whether the feasibility demonstration is robust.

minor comments (2)

[Abstract] Abstract: The high-level description of the benchmark and framework is clear, but adding one sentence on dataset scale (number of identities, videos, and appearance states) would help readers gauge its coverage of pose/occlusion diversity.
[Method] Notation: The distinction between identity discrimination loss and appearance-sensitive similarity loss could be clarified with a short equation or diagram in the method section to avoid ambiguity for readers unfamiliar with distillation setups.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which has helped us identify areas to strengthen the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.

read point-by-point responses

Referee: [§3] §3 (Benchmark Construction): The weakly-supervised appearance state labels extracted from long-form videos are load-bearing for the hierarchical similarity ordering and all downstream claims, yet the section provides no quantitative validation (e.g., label accuracy vs. manual annotation, inter-state consistency under pose variation, or noise-robustness checks). Without such evidence, it remains possible that performance differences versus face-recognition baselines arise from label artifacts rather than the modeling approach.

Authors: We acknowledge that explicit validation of the weakly-supervised labels is essential for establishing trust in the benchmark. The labels are generated via a temporal consistency and clustering pipeline applied to long-form video tracks, but the current manuscript does not include quantitative diagnostics. In the revised version, we will add a new subsection under §3 that reports: (i) agreement metrics (accuracy, Cohen’s kappa) on a manually annotated subset of 1,000 randomly sampled tracks stratified by pose and occlusion; (ii) inter-state consistency analysis by computing intra- and inter-state similarity distributions under frontal vs. non-frontal views; and (iii) a noise-robustness check by injecting controlled label flips and re-running key experiments. These additions will allow readers to assess whether performance gaps reflect modeling improvements rather than label artifacts. revision: yes
Referee: [§5] §5 (Experiments): The claim that conventional face recognition models fail to capture appearance-dependent similarity while the proposed method succeeds is central, but the reported results lack concrete metrics, error bars, statistical significance tests, or ablation isolating the contribution of hierarchical supervision versus identity-aware distillation. This makes it difficult to evaluate whether the feasibility demonstration is robust.

Authors: We agree that the experimental presentation requires greater rigor to support the central claims. In the revision we will: (1) report all similarity metrics with error bars computed over five independent training runs using different random seeds; (2) include paired statistical significance tests (e.g., t-tests with p-values) comparing our method against each baseline; (3) add a dedicated ablation table that isolates hierarchical supervision (by removing the appearance-state ordering loss) and identity-aware distillation (by removing the distillation term) while keeping all other components fixed; and (4) expand the metric suite to include mean average precision and rank-1 accuracy in addition to the current similarity scores. These changes will make the feasibility demonstration more robust and reproducible. revision: yes

Circularity Check

0 steps flagged

No circularity: new formulation and framework with no derivations or self-referential reductions

full rationale

The paper introduces Head Similarity as a new formulation extending face recognition to structured whole-head modeling, constructs a benchmark from long-form videos using weakly-supervised appearance states, and proposes a framework with hierarchical supervision plus identity-aware distillation. No equations, parameter fittings, predictions, or derivations are present in the abstract or described approach. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. The central claim is a feasibility demonstration via experiments comparing to conventional models, which remains independent of any input reduction or self-definition. This qualifies as a self-contained new-task proposal with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no mathematical details, so no free parameters, axioms, or invented entities can be identified; the work relies on standard deep learning practices and a new benchmark construction approach at a conceptual level.

pith-pipeline@v0.9.0 · 5468 in / 1194 out tokens · 132356 ms · 2026-05-11T02:51:08.051073+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Head Similarity requires the hierarchical ordering sθ(xi,xj)>sθ(xi,xk)>sθ(xi,xℓ) for (i,j)∈R1, (i,k)∈R2, (i,ℓ)∈R3
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We adopt a dual-CLS Vision Transformer... Lsim = Softplus(m1 + san1 − sap) + ...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 2 internal anchors

[1]

Partial fc: Training 10 million identities on a single machine

Xiang An, Xuhan Zhu, Yuan Gao, Yang Xiao, Yongle Zhao, Ziyong Feng, Lan Wu, Bin Qin, Ming Zhang, Debing Zhang, et al. Partial fc: Training 10 million identities on a single machine. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1445--1449, 2021

work page 2021
[2]

Vggface2: A dataset for recognising faces across pose and age

Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), pages 67--74. IEEE, 2018

work page 2018
[3]

Hairnerf: Geometry-aware image synthesis for hairstyle transfer

Seunggyu Chang, Gihoon Kim, and Hayeon Kim. Hairnerf: Geometry-aware image synthesis for hairstyle transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2448--2458, 2023

work page 2023
[5]

Arcface: Additive angular margin loss for deep face recognition

Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4690--4699, 2019

work page 2019
[6]

Retinaface: Single-shot multi-level face localisation in the wild

Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. Retinaface: Single-shot multi-level face localisation in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5203--5212, 2020

work page 2020
[8]

Hyperbolic metric learning for visual outlier detection

Alvaro Gonzalez-Jimenez, Simone Lionetti, Dena Bazazian, Philippe Gottfrois, Fabian Gr \"o ger, Alexander Navarini, and Marc Pouly. Hyperbolic metric learning for visual outlier detection. In European Conference on Computer Vision, pages 327--344. Springer, 2024

work page 2024
[9]

Clothes-changing person re-identification with rgb modality only

Xinqian Gu, Hong Chang, Bingpeng Ma, Shutao Bai, Shiguang Shan, and Xilin Chen. Clothes-changing person re-identification with rgb modality only. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1060--1069, 2022

work page 2022
[10]

Dimensionality reduction by learning an invariant mapping

Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), volume 2, pages 1735--1742. IEEE, 2006

work page 2006
[11]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016

work page 2016
[12]

Transreid: Transformer-based object re-identification

Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. Transreid: Transformer-based object re-identification. In Proceedings of the IEEE/CVF international conference on computer vision, pages 15013--15022, 2021

work page 2021
[13]

Head360: Learning a parametric 3d full-head for free-view synthesis in 360 ^

Yuxiao He, Yiyu Zhuang, Yanwen Wang, Yao Yao, Siyu Zhu, Xiaoyu Li, Qi Zhang, Xun Cao, and Hao Zhu. Head360: Learning a parametric 3d full-head for free-view synthesis in 360 ^ . In European Conference on Computer Vision, pages 254--272. Springer, 2024

work page 2024
[14]

Beyond face rotation: Global and local perception gan for photorealistic and identity preserving frontal view synthesis

Rui Huang, Shu Zhang, Tianyu Li, and Ran He. Beyond face rotation: Global and local perception gan for photorealistic and identity preserving frontal view synthesis. In Proceedings of the IEEE international conference on computer vision, pages 2439--2448, 2017

work page 2017
[15]

Supervised contrastive learning

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in neural information processing systems, 33: 0 18661--18673, 2020

work page 2020
[16]

Adaface: Quality adaptive margin for face recognition

Minchul Kim, Anil K Jain, and Xiaoming Liu. Adaface: Quality adaptive margin for face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18750--18759, 2022

work page 2022
[17]

Hier: Metric learning beyond class labels via hierarchical regularization

Sungyeon Kim, Boseung Jeong, and Suha Kwak. Hier: Metric learning beyond class labels via hierarchical regularization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19903--19912, 2023

work page 2023
[18]

Partial face recognition: Alignment-free approach

Shengcai Liao, Anil K Jain, and Stan Z Li. Partial face recognition: Alignment-free approach. IEEE Transactions on pattern analysis and machine intelligence, 35 0 (5): 0 1193--1205, 2012

work page 2012
[19]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll \'a r, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740--755. Springer, 2014

work page 2014
[20]

No fuss distance metric learning using proxies

Yair Movshovitz-Attias, Alexander Toshev, Thomas K Leung, Sergey Ioffe, and Saurabh Singh. No fuss distance metric learning using proxies. In Proceedings of the IEEE international conference on computer vision, pages 360--368, 2017

work page 2017
[21]

Long-term cloth-changing person re-identification

Xuelin Qian, Wenxuan Wang, Li Zhang, Fangrui Zhu, Yanwei Fu, Tao Xiang, Yu-Gang Jiang, and Xiangyang Xue. Long-term cloth-changing person re-identification. In Proceedings of the Asian conference on computer vision, 2020

work page 2020
[23]

Facenet: A unified embedding for face recognition and clustering

Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815--823, 2015

work page 2015
[24]

First order motion model for image animation

Aliaksandr Siarohin, St \'e phane Lathuili \`e re, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. First order motion model for image animation. Advances in neural information processing systems, 32, 2019

work page 2019
[25]

Everybody’s talkin’: Let me talk as you want

Linsen Song, Wayne Wu, Chen Qian, Ran He, and Chen Change Loy. Everybody’s talkin’: Let me talk as you want. IEEE Transactions on Information Forensics and Security, 17: 0 585--598, 2022

work page 2022
[26]

Learning part-based convolutional features for person re-identification

Yifan Sun, Liang Zheng, Yali Li, Yi Yang, Qi Tian, and Shengjin Wang. Learning part-based convolutional features for person re-identification. IEEE transactions on pattern analysis and machine intelligence, 43 0 (3): 0 902--917, 2019

work page 2019
[27]

Disentangled representation learning gan for pose-invariant face recognition

Luan Tran, Xi Yin, and Xiaoming Liu. Disentangled representation learning gan for pose-invariant face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1415--1424, 2017

work page 2017
[28]

Video abstraction: A systematic review and classification

Ba Tu Truong and Svetha Venkatesh. Video abstraction: A systematic review and classification. ACM transactions on multimedia computing, communications, and applications (TOMM), 3 0 (1): 0 3--es, 2007

work page 2007
[29]

Occlusion robust face recognition based on mask learning

Weitao Wan and Jiansheng Chen. Occlusion robust face recognition based on mask learning. In 2017 IEEE international conference on image processing (ICIP), pages 3795--3799. IEEE, 2017

work page 2017
[30]

Learning discriminative features with multiple granularities for person re-identification

Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. Learning discriminative features with multiple granularities for person re-identification. In Proceedings of the 26th ACM international conference on Multimedia, pages 274--282, 2018 a

work page 2018
[31]

Cosface: Large margin cosine loss for deep face recognition

Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5265--5274, 2018 b

work page 2018
[34]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Arcface: Additive angular margin loss for deep face recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[35]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Cosface: Large margin cosine loss for deep face recognition , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[36]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Adaface: Quality adaptive margin for face recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[37]

Qwen3-Omni Technical Report

Qwen3-omni technical report , author=. arXiv preprint arXiv:2509.17765 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[38]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Deep residual learning for image recognition , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[39]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

An image is worth 16x16 words: Transformers for image recognition at scale , author=. arXiv preprint arXiv:2010.11929 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2010
[40]

IEEE Transactions on pattern analysis and machine intelligence , volume=

Partial face recognition: Alignment-free approach , author=. IEEE Transactions on pattern analysis and machine intelligence , volume=. 2012 , publisher=

work page 2012
[41]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Partial fc: Training 10 million identities on a single machine , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[42]

IEEE transactions on pattern analysis and machine intelligence , volume=

Learning part-based convolutional features for person re-identification , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2019 , publisher=

work page 2019
[43]

Proceedings of the 26th ACM international conference on Multimedia , pages=

Learning discriminative features with multiple granularities for person re-identification , author=. Proceedings of the 26th ACM international conference on Multimedia , pages=

work page
[44]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Transreid: Transformer-based object re-identification , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

work page
[45]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Disentangled representation learning gan for pose-invariant face recognition , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[46]

Proceedings of the IEEE international conference on computer vision , pages=

Beyond face rotation: Global and local perception gan for photorealistic and identity preserving frontal view synthesis , author=. Proceedings of the IEEE international conference on computer vision , pages=

work page
[47]

2017 IEEE international conference on image processing (ICIP) , pages=

Occlusion robust face recognition based on mask learning , author=. 2017 IEEE international conference on image processing (ICIP) , pages=. 2017 , organization=

work page 2017
[48]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Facenet: A unified embedding for face recognition and clustering , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[49]

2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06) , volume=

Dimensionality reduction by learning an invariant mapping , author=. 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06) , volume=. 2006 , organization=

work page 2006
[50]

Proceedings of the IEEE international conference on computer vision , pages=

No fuss distance metric learning using proxies , author=. Proceedings of the IEEE international conference on computer vision , pages=

work page
[51]

Advances in neural information processing systems , volume=

Supervised contrastive learning , author=. Advances in neural information processing systems , volume=

work page
[52]

V oxceleb2: Deep speaker recognition,

Voxceleb2: Deep speaker recognition , author=. arXiv preprint arXiv:1806.05622 , year=

work page arXiv
[53]

2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018) , pages=

Vggface2: A dataset for recognising faces across pose and age , author=. 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018) , pages=. 2018 , organization=

work page 2018
[54]

Y., Xu, Y

Treelora: Efficient continual learning via layer-wise loras guided by a hierarchical gradient-similarity tree , author=. arXiv preprint arXiv:2506.10355 , year=

work page arXiv
[55]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Open-ended hierarchical streaming video understanding with vision language models , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[56]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Hier: Metric learning beyond class labels via hierarchical regularization , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[57]

European Conference on Computer Vision , pages=

Hyperbolic metric learning for visual outlier detection , author=. European Conference on Computer Vision , pages=. 2024 , organization=

work page 2024
[58]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Hairnerf: Geometry-aware image synthesis for hairstyle transfer , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[59]

European Conference on Computer Vision , pages=

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360 ^ , author=. European Conference on Computer Vision , pages=. 2024 , organization=

work page 2024
[60]

arXiv preprint arXiv:2106.11297 , year=

Tokenlearner: What can 8 learned tokens do for images and videos? , author=. arXiv preprint arXiv:2106.11297 , year=

work page arXiv
[61]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Videomae v2: Scaling video masked autoencoders with dual masking , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[62]

Video-to-video synthesis

Video-to-video synthesis , author=. arXiv preprint arXiv:1808.06601 , year=

work page arXiv
[63]

IEEE Transactions on Information Forensics and Security , volume=

Everybody’s talkin’: Let me talk as you want , author=. IEEE Transactions on Information Forensics and Security , volume=. 2022 , publisher=

work page 2022
[64]

Advances in neural information processing systems , volume=

First order motion model for image animation , author=. Advances in neural information processing systems , volume=

work page
[65]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Clothes-changing person re-identification with rgb modality only , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[66]

Proceedings of the Asian conference on computer vision , year=

Long-term cloth-changing person re-identification , author=. Proceedings of the Asian conference on computer vision , year=

work page
[67]

ACM transactions on multimedia computing, communications, and applications (TOMM) , volume=

Video abstraction: A systematic review and classification , author=. ACM transactions on multimedia computing, communications, and applications (TOMM) , volume=. 2007 , publisher=

work page 2007
[68]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Retinaface: Single-shot multi-level face localisation in the wild , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[69]

European conference on computer vision , pages=

Microsoft coco: Common objects in context , author=. European conference on computer vision , pages=. 2014 , organization=

work page 2014