Enhancing the Discriminative Feature Learning for Visible-Thermal Cross-Modality Person Re-Identification

Haijun Liu; Jian Cheng

arxiv: 1907.09659 · v1 · pith:NWKWLBE6new · submitted 2019-07-23 · 💻 cs.CV

Pith reviewed 2026-05-24 18:09 UTC · model grok-4.3

classification 💻 cs.CV

keywords visible-thermal person re-identification · cross-modality discrepancy · intra-modality variations · skip connection · dual-modality triplet loss · two-stream CNN · discriminative feature learning

The pith

Skip connections for mid-level features plus a dual-modality triplet loss enhance discriminative learning in visible-thermal person re-identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets visible-thermal cross-modality person re-identification, a setting where visible cameras fail at night and thermal images must be matched to visible ones despite large appearance shifts. It introduces two minimal additions inside a two-stream CNN: skip connections that fold mid-level features into the final representation, and a dual-modality triplet loss that penalizes both cross-modality and within-modality distances at the same time. These changes are meant to produce person features that remain stable across modalities while still separating different identities. If the additions work, they would let surveillance systems keep matching people reliably across day and night without heavier models or extra data. Experiments on two public datasets report large gains over prior methods.

Core claim

A two-stream CNN equipped with skip connections that incorporate mid-level features and trained with a dual-modality triplet loss reduces both cross-modality discrepancy and intra-modality variations, yielding person features that are more discriminative and robust for visible-thermal re-identification.

What carries the argument

The EDFL method that adds skip connections for mid-level feature incorporation and a dual-modality triplet loss inside a two-stream CNN.
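
As a rough illustration of the skip-connection idea, the sketch below pools a mid-level feature map and a high-level feature map and joins them into one person descriptor. Fusion by concatenation, the L2 normalization step, and the helper names (`global_avg_pool`, `l2_normalize`, `fuse_with_skip`) are assumptions made for illustration; this summary does not specify how the paper merges the two feature levels.

```python
import math


def global_avg_pool(fmap):
    """Average each channel of a feature map.

    fmap is a list of channels, each a 2-D list (H x W).
    Returns one float per channel.
    """
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def fuse_with_skip(mid_fmap, high_fmap):
    """Hypothetical fusion: pool both levels, normalize, then concatenate.

    Concatenation is an assumption; addition or a learned projection
    would be equally plausible readings of "skip connection".
    """
    mid = l2_normalize(global_avg_pool(mid_fmap))
    high = l2_normalize(global_avg_pool(high_fmap))
    return mid + high  # list concatenation: mid-level part, then high-level part
```

Normalizing each level before concatenation keeps one level from dominating the distance computation simply because its activations have a larger scale; that design choice is also an assumption rather than something the source states.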

If this is right

Mid-level features passed via skip connections add robustness that high-level features alone do not provide across modalities.
The dual-modality triplet loss simultaneously shrinks distances between visible and thermal images of the same person and expands distances between different people within each modality.
A two-stream architecture produces shared features usable by both visible and thermal inputs.
The combined changes produce measurable accuracy lifts on existing visible-thermal benchmarks.
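
One way to read the dual-modality triplet loss described above is as a sum of cross-modality and intra-modality triplet terms. The sketch below assumes batch-hard mining, Euclidean distance, a margin of 0.3, and per-term weights `w_cross` and `w_intra`; none of these choices is confirmed by this summary, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet(anchors, anchor_ids, pool, pool_ids, margin, skip_self=False):
    """Mean hinge loss using the hardest positive and hardest negative in pool.

    Set skip_self=True when pool is the same list as anchors, so that an
    anchor is not used as its own positive.
    """
    losses = []
    for i, (a, aid) in enumerate(zip(anchors, anchor_ids)):
        pos = [euclidean(a, p) for j, (p, pid) in enumerate(zip(pool, pool_ids))
               if pid == aid and not (skip_self and i == j)]
        neg = [euclidean(a, n) for n, nid in zip(pool, pool_ids) if nid != aid]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this pool
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


def dual_modality_triplet(vis, vis_ids, thr, thr_ids,
                          margin=0.3, w_cross=1.0, w_intra=1.0):
    """Assumed form: weighted sum of cross- and intra-modality triplet terms."""
    # Cross-modality: anchors from one modality, positives/negatives from the other.
    cross = (batch_hard_triplet(vis, vis_ids, thr, thr_ids, margin)
             + batch_hard_triplet(thr, thr_ids, vis, vis_ids, margin))
    # Intra-modality: anchors, positives, and negatives from the same modality.
    intra = (batch_hard_triplet(vis, vis_ids, vis, vis_ids, margin, skip_self=True)
             + batch_hard_triplet(thr, thr_ids, thr, thr_ids, margin, skip_self=True))
    return w_cross * cross + w_intra * intra
```

With well-separated identities the hinge is inactive and the loss is zero; mislabeling one modality activates the cross-modality term while leaving the intra-modality term untouched, which is the behavior the two-term design is meant to expose.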

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The emphasis on mid-level features suggests that identity cues useful across modalities sit at intermediate depths rather than only at the deepest layers.
The same pair of modifications could be tested on other cross-modal re-identification pairs such as RGB-infrared or visible-depth without changing the overall training recipe.
If the dual triplet loss proves decisive, future work could explore weighting the cross-modality and intra-modality terms separately on different datasets.

Load-bearing premise

These two lightweight changes alone will close the modality gap and variation problems without needing deeper redesigns or more training data.

What would settle it

Running the same network on a new visible-thermal dataset where accuracy gains shrink to within a few percent of the best prior method would show the enhancements are not generally sufficient.

Figures

Figures reproduced from arXiv: 1907.09659 by Haijun Liu, Jian Cheng.

**Figure 2.** The location visualization of different stages of the ResNet50 [7] model.

**Figure 3.** The proposed EDFL framework for VT Re-ID. Two-stream CNN structure is adopted to extract person features, one stream for visible images and …

**Figure 5.** The performances of our proposed enhancing discriminative feature …

read the original abstract

Existing person re-identification has achieved great progress in the visible domain, where all person images are captured with visible cameras. However, in a 24-hour intelligent surveillance system, visible cameras may be ineffective at night. In this situation, thermal cameras are the best supplemental components, since they capture images without depending on visible light. Therefore, in this paper, we investigate the visible-thermal cross-modality person re-identification (VT Re-ID) problem. In VT Re-ID, two knotty problems must be handled well: cross-modality discrepancy and intra-modality variations. To address these two issues, we propose enhancing the discriminative feature learning (EDFL) with two extremely simple means from two core aspects: (1) skip connections for mid-level feature incorporation, to improve the person features with more discriminability and robustness, and (2) a dual-modality triplet loss to guide the training procedure by simultaneously considering the cross-modality discrepancy and intra-modality variations. Additionally, a two-stream CNN structure is adopted to learn multi-modality sharable person features. The experimental results on two datasets show that our proposed EDFL approach distinctly outperforms state-of-the-art methods by large margins, demonstrating the effectiveness of EDFL in enhancing the discriminative feature learning for VT Re-ID.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds skip connections on mid-level features and a dual-modality triplet loss to a two-stream CNN for visible-thermal Re-ID, but the gains are hard to attribute without ablations.

The main move here is taking the standard two-stream setup for visible-thermal person re-identification and layering on two targeted fixes: skip connections to pull in more mid-level features, and a triplet loss that operates across both modalities at once. The dual-modality loss tries to shrink the gap between visible and thermal images of the same person while also handling variation inside each modality. That combination is the concrete thing the authors put forward, and it is presented as a lightweight way to improve discriminability without heavy new architecture.

Referee Report

1 major / 0 minor

Summary. The paper proposes an EDFL method for visible-thermal cross-modality person re-identification that adopts a two-stream CNN backbone, adds skip-connections to incorporate mid-level features for improved discriminability, and introduces a dual-modality triplet loss to jointly address cross-modality discrepancy and intra-modality variations. The central claim, stated in the abstract and §4, is that these two simple enhancements enable the method to distinctly outperform state-of-the-art approaches by large margins on two datasets.

Significance. If the reported gains are reproducible and attributable to the proposed components rather than the backbone or training protocol, the work would be significant for 24-hour surveillance applications, as it suggests that lightweight architectural and loss modifications can mitigate modality gaps without complex models or extra data.

major comments (1)

[§4] §4 (Experiments) and associated tables: no ablation studies are presented that isolate the contribution of the mid-level skip-connection or the dual-modality triplet loss (e.g., full EDFL vs. two-stream baseline with standard triplet loss). Without these controls, the claim that the large margins over SOTA are caused by the two proposed enhancements cannot be verified and remains load-bearing for the paper's central assertion.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the major comment below and will incorporate the suggested changes in the revised manuscript.

Referee: [§4] §4 (Experiments) and associated tables: no ablation studies are presented that isolate the contribution of the mid-level skip-connection or the dual-modality triplet loss (e.g., full EDFL vs. two-stream baseline with standard triplet loss). Without these controls, the claim that the large margins over SOTA are caused by the two proposed enhancements cannot be verified and remains load-bearing for the paper's central assertion.

Authors: We agree that the manuscript would benefit from explicit ablation studies to isolate the contributions of the mid-level skip-connections and the dual-modality triplet loss. The current experiments focus on overall performance against state-of-the-art methods but do not include direct controls such as a two-stream baseline with standard triplet loss or variants without skip-connections. In the revision, we will add these ablation experiments to the tables in §4, which will allow verification that the reported gains are attributable to the proposed components rather than the backbone or training protocol alone.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes an empirical method (two-stream CNN + mid-level skip connections + dual-modality triplet loss) for VT Re-ID and supports its claims solely via reported performance on two external datasets. No equations, derivations, or predictions are present that reduce to inputs by construction; no self-citations are invoked as load-bearing uniqueness theorems; and the approach does not rename known results or smuggle ansatzes. The derivation chain is therefore self-contained empirical evaluation rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; standard deep learning assumptions apply but are not detailed.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 3 internal anchors

[1] Y. Cao, M. Long, J. Wang, and S. Liu, “Collective deep quantization for efficient cross-modal retrieval,” in AAAI, 2017.
[2] X. Chang, T. M. Hospedales, and T. Xiang, “Multi-level factorisation net for person re-identification,” in CVPR, 2018.
[3] Y.-C. Chen, X. Zhu, W.-S. Zheng, and J.-H. Lai, “Person re-identification by camera correlation aware feature augmentation,” IEEE TPAMI, vol. 40, no. 2, pp. 392–408, 2018.
[4] M. Cornia, L. Baraldi, H. R. Tavakoli, and R. Cucchiara, “Towards cycle-consistent models for text and image retrieval,” in ECCV, 2018, pp. 687–691.
[5] P. Dai, R. Ji, H. Wang, Q. Wu, and Y. Huang, “Cross-modality person re-identification with generative adversarial training,” in IJCAI, 2018, pp. 677–683.
[6] Z. Deng, X. Peng, Z. Li, and Y. Qiao, “Mutual component convolutional neural networks for heterogeneous face recognition,” IEEE TIP, 2019.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
[8] R. He, X. Wu, Z. Sun, and T. Tan, “Learning invariant deep representation for nir-vis face recognition,” in AAAI, 2017.
[9] A. Hermans, L. Beyer, and B. Leibe, “In defense of the triplet loss for person re-identification,” arXiv preprint arXiv:1703.07737, 2017.
[10] S. Karanam, M. Gou, Z. Wu, A. Rates-Borras, O. Camps, and R. J. Radke, “A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets,” IEEE TPAMI, 2018.
[11] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NeurIPS, 2012, pp. 1097–1105.
[13] W. Li, X. Zhu, and S. Gong, “Harmonious attention network for person re-identification,” in CVPR, 2018, pp. 2285–2294.
[14] H. Liu and J. Cheng, “Gallery based k-reciprocal-like re-ranking for heavy cross-camera discrepancy in person re-identification,” Neurocomputing, vol. 333, pp. 64–75, 2019.
[15] X. Liu, H. Zhao, M. Tian, L. Sheng, J. Shao, S. Yi, J. Yan, and X. Wang, “Hydraplus-net: Attentive deep features for pedestrian analysis,” in ICCV, 2017, pp. 350–359.
[16] D. Nguyen, H. Hong, K. Kim, and K. Park, “Person recognition system based on a combination of body images from visible light and thermal cameras,” Sensors, vol. 17, no. 3, p. 605, 2017.
[17] C. Reale, H. Lee, and H. Kwon, “Deep heterogeneous face recognition networks based on cross-modal distillation and an equitable distance metric,” in ICCV Workshops, 2017, pp. 32–38.
[18] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in CVPR, 2015, pp. 815–823.
[19] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in ICCV, 2017, pp. 618–626.
[20] C. Su, J. Li, S. Zhang, J. Xing, W. Gao, and Q. Tian, “Pose-driven deep convolutional model for person re-identification,” in ICCV, 2017, pp. 3980–3989.
[21] Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” in ECCV, 2018, pp. 501–518.
[22] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in AAAI, vol. 4, 2017, p. 12.
[23] C. Wang, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Mancs: A multi-task attentional network with curriculum sampling for person re-identification,” in ECCV, 2018, pp. 384–400.
[24] G. Wang, Y. Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identification,” in ACM MM, 2018.
[25] L. Wang, Y. Li, J. Huang, and S. Lazebnik, “Learning two-branch neural networks for image-text matching tasks,” IEEE TPAMI, vol. 41, no. 2, pp. 394–407, 2019.
[26] Z. Wang, Z. Wang, Y. Zheng, Y.-Y. Chuang, and S. Satoh, “Learning to reduce dual-level discrepancy for infrared-visible person re-identification,” in CVPR, 2019, pp. 618–626.
[27] A. Wu, W.-S. Zheng, H.-X. Yu, S. Gong, and J. Lai, “Rgb-infrared cross-modality person re-identification,” in ICCV, 2017, pp. 5380–5389.
[28] X. Wu, L. Song, R. He, and T. Tan, “Coupled deep learning for heterogeneous face recognition,” in AAAI, 2018.
[29] M. Ye, X. Lan, J. Li, and P. C. Yuen, “Hierarchical discriminative learning for visible thermal person re-identification,” in AAAI, 2018.
[30] M. Ye, Z. Wang, X. Lan, and P. C. Yuen, “Visible thermal person re-identification via dual-constrained top-ranking,” in IJCAI, 2018, pp. 1092–1099.
[31] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in ECCV, 2014, pp. 818–833.
[32] L. Zheng, Y. Yang, and A. G. Hauptmann, “Person re-identification: Past, present and future,” arXiv preprint arXiv:1610.02984, 2016.
