In Defense of the Triplet Loss for Person Re-Identification

Alexander Hermans; Bastian Leibe; Lucas Beyer

arXiv 1703.07737 v4 · pith:C7U5Z2GQ · submitted 2017-03-22 · cs.CV cs.NE

Alexander Hermans, Lucas Beyer, Bastian Leibe

classification cs.CV cs.NE

keywords learning, loss, triplet, deep, end-to-end, large, metric, person

Abstract

In the past few years, the field of computer vision has gone through a revolution fueled mainly by the advent of large datasets and the adoption of deep convolutional neural networks for end-to-end learning. The person re-identification subfield is no exception to this. Unfortunately, a prevailing belief in the community seems to be that the triplet loss is inferior to using surrogate losses (classification, verification) followed by a separate metric learning step. We show that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.
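The abstract does not spell out which variant of the triplet loss is used. As a minimal sketch only, the following shows a margin-based triplet loss in which each anchor is paired with its farthest same-identity sample and its closest different-identity sample within a batch. The function names, the margin value of 0.2, and the choice of Euclidean distance are illustrative assumptions, not details taken from this page.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean hinge triplet loss over a batch (illustrative, hypothetical helper).

    For each anchor, the positive is the farthest sample with the same label
    and the negative is the closest sample with a different label. Anchors
    that lack a positive or a negative in the batch are skipped.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [
            euclidean(anchor, other)
            for j, (other, other_label) in enumerate(zip(embeddings, labels))
            if other_label == label and j != i
        ]
        neg = [
            euclidean(anchor, other)
            for other, other_label in zip(embeddings, labels)
            if other_label != label
        ]
        if not pos or not neg:
            continue
        # Hinge: penalize only when the hardest positive is not closer
        # than the hardest negative by at least the margin.
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


# Two identities that are well separated give zero loss;
# interleaved identities give a positive loss.
separated = batch_hard_triplet_loss(
    [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]], [0, 0, 1, 1]
)
interleaved = batch_hard_triplet_loss(
    [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]], [0, 0, 1, 1]
)
```

In an end-to-end setting the embeddings would come from the network being trained and the loss would be differentiated with an autodiff framework; the pure-Python version here is only meant to make the selection of triplets concrete.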

Forward citations

Cited by 41 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Instant-Fold: In-Context Imitation Learning for Deformable Object Manipulation
cs.RO 2026-06 unverdicted novelty 7.0

Instant-Fold enables execution of multiple deformable object manipulation modes from a single demonstration via a flow-matching transformer policy that transfers zero-shot from simulation to real robots.
ToolFG: Towards Well-Grounded Fine-Grained Image Classification
cs.CV 2026-06 unverdicted novelty 7.0

ToolFG is the first tool-integrated MLLM framework for fine-grained image classification, trained via MCTS-guided distillation from proprietary models and refined through model-tool co-evolution.
Generalization Limits in Vehicle Re-Identification
cs.CV 2026-06 unverdicted novelty 7.0

Standard vehicle re-ID benchmarks allow memorization of seen vehicle types; a new train/test split by vehicle type and view shows that state-of-the-art methods fail to generalize to unseen vehicles.
From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification
cs.CV 2026-04 conditional novelty 7.0

SAGA-ReID improves CLIP-based person ReID by using structured anchor-guided aggregation of patch tokens, delivering up to 10.6 Rank-1 gains on occluded benchmarks over global pooling.
CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras
cs.CV 2026-02 unverdicted novelty 7.0

CityGuard introduces a graph-aware transformer with dispersion-adaptive metrics, spatially conditioned attention using coarse geometry, and differentially private embeddings to support privacy-preserving identity retr...
RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition
cs.CV 2025-12 conditional novelty 7.0

RAG-HAR combines retrieval-augmented generation with LLMs to deliver state-of-the-art human activity recognition across six benchmarks without any model training or fine-tuning.
HiHR: Hierarchical Hyperbolic Representation for Aerial-Ground Person Re-Identification
cs.CV 2026-07 conditional novelty 6.0

Hierarchical hyperbolic embeddings with text-guided multi-granularity fusion improve aerial-ground person re-identification by keeping both view-invariant identity and view-specific cues.
STELLA: Efficient Sensor-to-LLM Translation for On-Device Human Activity Recognition
cs.LG 2026-07 conditional novelty 6.0

A hierarchical sensor tokenizer maps entire multi-channel IMU windows to 16 fixed latent tokens that a frozen GPT-2 scores for activity labels, yielding SOTA on-device HAR with local personalization.
Graph-of-Differences: Anatomy-Structured Difference Alignment for Medical Image Re-Identification
cs.CV 2026-06 unverdicted novelty 6.0

GoD uses anatomy graphs and difference alignment to improve medical image re-identification accuracy and auditability, with +7.1 pp Rank-1 gains on fundus and +3.1 pp on CXR.
TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living
cs.CV 2026-06 unverdicted novelty 6.0

TimeProVe proposes a propose-then-verify framework using lightweight action-based candidate evidence generation followed by targeted VLM verification for efficient long video temporal reasoning, achieving 7.3% improve...
Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification
cs.CV 2026-06 unverdicted novelty 6.0

Ortho-ReID learns instance-adaptive low-rank orthogonal subspaces from VLM text via a transformer Basis Maker and enforces geometric constraints to produce clothing-invariant identity features, reporting SOTA gains on...
MERIT: Learning Disentangled Music Representations for Audio Similarity
cs.SD 2026-05 unverdicted novelty 6.0

MERIT trains disentangled heads for melody, rhythm, and timbre via conditional audio generation and stem separation, with evaluations showing each head responds strongly to its target dimension and near chance on othe...
MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
cs.CL 2026-05 unverdicted novelty 6.0

MELD is a multi-task AI-text detector using auxiliary heads, uncertainty-weighted losses, EMA distillation, and pairwise ranking that reaches 99.9% TPR at 1% FPR on a new held-out benchmark while remaining competitive...
Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification
cs.CV 2026-05 unverdicted novelty 6.0

PAD uses prompt distillation on the text side and domain-adaptive EMA prompts on the visual side to balance stability and plasticity in lifelong person re-identification.
ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)
cs.CV 2026-05 accept novelty 6.0

A new benchmark dataset and competition for top-view RGB-Depth person re-identification are released, with competition results showing that RGB retrieval is easier than depth and cross-modal retrieval.
Complexity of Linear Regions in Self-supervised Deep ReLU Networks
cs.LG 2026-04 unverdicted novelty 6.0

Self-supervised ReLU networks form substantially fewer linear regions than supervised models for comparable accuracy, with contrastive methods rapidly expanding regions and self-distillation consolidating them, enabli...
Thinking Before Matching: A Reinforcement Reasoning Paradigm Towards General Person Re-Identification
cs.CV 2026-04 unverdicted novelty 6.0

ReID-R achieves competitive person re-identification performance using chain-of-thought reasoning and reinforcement learning with only 14.3K non-trivial samples, about 20.9% of typical data scales, while providing int...
CraterBench-R: Instance-Level Crater Retrieval for Planetary Scale
cs.CV 2026-04 unverdicted novelty 6.0

CraterBench-R is a new retrieval benchmark where self-supervised ViTs with a training-free instance-token aggregation method achieve high accuracy for identifying individual craters while reducing storage needs.
Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification
cs.CV 2025-12 unverdicted novelty 6.0

DMDL debiases modality cues at the model and optimization levels via causal adjustment intervention and collaborative bias-free training to learn modality-invariant features for unsupervised VI-ReID.
SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
cs.CV 2025-04 unverdicted novelty 6.0

SD-ReID trains a ViT to extract identity and view conditions, fine-tunes Stable Diffusion to generate view-mimicking features, adds a View-Refined Decoder, and combines both identity and all-view features for retrieva...
When Large Vision-Language Models Meet Person Re-Identification
cs.CV 2024-11 unverdicted novelty 6.0

LVLM-ReID guides LVLMs to produce refined semantic tokens as pedestrian identity features for ReID, achieving competitive benchmark results without additional image-text data.
VRSTC: Occlusion-Free Video Person Re-Identification
cs.CV 2019-07 unverdicted novelty 6.0

STCnet recovers occluded parts in video person re-ID using spatio-temporal cues to form the VRSTC framework, outperforming prior methods on three datasets.
A Novel Teacher-Student Learning Framework For Occluded Person Re-Identification
cs.CV 2019-07 unverdicted novelty 6.0

A teacher-student model with co-saliency network and growing-probability occlusion simulator outperforms prior methods on four occluded person re-identification benchmarks.
E-Sports Talent Scouting Based on Multimodal Twitch Stream Data
cs.LG 2019-07 unverdicted novelty 6.0

Neural features from Twitch streams are pooled via a hierarchical Bayesian model to estimate the intrinsic skill of CS:GO gamers, validated by correlation with subsequent public ranks.
DNSMOS-C: Improving End-to-end Speech Quality Models via Contrastive Learning
eess.AS 2026-06 unverdicted novelty 5.0

DNSMOS-C adds MOS-guided triplet contrastive loss to DNSMOS Pro for improved correlation, out-of-domain generalization, and emergent low-dimensional quality ordering in embeddings via unified training.
ConcernBERT: Learning Responsibilities Using Class Membership
cs.SE 2026-06 unverdicted novelty 5.0

ConcernBERT is a BERT embedding model trained with triplet loss on class membership to encode concern-level semantics in Java entities, evaluated by recovering original classes from merged unlabeled groups on a new da...
Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining
cs.CL 2026-06 unverdicted novelty 5.0

ImpSH improves cross-domain generalization in implicit hate speech classification by aligning posts with implied statements and applying context-bounded semi-hard negative mining within a triplet learning setup.
Beyond Pedestrians: Caption-Guided CLIP Framework for High-Difficulty Video-based Person Re-Identification
cs.CV 2026-04 unverdicted novelty 5.0

CG-CLIP adds caption-guided memory refinement and token-based spatiotemporal aggregation to CLIP for video person ReID, outperforming SOTA on MARS, iLIDS-VID, SportsVReID and DanceVReID.
The Thue-Morse Transform
math.NT 2026-04 unverdicted novelty 5.0

A Thue-Morse transform T maps 010101… to the Thue-Morse word; its iterates admit explicit formulas, PTE identities, composition laws, and a complete factor-complexity description.
HOTFLoc++: End-to-End Hierarchical LiDAR Place Recognition, Re-Ranking, and 6-DoF Metric Localisation in Forests
cs.CV 2025-11 unverdicted novelty 5.0

A hierarchical octree-transformer framework for LiDAR place recognition, multi-scale re-ranking, and 6-DoF localization in forests reports 90.7% Recall@1 on CS-Wild-Places and under 2 m / 5° error on 97.2% of registra...
A Novel Method for News Article Event-Based Embedding
cs.CL 2024-05 unverdicted novelty 5.0

Proposes an event-based news embedding method via entity/theme extraction, periodic GloVe models, SIF, and Siamese networks, claiming outperformance on shared event detection using GDELT data.
Hard-Aware Fashion Attribute Classification
cs.CV 2019-07 unverdicted novelty 5.0

Presents HABP to emphasize hard samples during training and Deact to generate stable synthetic samples for rare attributes, outperforming prior methods on large-scale fashion datasets without extra supervision.
Spatial-Temporal Expert Learning for Video-based Person Re-identification
cs.CV 2026-07 unverdicted novelty 4.0

Proposes dynamic expert selection with input-aware and spatial-temporal mechanisms plus an extendable scheme to improve fine-grained feature use in video person Re-ID.
Privacy-Preserving Person Re-Identification from Temporal Sequences with Transformer and Hungarian Optimization
cs.CV 2026-06 unverdicted novelty 4.0

Depth-only Re-ID via Transformer on temporal sequences and Hungarian matching achieves competitive accuracy on top-view datasets while prioritizing privacy.
HDST-GNN: Heterogeneous Dynamic Spatiotemporal Graph Neural Networks for Multi-Object Tracking in UAV Aerial Imagery
cs.CV 2026-06 unverdicted novelty 4.0

The paper introduces HDST-GNN, a heterogeneous dynamic spatiotemporal GNN for UAV multi-object tracking with altitude-adaptive edges, typed nodes, and occlusion-gated aggregation, reporting 94.51% MOTA on VisDrone2019-MOT.
Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening
cs.CV 2026-05 unverdicted novelty 4.0

SSL pretraining enhances calibrated confidence and selective performance in DR screening, yet benefits on reliability plateau and longer pretraining does not reliably improve abstention.
Few-Shot Network Intrusion Detection Using Online Triplet Mining
cs.CR 2026-05 unverdicted novelty 4.0

A triplet network using online triplet mining and KNN classifier achieves competitive few-shot performance on network intrusion detection with as few as 10 malicious samples per class.
On the Properties of Feature Attribution for Supervised Contrastive Learning
cs.LG 2026-04 unverdicted novelty 4.0

Neural networks trained via supervised contrastive learning yield feature attributions that are more faithful, less complex, and more continuous than those from cross-entropy trained networks.
Identity-Aware U-Net: Fine-grained Cell Segmentation via Identity-Aware Representation Learning
cs.CV 2026-04 unverdicted novelty 4.0

Identity-Aware U-Net augments a U-Net backbone with an auxiliary embedding branch and triplet metric learning to discriminate among cells with near-identical shapes and textures.
Enhancing the Discriminative Feature Learning for Visible-Thermal Cross-Modality Person Re-Identification
cs.CV 2019-07 unverdicted novelty 4.0

EDFL improves visible-thermal person re-identification by enhancing feature discriminability with skip connections and dual-modality triplet loss, outperforming state-of-the-art on two datasets.
ThirdEye: Triplet Based Iris Recognition without Normalization
cs.CV 2019-07 unverdicted novelty 4.0

ThirdEye applies triplet convolutional neural networks directly to segmented iris images without normalization, reporting EERs of 1.32% on ND-0405, 9.20% on UbirisV2, and 0.59% on IITD, improving prior results on the ...