Generalization analysis with deep ReLU networks for metric and similarity learning
read the original abstract
While metric and similarity learning has been extensively studied from several theoretical perspectives, a rigorous understanding of its generalization performance is still lacking. In this paper, we investigate the generalization behavior of metric and similarity learning by exploiting the specific structure of the true metric (i.e., the target function). In particular, by deriving the explicit form of the true metric for metric and similarity learning with the hinge loss, we construct a structured deep ReLU neural network as an approximation of the true metric, whose approximation ability depends on the network complexity. Here, the network complexity is characterized by the network depth, the number of nonzero weights, and the number of computational units. Based on the hypothesis space consisting of such structured deep ReLU networks, we establish excess risk bounds for metric and similarity learning by carefully controlling both the approximation error and the estimation error. An explicit excess risk rate is derived by choosing the proper capacity of the constructed hypothesis space. To the best of our knowledge, this is the first generalization analysis that provides explicit excess risk bounds for metric and similarity learning. In addition, we investigate properties of the true metric for metric and similarity learning under more general loss functions. Experiments show that the proposed model is empirically competitive and better captures the underlying similarity structure.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise
First population risk bounds for KANs under mini-batch DP-SGD with correlated noise, using a new non-convex optimization analysis combined with stability-based generalization.
-
Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks
Proves learnability of ordered multiple smooth boundaries in pairwise binary classification via localized deep ReLU networks.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.