arXiv: 1907.07904 · v1 · pith:YTWVYVGTnew · submitted 2019-07-18 · cs.LG · stat.ML

On the relation between Loss Functions and T-Norms

Pith reviewed 2026-05-24 19:46 UTC · model grok-4.3

classification: cs.LG, stat.ML
keywords: loss functions, t-norms, cross-entropy, deep learning, supervised learning, generator functions, fuzzy logic, neural networks

The pith

Cross-entropy loss corresponds to the additive generator of the product t-norm, and generators of other t-norms yield a general family of losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reinterprets the cross-entropy loss through the generator functions of continuous t-norms rather than solely through probability. It derives an explicit relation that maps any such generator to a corresponding loss suitable for gradient-based training. If the mapping is valid, then choosing different t-norms produces new losses that inherit properties from fuzzy logic while remaining usable in supervised learning. A reader would care because the relation supplies a systematic construction method instead of ad-hoc design of training objectives. The result unifies a probabilistic justification with a logical one.

Core claim

The paper establishes that loss functions used in neural network training correspond directly to the additive generator functions of continuous t-norms, with the cross-entropy loss arising specifically from the generator of the product t-norm, and that this correspondence produces a parametric family of novel losses applicable to any supervised task.
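As a hedged illustration of this correspondence: the page does not reproduce the paper's equations, so the notation below is the standard one from the t-norm literature and is an assumption here, not necessarily the authors' own. A continuous Archimedean t-norm $T$ can be written through an additive generator $g$ (continuous, strictly decreasing, with $g(1) = 0$) and its pseudo-inverse $g^{(-1)}$. Taking the product t-norm and applying its generator to the conjunction of the predicted probabilities $p_i$ of the correct labels gives the familiar negative log-likelihood.

```latex
% General representation through an additive generator g
T(x, y) = g^{(-1)}\bigl(g(x) + g(y)\bigr),
\qquad
g^{(-1)}(u) = g^{-1}\bigl(\min\{u,\, g(0)\}\bigr).

% Product t-norm: g_P(x) = -\log x, with g_P(0) = +\infty
T_P(x, y) = \exp\bigl(-(-\log x - \log y)\bigr) = x\,y.

% Generator applied to the conjunction of n true-class probabilities
g_P\Bigl(\prod_{i=1}^{n} p_i\Bigr) = -\sum_{i=1}^{n} \log p_i .
```

The last line is the cross-entropy loss for one-hot targets. Generators with finite $g(0)$, such as the Łukasiewicz generator $g_L(x) = 1 - x$, saturate at $g(0)$ instead of growing without bound.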

What carries the argument

The direct mapping from t-norm generator functions to loss functions that can be minimized by gradient descent.

If this is right

  • Losses can be constructed systematically by selecting different continuous t-norms.
  • The new losses remain compatible with any supervised learning pipeline that uses gradient descent.
  • Cross-entropy is recovered exactly as one member of the family rather than treated as an isolated choice.
  • Properties of t-norms may translate into convergence or gradient behavior of the associated losses.
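The construction in the list above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the helper name `generated_loss` and the sample probabilities are hypothetical, the two generators are the standard textbook ones for the product and Łukasiewicz t-norms, and the clipping at $g(0)$ that a nilpotent t-norm would require is omitted for brevity.

```python
import math

# Additive generators (standard definitions from the t-norm literature).
def g_product(p):
    """Generator of the product t-norm: g(p) = -log(p)."""
    return -math.log(p)

def g_lukasiewicz(p):
    """Generator of the Lukasiewicz t-norm: g(p) = 1 - p."""
    return 1.0 - p

def generated_loss(generator, true_class_probs):
    """Hypothetical helper: sum the generator over the predicted
    probabilities assigned to the correct labels (no clipping at g(0))."""
    return sum(generator(p) for p in true_class_probs)

def cross_entropy(true_class_probs):
    """Reference cross-entropy (negative log-likelihood) for comparison."""
    return -sum(math.log(p) for p in true_class_probs)

# Made-up model outputs: probability assigned to the true class of 3 examples.
probs = [0.9, 0.6, 0.75]

loss_product = generated_loss(g_product, probs)
loss_lukasiewicz = generated_loss(g_lukasiewicz, probs)

print(loss_product, cross_entropy(probs))  # the two values coincide
print(loss_lukasiewicz)                    # sum of (1 - p), an L1-style loss
```

Swapping the generator is the only change needed to move between members of the family, which is what makes the construction systematic.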

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same generator correspondence could be tested on regression or structured-prediction tasks to check whether advantages appear outside classification.
  • Results from the theory of t-norms on nilpotency or strictness might predict which derived losses avoid vanishing gradients in very deep stacks.
  • The relation invites checking whether other training components, such as regularizers, admit analogous t-norm interpretations.
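One way to probe the vanishing-gradient point above is to compare how two generated losses respond to a confidently wrong prediction. The sketch below is an assumption-laden illustration rather than an analysis from the paper: it assumes a single sigmoid output with logit `z`, and the function names are hypothetical. It uses the closed-form derivatives of $-\log \sigma(z)$ (product, a strict t-norm) and $1 - \sigma(z)$ (Łukasiewicz, a nilpotent t-norm).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_product_loss(z):
    """d/dz of -log(sigmoid(z)), which equals -(1 - sigmoid(z))."""
    return -(1.0 - sigmoid(z))

def grad_lukasiewicz_loss(z):
    """d/dz of 1 - sigmoid(z), which equals -sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return -s * (1.0 - s)

# Very negative z means the model is confidently wrong about a positive label.
for z in (-10.0, -5.0, 0.0, 5.0):
    print(z, grad_product_loss(z), grad_lukasiewicz_loss(z))
```

Under these assumptions the product-generated loss keeps a gradient close to -1 when the prediction is badly wrong, while the Łukasiewicz-generated loss has a gradient that shrinks toward zero because the sigmoid saturates.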

Load-bearing premise

The generator functions of t-norms supply a mathematically direct and practically usable definition of loss functions for gradient-based supervised learning.

What would settle it

An experiment in which a loss constructed from the generator of a non-product t-norm produces unstable training or strictly worse test accuracy than cross-entropy on a standard image-classification benchmark would falsify the claimed usefulness of the relation.
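The shape of such a comparison can be sketched on a toy problem. The code below is only a stand-in under loud assumptions: it uses four made-up 1-D points and a logistic model instead of an image-classification benchmark, the names `DATA`, `GENERATORS`, and `train` are hypothetical, and it makes no claim about which loss performs better in the paper's experiments.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data (made up): negative inputs are class 0, positive are class 1.
DATA = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

# Each entry: (generator g, derivative g'), both taking the true-class probability.
GENERATORS = {
    "product": (lambda p: -math.log(p), lambda p: -1.0 / p),
    "lukasiewicz": (lambda p: 1.0 - p, lambda p: -1.0),
}

def total_loss(g, w, b):
    """Sum of g over the probabilities assigned to the correct labels."""
    loss = 0.0
    for x, y in DATA:
        s = sigmoid(w * x + b)
        p_true = s if y == 1 else 1.0 - s
        loss += g(p_true)
    return loss

def train(name, steps=200, lr=0.1):
    """Plain gradient descent on the generated loss; returns the loss history."""
    g, dg = GENERATORS[name]
    w, b = 0.0, 0.0
    history = [total_loss(g, w, b)]
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in DATA:
            s = sigmoid(w * x + b)
            p_true = s if y == 1 else 1.0 - s
            # Chain rule: dp_true/dz is +s(1-s) for class 1, -s(1-s) for class 0.
            dp_dz = s * (1.0 - s) if y == 1 else -s * (1.0 - s)
            dz = dg(p_true) * dp_dz
            grad_w += dz * x
            grad_b += dz
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(total_loss(g, w, b))
    return history

histories = {name: train(name) for name in GENERATORS}
for name, h in histories.items():
    print(name, "initial loss:", h[0], "final loss:", h[-1])
```

A real test of the claim would replace the toy data and model with a standard benchmark and network, and would compare convergence curves and test accuracy across several generators.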

Figures

Figures reproduced from arXiv: 1907.07904 by Francesco Giannini, Giuseppe Marra, Marco Gori, Marco Maggini, Michelangelo Diligenti.

Figure 1: Convergence speed of multiple generated loss functions on the MNIST

Original abstract

Deep learning has been shown to achieve impressive results in several domains like computer vision and natural language processing. A key element of this success has been the development of new loss functions, like the popular cross-entropy loss, which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. While the cross-entropy loss is usually justified from a probabilistic perspective, this paper shows an alternative and more direct interpretation of this loss in terms of t-norms and their associated generator functions, and derives a general relation between loss functions and t-norms. In particular, the presented work shows intriguing results leading to the development of a novel class of loss functions. These losses can be exploited in any supervised learning task and which could lead to faster convergence rates that the commonly employed cross-entropy loss.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that the cross-entropy loss admits an alternative interpretation via t-norms and their generator functions, derives a general relation between arbitrary loss functions and t-norms, and uses this relation to construct a novel family of loss functions for supervised learning that may exhibit improved convergence properties.

Significance. If the central derivation is correct, the work supplies a mathematically grounded correspondence between a widely used loss and the generator functions of continuous t-norms, together with a constructive method for generating new losses. This could be useful for designing losses that inherit desirable analytic properties (e.g., strict monotonicity or specific behavior near zero) directly from the t-norm axioms rather than from probabilistic arguments alone.

minor comments (3)
  1. The abstract states that the new losses 'could lead to faster convergence rates that the commonly employed cross-entropy loss'; the word 'that' should be 'than'. More importantly, the claim is presented as a possible implication; if the manuscript contains any empirical comparison, the relevant table or figure should be referenced explicitly in the abstract or introduction.
  2. Notation for the generator function g and its pseudo-inverse should be introduced once and used consistently; any re-definition in later sections should be flagged.
  3. The manuscript should include a short table or list that explicitly maps at least three standard t-norms with additive generators (e.g., product, Łukasiewicz, Hamacher product) to the corresponding loss functions obtained from the general relation, so that the construction is immediately verifiable.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and for the positive assessment of the manuscript. The referee's summary correctly identifies the central contributions: an alternative interpretation of cross-entropy via t-norm generators, a general relation between losses and t-norms, and the construction of a new family of losses. We are pleased that the potential utility for designing losses with desirable analytic properties is recognized. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; the derivation is a self-contained mathematical correspondence.

full rationale

The paper's central claim is a direct mathematical mapping from loss functions (e.g., cross-entropy) to t-norm generator functions, yielding a general construction for new losses. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations are visible in the provided abstract or stated claims. The derivation stands as an independent interpretive relation without reducing to its inputs by construction. This is the expected honest non-finding for a purely mathematical correspondence paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that t-norm generators map directly onto useful loss functions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • Domain assumption: continuous t-norms possess generator functions that can be used to define loss functions for supervised learning.
    This mapping is the load-bearing step that turns the fuzzy-logic object into a training objective.


