On the relation between Loss Functions and T-Norms

Francesco Giannini; Giuseppe Marra; Marco Gori; Marco Maggini; Michelangelo Diligenti

arxiv: 1907.07904 · v1 · pith:YTWVYVGT · submitted 2019-07-18 · cs.LG · stat.ML


Pith reviewed 2026-05-24 19:46 UTC · model grok-4.3

classification: cs.LG · stat.ML

keywords: loss functions · t-norms · cross-entropy · deep learning · supervised learning · generator functions · fuzzy logic · neural networks


The pith

Cross-entropy loss arises from the generator function of the product t-norm, yielding a general family of losses built from other t-norms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reinterprets the cross-entropy loss through the generator functions of continuous t-norms rather than solely through probability. It derives an explicit relation that maps any such generator to a corresponding loss suitable for gradient-based training. If the mapping is valid, then choosing different t-norms produces new losses that inherit properties from fuzzy logic while remaining usable in supervised learning. A reader would care because the relation supplies a systematic construction method instead of ad-hoc design of training objectives. The result unifies a probabilistic justification with a logical one.

Core claim

The paper establishes that loss functions used in neural network training correspond directly to the additive generator functions of continuous Archimedean t-norms, with the cross-entropy loss arising specifically from the generator of the product t-norm, and that this correspondence produces a parametric family of novel losses applicable to any supervised task.
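The core correspondence can be illustrated with a minimal sketch. It assumes, as a reading of the construction rather than the paper's exact formula, that a loss is obtained by summing an additive generator g over the probabilities f_i that the model assigns to the correct labels, so L = Σ_i g(f_i). With the product t-norm's generator g(x) = -log x this sum is exactly the cross-entropy; the Łukasiewicz generator g(x) = 1 - x yields a linear loss instead. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def product_generator(x: float) -> float:
    """Additive generator of the product t-norm: g(x) = -log x, with g(0) = +inf (strict)."""
    return math.inf if x == 0.0 else -math.log(x)


def lukasiewicz_generator(x: float) -> float:
    """Additive generator of the Lukasiewicz t-norm: g(x) = 1 - x, with g(0) = 1 (nilpotent)."""
    return 1.0 - x


def generator_loss(g: Callable[[float], float], probs: Sequence[float]) -> float:
    """Hypothetical generator-induced loss: sum of g over the correct-class probabilities."""
    return sum(g(p) for p in probs)


def cross_entropy(probs: Sequence[float]) -> float:
    """Standard cross-entropy for the probabilities assigned to the true labels (all > 0)."""
    return -sum(math.log(p) for p in probs)


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.25]  # probability given to the correct class, per example
    print(generator_loss(product_generator, probs))      # same value as the next line
    print(cross_entropy(probs))
    print(generator_loss(lukasiewicz_generator, probs))  # approximately 1.25
```

Writing the loss as a function of the generator keeps the t-norm choice a single swappable argument, which is the practical content of the claim that new losses follow from choosing new t-norms.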

What carries the argument

The direct mapping from t-norm generator functions to loss functions that can be minimized by gradient descent.

If this is right

Losses can be constructed systematically by selecting different continuous t-norms.
The new losses remain compatible with any supervised learning pipeline that uses gradient descent.
Cross-entropy is recovered exactly as one member of the family rather than treated as an isolated choice.
Properties of t-norms may translate into convergence or gradient behavior of the associated losses.
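The last point can be made concrete with a small gradient comparison. This sketch assumes a binary classifier whose correct-class probability is p = sigmoid(z) for a logit z, and a loss L = g(p) built from a generator g. By the chain rule dL/dz = g'(p) · p(1 - p). For g(x) = -log x this simplifies to -(1 - p), which stays close to -1 when the model is confidently wrong (p near 0); for g(x) = 1 - x it is -p(1 - p), which vanishes in the same regime. This restates the usual argument for why cross-entropy mitigates vanishing gradients in generator terms; it is not an experiment from the paper, and the helper names are hypothetical.

```python
import math
from typing import Callable


def sigmoid(z: float) -> float:
    """Logistic function; adequate for the moderate logits used here."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_wrt_logit(g_prime: Callable[[float], float], z: float) -> float:
    """dL/dz for L = g(sigmoid(z)), via the chain rule: g'(p) * p * (1 - p)."""
    p = sigmoid(z)
    return g_prime(p) * p * (1.0 - p)


def product_generator_prime(x: float) -> float:
    """Derivative of the product generator -log x."""
    return -1.0 / x


def lukasiewicz_generator_prime(x: float) -> float:
    """Derivative of the Lukasiewicz generator 1 - x."""
    return -1.0


if __name__ == "__main__":
    for z in (-8.0, -2.0, 0.0, 2.0):
        ce = grad_wrt_logit(product_generator_prime, z)
        luk = grad_wrt_logit(lukasiewicz_generator_prime, z)
        print(f"z={z:+.1f}  product/CE grad={ce:+.4f}  Lukasiewicz grad={luk:+.4f}")
```

At z = -8 the product-generator gradient is roughly -1 while the Łukasiewicz one is of order 1e-4, which is the kind of behavioral difference the bullet anticipates.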

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same generator correspondence could be tested on regression or structured-prediction tasks to check whether advantages appear outside classification.
Results from the theory of t-norms on nilpotency or strictness might predict which derived losses avoid vanishing gradients in very deep stacks.
The relation invites checking whether other training components, such as regularizers, admit analogous t-norm interpretations.
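The strict/nilpotent distinction mentioned above is visible directly in the generator. A continuous Archimedean t-norm is recovered from its additive generator g as T(x, y) = g^(-1)(min(g(0), g(x) + g(y))), where the cap at g(0) plays the role of the pseudo-inverse. The sketch below, with hypothetical helper names, checks this for the product t-norm (strict, g(0) = ∞) and the Łukasiewicz t-norm (nilpotent, g(0) = 1). Only in the nilpotent case does the cap bind, producing exact zeros for non-zero inputs.

```python
import math
from typing import Callable


def tnorm_from_generator(
    g: Callable[[float], float],
    g_inv: Callable[[float], float],
    x: float,
    y: float,
) -> float:
    """T(x, y) = g^{-1}(min(g(0), g(x) + g(y))) for a continuous Archimedean t-norm."""
    return g_inv(min(g(0.0), g(x) + g(y)))


# Product t-norm: g(x) = -log x, g^{-1}(z) = exp(-z); strict because g(0) = +inf.
def g_prod(x: float) -> float:
    return math.inf if x == 0.0 else -math.log(x)


def g_prod_inv(z: float) -> float:
    return math.exp(-z)  # math.exp(-math.inf) == 0.0


# Lukasiewicz t-norm: g(x) = 1 - x, g^{-1}(z) = 1 - z on [0, 1]; nilpotent, g(0) = 1.
def g_luk(x: float) -> float:
    return 1.0 - x


def g_luk_inv(z: float) -> float:
    return 1.0 - z


if __name__ == "__main__":
    x, y = 0.7, 0.2
    print(tnorm_from_generator(g_prod, g_prod_inv, x, y), x * y)                # both about 0.14
    print(tnorm_from_generator(g_luk, g_luk_inv, x, y), max(0.0, x + y - 1.0))  # both 0.0
```

Because g(0) is finite for a nilpotent generator, the corresponding loss is bounded per example, whereas a strict generator yields an unbounded loss as the predicted probability approaches zero; whether that predicts gradient behavior in deep stacks is exactly the open question raised above.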

Load-bearing premise

The generator functions of t-norms supply a mathematically direct and practically usable definition of loss functions for gradient-based supervised learning.

What would settle it

An experiment in which a loss constructed from the generator of a non-product t-norm produces unstable training or strictly worse test accuracy than cross-entropy on a standard image-classification benchmark would falsify the claimed usefulness of the relation.
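Before committing to an image benchmark, a toy-scale version of this test can be sketched. The code below is an illustration only, with synthetic 1-D data and a single logistic unit (both assumptions, not the paper's setup). It trains the same model by full-batch gradient descent under a loss built from the Schweizer–Sklar generator g_p(x) = (1 - x^p)/p, where p = 0 is taken as the limit -log x (cross-entropy) and p = 1 gives the Łukasiewicz generator 1 - x, and reports the final loss and training accuracy. A toy run like this cannot settle the question; it only checks that the derived losses are trainable.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ss_generator(x: float, p: float) -> float:
    """Schweizer-Sklar generator g_p(x) = (1 - x**p) / p; p == 0 is the limit -log x."""
    return -math.log(x) if p == 0 else (1.0 - x ** p) / p


def make_data(n: int, seed: int = 0) -> List[Tuple[float, int]]:
    """Hypothetical toy data: two overlapping 1-D Gaussians with means -1 and +1."""
    rng = random.Random(seed)
    return [(rng.gauss(2.0 * (i % 2) - 1.0, 1.0), i % 2) for i in range(n)]


def train(data: List[Tuple[float, int]], p: float, lr: float = 0.5, epochs: int = 200):
    """Full-batch gradient descent on mean_i g_p(f_i) for the model z = w * x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            s = 2 * y - 1                   # +1 for class 1, -1 for class 0
            f = sigmoid(s * (w * x + b))    # probability of the correct class
            dz = -s * f ** p * (1.0 - f)    # dL/dz, using g_p'(f) = -f**(p - 1)
            gw += dz * x
            gb += dz
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    loss = sum(ss_generator(sigmoid((2 * y - 1) * (w * x + b)), p) for x, y in data) / len(data)
    acc = sum((w * x + b > 0) == (y == 1) for x, y in data) / len(data)
    return w, b, loss, acc


if __name__ == "__main__":
    data = make_data(200)
    for p in (0.0, 0.5, 1.0):
        w, b, loss, acc = train(data, p)
        print(f"p={p}: w={w:.3f} b={b:.3f} loss={loss:.4f} train_acc={acc:.3f}")
```

The falsification experiment described above would replace the toy model with a deep network and report test accuracy and training stability across several p values and seeds.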

Figures

Figures reproduced from arXiv: 1907.07904 by Francesco Giannini, Giuseppe Marra, Marco Gori, Marco Maggini, Michelangelo Diligenti.

Original abstract

Deep learning has been shown to achieve impressive results in several domains like computer vision and natural language processing. A key element of this success has been the development of new loss functions, like the popular cross-entropy loss, which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. While the cross-entropy loss is usually justified from a probabilistic perspective, this paper shows an alternative and more direct interpretation of this loss in terms of t-norms and their associated generator functions, and derives a general relation between loss functions and t-norms. In particular, the presented work shows intriguing results leading to the development of a novel class of loss functions. These losses can be exploited in any supervised learning task and which could lead to faster convergence rates that the commonly employed cross-entropy loss.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper re-derives cross-entropy from t-norm generators and builds a general construction for new losses from the same relation.


This paper links loss functions to t-norms by working through their generator functions. It gives cross-entropy an alternative derivation this way and then uses the same relation to produce a wider set of losses for supervised learning tasks. The central move is to treat the generator as the source of the loss expression rather than starting from probabilities or information measures. That produces a systematic route to new losses that the authors present as novel. The derivation itself looks direct and stays inside the t-norm setup without obvious circular steps. If the algebra holds, it supplies a clean alternative justification and a recipe that practitioners could adapt. The practical claim about faster convergence is stated only as a possibility. No experiments or gradient analysis appear to back it up, so that part stays speculative for now. The work is aimed at people already thinking about loss design or who have some exposure to fuzzy logic and t-norms. A reader in that group can pull out the construction and test it on their own problems. Outside that group the payoff is narrower unless the new losses show clear advantages in later work. I would send it to peer review. The mathematical correspondence is worth referee scrutiny on its own terms, even if the empirical side needs more development.

Referee Report

0 major / 3 minor

Summary. The paper claims that the cross-entropy loss admits an alternative interpretation via t-norms and their generator functions, derives a general relation between arbitrary loss functions and t-norms, and uses this relation to construct a novel family of loss functions for supervised learning that may exhibit improved convergence properties.

Significance. If the central derivation is correct, the work supplies a mathematically grounded correspondence between a widely used loss and the generator functions of continuous t-norms, together with a constructive method for generating new losses. This could be useful for designing losses that inherit desirable analytic properties (e.g., strict monotonicity or specific behavior near zero) directly from the t-norm axioms rather than from probabilistic arguments alone.

minor comments (3)

The abstract states that the new losses 'could lead to faster convergence rates that the commonly employed cross-entropy loss'; the word 'that' should be 'than'. More importantly, the claim is presented as a possible implication; if the manuscript contains any empirical comparison, the relevant table or figure should be referenced explicitly in the abstract or introduction.
Notation for the generator function g and its pseudo-inverse should be introduced once and used consistently; any re-definition in later sections should be flagged.
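The manuscript should include a short table or list that explicitly maps at least three standard t-norms (e.g., product, Łukasiewicz, and a member of a parametric Archimedean family such as Schweizer–Sklar; the nilpotent minimum is not continuous and has no additive generator, so it falls outside the construction) to the corresponding loss functions obtained from the general relation, so that the construction is immediately verifiable.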
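A table of the kind requested could look as follows. It is a sketch that assumes the loss is built as L = Σ_i g(f_i) over the correct-class outputs f_i, and it is restricted to continuous Archimedean t-norms, the ones with continuous additive generators. The Schweizer–Sklar family is included as an example of a parametric family: p = 1 reproduces the Łukasiewicz generator, and the limit p → 0 reproduces -log x, i.e. cross-entropy. Whether the paper tabulates this particular family is not stated in this summary.

```latex
\begin{array}{lll}
\text{t-norm} & \text{generator } g(x) & \text{loss } L = \sum_i g(f_i) \\
\hline
\text{Product} & -\log x & -\sum_i \log f_i \quad \text{(cross-entropy)} \\
\text{Łukasiewicz} & 1 - x & \sum_i \left(1 - f_i\right) \\
\text{Schweizer--Sklar},\ p \neq 0 & \dfrac{1 - x^{p}}{p} & \dfrac{1}{p}\sum_i \left(1 - f_i^{p}\right)
\end{array}
\qquad
\lim_{p \to 0} \frac{1 - x^{p}}{p} = -\log x
```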

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and for the positive assessment of the manuscript. The referee's summary correctly identifies the central contributions: an alternative interpretation of cross-entropy via t-norm generators, a general relation between losses and t-norms, and the construction of a new family of losses. We are pleased that the potential utility for designing losses with desirable analytic properties is recognized. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity: the derivation is a self-contained mathematical correspondence.


The paper's central claim is a direct mathematical correspondence between t-norm generator functions and loss functions (e.g., cross-entropy), yielding a general construction for new losses. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations are visible in the provided abstract or stated claims. The derivation stands as an independent interpretive relation that does not reduce to its inputs by construction. This is the expected honest non-finding for a purely mathematical correspondence paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that t-norm generators map directly onto useful loss functions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Continuous Archimedean t-norms possess additive generator functions that can be used to define loss functions for supervised learning.
This mapping is the load-bearing step that turns the fuzzy-logic object into a training objective.

pith-pipeline@v0.9.0 · 5674 in / 1094 out tokens · 48031 ms · 2026-05-24T19:46:09.144166+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 1 internal anchor

[1] Bach, S.H., Broecheler, M., Huang, B., Getoor, L.: Hinge-loss Markov random fields and probabilistic soft logic. Journal of Machine Learning Research 18, 1–67 (2017)

[2] Beliakov, G., Pradera, A., Calvo, T.: Aggregation functions: A guide for practitioners, vol. 221. Springer (2007)

[3] Calvo, T., Kolesárová, A., Komorníková, M., Mesiar, R.: Aggregation operators: properties, classes and construction methods. In: Aggregation operators, pp. 3–104. Springer (2002)

[4] Diligenti, M., Gori, M., Sacca, C.: Semantic-based regularization for learning and inference. Artificial Intelligence 244, 143–165 (2017)

[5] Donadello, I., Serafini, L., d'Avila Garcez, A.: Logic tensor networks for semantic image interpretation. In: IJCAI International Joint Conference on Artificial Intelligence, pp. 1596–1602 (2017)

[6] Giannini, F., Diligenti, M., Gori, M., Maggini, M.: On a convex logic fragment for learning and reasoning. IEEE Transactions on Fuzzy Systems (2018)

[7] Goodfellow, I., Bengio, Y., Courville, A.: Deep learning, vol. 1. MIT Press, Cambridge (2016)

[8] Grabisch, M., Marichal, J.L., Mesiar, R., Pap, E.: Aggregation functions: means. Information Sciences 181(1), 1–22 (2011)

[9] Hájek, P.: Metamathematics of fuzzy logic, vol. 4. Springer Science & Business Media (2013)

[10] Jenei, S.: A note on the ordinal sum theorem and its consequence for the construction of triangular norms. Fuzzy Sets and Systems 126(2), 199–205 (2002)

[11] Klement, E.P., Mesiar, R., Pap, E.: Triangular norms. Position paper I: basic analytical and algebraic properties. Fuzzy Sets and Systems 143(1), 5–26 (2004)

[12] Klement, E.P., Mesiar, R., Pap, E.: Triangular norms. Position paper II: general constructions and parameterized families. Fuzzy Sets and Systems 145(3), 411–438 (2004)

[13] Klement, E.P., Mesiar, R., Pap, E.: Triangular norms. Position paper III: continuous t-norms. Fuzzy Sets and Systems 145(3), 439–454 (2004)

[14] Klement, E.P., Mesiar, R., Pap, E.: Triangular norms, vol. 8. Springer Science & Business Media (2013)

[15] Kolb, S., Teso, S., Passerini, A., De Raedt, L.: Learning SMT(LRA) constraints using SMT solvers. In: IJCAI, pp. 2333–2340 (2018)

[16] Koller, D., Friedman, N., Džeroski, S., Sutton, C., McCallum, A., Pfeffer, A., Abbeel, P., Wong, M.F., Heckerman, D., Meek, C., et al.: Introduction to statistical relational learning. MIT Press (2007)

[17] Novák, V., Perfilieva, I., Mockor, J.: Mathematical principles of fuzzy logic, vol. 517. Springer Science & Business Media (2012)

[18] Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62(1), 107–136 (2006)

[19] Torra, V., Narukawa, Y.: Modeling decisions: information fusion and aggregation operators. Springer Science & Business Media (2007)

[20] Xu, J., Zhang, Z., Friedman, T., Liang, Y., Van den Broeck, G.: A semantic loss function for deep learning with symbolic knowledge. arXiv preprint arXiv:1711.11157 (2017)

[21] Yang, F., Yang, Z., Cohen, W.W.: Differentiable learning of logical rules for knowledge base reasoning. In: Advances in Neural Information Processing Systems, pp. 2319–2328 (2017)
