
arxiv: 2602.11131 · v2 · pith:VATU4PQH · submitted 2026-02-11 · physics.soc-ph · math.ST · stat.TH

Formalization of the generalized Pareto principle and structural typicality of the 20/80-rule

Pith reviewed 2026-05-16 05:36 UTC · model grok-4.3

classification physics.soc-ph · math.ST · stat.TH
keywords Pareto principle · 20/80 rule · Kolkata index · Lorenz curve · truncated distributions · finite-sample effects · gain densities · decreasing rearrangement

The pith

The generalized Pareto principle, where a fraction p of inputs produces a fraction 1-p of outputs, emerges structurally from truncated exponential and normal distributions for sample sizes between 100 and 100000, concentrating near the 20/

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes the generalized Pareto principle as the property that a fraction p of the largest inputs accounts for a fraction 1-p of the total output, obtained uniquely via the decreasing rearrangement of a non-negative gain density. For probability distributions this p equals one minus the Kolkata index of the associated Lorenz curve. Closed-form expressions are derived for p in the truncated power-law, exponential, and normal families. When these expressions are paired with estimates of the truncation cutoff as a function of sample size N, the resulting predictions show that p for exponential data falls in [0.15, 0.26] and for normal data in [0.20, 0.29] when N lies between 10^2 and 10^5.

Core claim

The central claim is that the imbalance parameter p defined by the generalized Pareto principle is a direct, calculable consequence of the decreasing rearrangement applied to truncated common distributions; when finite-sample truncation is taken into account, p for both exponential and normal families concentrates in narrow intervals around the canonical 0.2 value for realistic dataset sizes, remaining strictly below the infinite-sample saturation conjectured earlier.

What carries the argument

The decreasing rearrangement ℓ* of the gain density ℓ, which yields a unique p satisfying the condition that the integral of ℓ* over [0, p] equals 1 − p.
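The condition above can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes the gain density is sampled on a uniform midpoint grid over [0, 1], and the function name `pareto_p` and the grid size are hypothetical choices made for this illustration.

```python
def pareto_p(density, n=10_000):
    """Approximate p such that the top fraction p of inputs carries 1 - p of the gains.

    `density` is any non-negative function on [0, 1]; it does not need to be
    normalized because the total is divided out below.
    """
    # Sample the density at the midpoints of n equal-width cells.
    values = [density((i + 0.5) / n) for i in range(n)]
    total = sum(values)
    # Sorting in descending order is the discrete decreasing rearrangement.
    values.sort(reverse=True)
    cum = 0.0
    for i, v in enumerate(values, start=1):
        cum += v
        # First grid fraction i/n whose cumulative share reaches 1 - i/n.
        if cum / total >= 1.0 - i / n:
            return i / n
    return 1.0


p_uniform = pareto_p(lambda t: 1.0)      # uniform density
p_linear = pareto_p(lambda t: 2.0 * t)   # increasing density, rearranged to 2(1 - t)
print(p_uniform, p_linear)
```

For the uniform density the sketch returns 0.5, the balanced case. For ℓ(t) = 2t the rearranged density is 2(1 − t), so the condition reads 2p − p² = 1 − p, whose root is p = (3 − √5)/2 ≈ 0.382; the grid result agrees to within the grid resolution.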

If this is right

  • For exponential distributions of size N between 100 and 100000, p is predicted to lie between 0.15 and 0.26.
  • For normal distributions of the same sizes, p is predicted to lie between 0.20 and 0.29.
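  • In terms of the Kolkata index k = 1 − p, these ranges correspond to k between 0.74 and 0.85 (exponential) and between 0.71 and 0.80 (normal), both strictly below the saturation value k ≈ 0.865 conjectured for infinite samples.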
  • The structural appearance of such imbalances in standard distributions implies that Pareto-type imbalances arise without requiring special generative mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same truncation-plus-rearrangement mechanism could be applied to log-normal or other commonly observed families to check whether they also produce p near 0.2 at realistic N.
  • If the finite-sample effect dominates, many empirical 20/80 observations in social or economic data may be explained by ordinary sampling from standard distributions rather than by domain-specific processes.
  • The framework supplies a quantitative way to test whether a given dataset's imbalance is typical or atypical for its size and distribution family.

Load-bearing premise

The estimates of the truncation parameter as a function of sample size N are accurate enough to be combined with the closed-form expressions for p.

What would settle it

Measuring p directly on large numbers of synthetic datasets of size N=1000 drawn from truncated exponential or normal distributions and finding that the observed values fall consistently outside the predicted intervals [0.15,0.26] or [0.20,0.29] would falsify the finite-sample concentration claim.
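A check of this kind could be sketched as follows. This is an illustration under stated assumptions rather than the paper's protocol: "measuring p directly" is interpreted here as solving ECDF(p · x_max) = 1 − p, where x_max is the sample maximum, which treats the empirical distribution on [0, x_max] as an already decreasing gain density (so no rearrangement is applied). The names `empirical_p` and `simulate`, the unit rate, the seed, and the 200 replications are all hypothetical choices.

```python
import bisect
import random


def empirical_p(sample):
    """Solve ECDF(p * max) = 1 - p by bisection (assumed interpretation of p)."""
    xs = sorted(sample)
    n, x_max = len(xs), xs[-1]
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # Fraction of observations at or below mid * x_max.
        frac = bisect.bisect_right(xs, mid * x_max) / n
        if frac < 1.0 - mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n=1000, reps=200, seed=0):
    """Draw `reps` exponential samples of size n and return the measured p values."""
    rng = random.Random(seed)
    return [empirical_p([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


ps = simulate()
mean_p = sum(ps) / len(ps)
share_inside = sum(0.15 <= p <= 0.26 for p in ps) / len(ps)
print(f"mean p = {mean_p:.3f}, share inside [0.15, 0.26] = {share_inside:.2f}")
```

The share of replications falling inside the predicted interval is the quantity of interest: a share consistently far below 1 would count against the concentration claim, under the interpretation assumed here.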

Figures

Figures reproduced from arXiv: 2602.11131 by Antti Hippeläinen.

Figure 1. Step function densities and their related cumulative distributions with …
Figure 2. Minimal inequality index or ratio of gain densities a distribution must have to satisfy a …
Figure 3. An $L^1$-integrable divergent density with its decreasing rearrangement, and their related cumulative gain functions. As must be, for all $t \in [0, 1]$, $L^*(t) \ge L(t)$. Remembering the requirement of no padding by zeros, define a distribution of periodic diminishing returns with essential support in $[0, 1]$. To make finding the rearrangement …
Figure 4. A periodic gain density with its decreasing rearrangement, and their related cumulative …
Figure 5. Polynomial densities with $\alpha = \frac{1}{4}$, 1, 3 and 10, and their related cumulative gain functions. A natural question arises: can we always find an $\alpha \in [0, \infty)$ such that any given generalized principle is satisfied? That is, can we always find an $\alpha$ such that $\frac{p\,(1 - p^{\alpha} + \alpha)}{\alpha} = 1 - p$, (16) for any $p \in (0, 1/2]$? On one hand, at the limit $\alpha \to \infty$ we obtain the uniform distribution, for which $p \to 1/2$. On the other h…
Figure 6. Power-law densities with ratio and scale combinations …
Figure 7. The maximal value of $p$ with which the generalized Pareto principle can be satisfied by varying $\alpha$ with a fixed $r$. The canonical 0.2/0.8-principle becomes impossible with a pure power-law after $r \approx 3100$. 3.2.2. Exponential distribution. Any process where decay has the same probability of occurring at any moment in time follows an exponential law. For a given rate $\lambda > 0$, the truncated exponential distribution …
Figure 8. Exponential densities with $\Lambda = 0.1$, 1, 4 and 10, and their related cumulative gain functions. The singularity at $\Lambda = 0$ is removable, and for all $\Lambda > 0$, $h$ is continuous. The limits are $\lim_{\Lambda \to 0} h(\Lambda; p) = -1 + 2p \le 0$, $\lim_{\Lambda \to \infty} h(\Lambda; p) = p > 0$, (37) and by IVT, there exists at least one value of $\Lambda$ with which any generalized principle can be satisfied. 3.2.3. Normal distribution. Finally, the normal distributi…
Figure 9. Normal densities with $\Sigma = 1$, 3, 5 and 10, and their related cumulative gain functions. Analogous to previous cases, set $h(\Sigma; p) = \frac{\operatorname{erf}(\Sigma p)}{\operatorname{erf}(\Sigma)} - 1 + p$. (41) The singularity at $\Sigma = 0$ is removable, and $h$ is continuous for all $\Sigma > 0$. Studying the limits, we again find $\lim_{\Sigma \to 0} h(\Sigma; p) = -1 + 2p \le 0$, $\lim_{\Sigma \to \infty} h(\Sigma; p) = p > 0$, (42) and by IVT, there exists at least one value of $\Sigma$ with which any generalized p…
Figure 10. The generalized Pareto principles satisfied by truncated power-laws on common param…
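The existence argument quoted in the Figure 9 excerpt (a sign change of h between small and large Σ, hence a root by the intermediate value theorem) can be made concrete with a bisection. This is a minimal sketch that takes h(Σ; p) = erf(Σp)/erf(Σ) − 1 + p from Eq. (41) as quoted there; the function names, the starting bracket, and the fixed iteration count are assumptions of this illustration, and p is assumed to lie strictly between 0 and 1/2.

```python
import math


def h_normal(sigma, p):
    """h(Sigma; p) = erf(Sigma * p) / erf(Sigma) - 1 + p, as quoted in Eq. (41)."""
    return math.erf(sigma * p) / math.erf(sigma) - 1.0 + p


def sigma_for_p(p):
    """Find a Sigma with h(Sigma; p) = 0 for 0 < p < 1/2 by bracketing and bisection."""
    lo, hi = 1e-6, 1.0          # h is close to 2p - 1 < 0 near Sigma = 0
    while h_normal(hi, p) <= 0.0:
        hi *= 2.0               # expand until h turns positive (its limit is p > 0)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h_normal(mid, p) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_for_sigma(sigma):
    """Inverse direction: the p satisfied by a given Sigma (h is increasing in p)."""
    lo, hi = 0.0, 1.0           # h(Sigma; 0) = -1 and h(Sigma; 1) = 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h_normal(sigma, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma_2080 = sigma_for_p(0.2)
print(sigma_2080, p_for_sigma(sigma_2080))
```

Under this form of h, the canonical p = 0.2 is matched by a Σ between 4 and 5, and `p_for_sigma` recovers 0.2 from that value.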
Original abstract

We formalize a generalized form of the Pareto principle - ``fraction $p$ of inputs yields fraction $1-p$ of outputs'' - as a property of non-negative gain densities $\ell \in L^1([0,1])$, working with the decreasing rearrangement to obtain a unique characterization. For probability distributions, the resulting $p$ coincides with $1 - k_F$, where $k_F$ is the Kolkata index of the corresponding Lorenz curve. Within this framework we analyze both constructed gain densities and commonly encountered distribution families. We derive closed-form expressions for $p$ for truncated power-law, exponential, and normal distribution families. Combining these with estimates of the truncation parameter as a function of sample size $N$, we predict that datasets of size $N \in [10^2, 10^5]$ from exponential and normal families concentrate $p$ near $[0.15, 0.26]$ and $[0.20, 0.29]$ - values close to the canonical 0.2/0.8-rule, and strictly below the saturation $k \approx 0.865$ conjectured earlier by Ghosh and Chakrabarti. We discuss the implications of the structural ubiquity of Pareto-type imbalances for their use as prescriptive targets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper formalizes the generalized Pareto principle ('fraction p of inputs yields fraction 1-p of outputs') as a property of non-negative gain densities ℓ in L1([0,1]) via decreasing rearrangement, yielding a unique characterization. For probability distributions this p equals 1 - k_F, the complement of the Kolkata index of the Lorenz curve. Closed-form expressions for p are derived for truncated power-law, exponential, and normal families. These are combined with estimates of the truncation parameter as a function of sample size N to predict that datasets with N ∈ [10², 10⁵] from exponential and normal families concentrate p near [0.15, 0.26] and [0.20, 0.29] respectively—values close to the canonical 20/80 rule and below the saturation k ≈ 0.865 conjectured by Ghosh and Chakrabarti. Implications for prescriptive use of the principle are discussed.

Significance. If the finite-N predictions hold after proper justification of the truncation estimates, the manuscript supplies a structural, distribution-family-independent explanation for the frequent appearance of Pareto-type imbalances, thereby accounting for the typicality of the 20/80 rule in data drawn from common continuous distributions. The closed-form derivations for the truncated families constitute a clear technical contribution that could be reused in other contexts.

major comments (1)
  1. [Abstract and finite-sample prediction section] Abstract and the finite-sample prediction section: the headline numerical claims—that p concentrates in [0.15,0.26] for exponential and [0.20,0.29] for normal families when N ∈ [10²,10⁵]—are obtained only after substituting the closed-form expressions with separate estimates of the truncation cutoff as a function of N. No derivation, simulation protocol, error analysis, or validation of these N-dependent estimates appears in the manuscript, so the reported intervals cannot be reproduced or stress-tested.
minor comments (2)
  1. [Section 2] The definition of the decreasing rearrangement and its application to the gain density ℓ should be stated explicitly with an equation number in the main text rather than left implicit.
  2. [Section 4] Notation for the truncation parameter (e.g., its symbol and dependence on N) is introduced only in the abstract and should be defined consistently in the body before the finite-N predictions are presented.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We appreciate the positive assessment of the formalization, the closed-form derivations, and the potential significance of the finite-N predictions. We address the single major comment below and will revise the manuscript to incorporate the requested justification.

Point-by-point responses
  1. Referee: [Abstract and finite-sample prediction section] Abstract and the finite-sample prediction section: the headline numerical claims—that p concentrates in [0.15,0.26] for exponential and [0.20,0.29] for normal families when N ∈ [10²,10⁵]—are obtained only after substituting the closed-form expressions with separate estimates of the truncation cutoff as a function of N. No derivation, simulation protocol, error analysis, or validation of these N-dependent estimates appears in the manuscript, so the reported intervals cannot be reproduced or stress-tested.

    Authors: We agree that the truncation estimates require explicit derivation, a simulation protocol, and validation to support reproducibility of the headline intervals. In the revised manuscript we will add a dedicated subsection to the finite-sample prediction section that (i) derives the N-dependent truncation cutoff from the expected value of the sample maximum for the exponential and normal families using standard order-statistic results, (ii) specifies the Monte Carlo protocol (10,000 replications per N) used to obtain the estimates, and (iii) supplies error bounds and concentration diagnostics confirming that p remains inside the stated intervals for N ∈ [10², 10⁵]. These additions will make the numerical claims fully reproducible and stress-testable while preserving the original closed-form expressions for p.

    revision: yes
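For the exponential family, step (i) can be sketched with a standard order-statistic fact: the expected maximum of N independent Exp(λ) draws is H_N/λ, where H_N is the N-th harmonic number. The sketch below is an illustration under assumptions, not the authors' derivation. It assumes the dimensionless cutoff is Λ = λ · E[max] = H_N, and it assumes h(Λ; p) = (1 − e^{−Λp})/(1 − e^{−Λ}) − 1 + p, the form implied by a truncated exponential density rescaled to [0, 1]; that form is consistent with the limits in Eq. (37) but is not quoted from the paper. All function names are hypothetical.

```python
import math


def h_exp(lam, p):
    """Assumed form: cumulative mass of a truncated exponential on [0, 1], minus (1 - p)."""
    # expm1(-x) = e^{-x} - 1, so the ratio equals (1 - e^{-lam p}) / (1 - e^{-lam}).
    return math.expm1(-lam * p) / math.expm1(-lam) - 1.0 + p


def p_for_lambda(lam):
    """Solve h_exp(lam, p) = 0 for p by bisection (h_exp is increasing in p)."""
    lo, hi = 0.0, 1.0           # h_exp(lam, 0) = -1 and h_exp(lam, 1) = 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h_exp(lam, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, equal to rate * E[max] for n exponential draws."""
    return sum(1.0 / k for k in range(1, n + 1))


for n in (10**2, 10**3, 10**4, 10**5):
    lam = harmonic(n)           # assumption: cutoff set at the expected sample maximum
    print(n, round(lam, 3), round(p_for_lambda(lam), 3))
```

Under these assumptions p comes out at roughly 0.26 for N = 10² and roughly 0.15 for N = 10⁵, in line with the exponential interval quoted in the referee's comment. The normal family would need its own cutoff estimate, which is not attempted here.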

Circularity Check

1 step flagged

Finite-N predictions combine closed forms with auxiliary estimates of truncation parameter

specific steps
  1. fitted input called prediction [Abstract]
    "Combining these with estimates of the truncation parameter as a function of sample size N, we predict that datasets of size N ∈ [10^2, 10^5] from exponential and normal families concentrate p near [0.15, 0.26] and [0.20, 0.29]"

    The paper derives closed-form expressions for p from the generalized Pareto principle, then obtains the headline finite-sample intervals only after inserting externally estimated values of the truncation cutoff (as a function of N). Because those estimates are auxiliary inputs rather than outputs of the formalization, the reported 'predictions' reduce to the closed forms evaluated at fitted truncation values; the numerical concentration near the 20/80 rule is therefore conditioned on the estimates rather than emerging solely from the rearrangement characterization.

full rationale

The core formalization of the generalized Pareto principle via decreasing rearrangement and the closed-form derivations for p in truncated families are self-contained and independent. The load-bearing numerical claims for finite N, however, are produced only by substituting those closed forms with separately estimated truncation parameters as a function of N. This matches the fitted-input-called-prediction pattern at moderate strength because the interval predictions [0.15,0.26] and [0.20,0.29] are statistically forced once the N-dependent estimates are supplied, yet the estimates themselves are not derived from the formalization. No self-citation chain or self-definitional loop is present, so the overall circularity remains limited and the central result retains independent content.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard rearrangement theory for L1 functions and the definition that p coincides with 1 - k_F; the finite-sample predictions additionally rest on an externally estimated truncation parameter whose functional form is not derived inside the paper.

free parameters (1)
  • truncation parameter
    Estimated as a function of sample size N and combined with closed-form expressions to obtain the reported concentration intervals for p.
axioms (2)
  • [standard math] Decreasing rearrangement yields a unique characterization of the generalized Pareto property for non-negative gain densities in L1([0,1])
    Invoked to obtain the unique p for any such density.
  • [domain assumption] For probability distributions, p coincides with 1 - k_F where k_F is the Kolkata index of the Lorenz curve
    Stated as part of the framework linking the new definition to existing inequality measures.
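The second axiom can be checked numerically on a discretized example. The sketch below is a hedged illustration: it assumes the Kolkata index k is the point where the Lorenz curve (built from values sorted in ascending order) satisfies L(k) = 1 − k, and it compares 1 − k with the p obtained from the descending order. The helper name `crossing` and the linear test density ℓ(t) = 2t are choices made for this example only.

```python
def crossing(ordered):
    """First grid fraction q at which the cumulative share of `ordered` reaches 1 - q."""
    total, n, cum = sum(ordered), len(ordered), 0.0
    for i, x in enumerate(ordered, start=1):
        cum += x
        if cum / total >= 1.0 - i / n:
            return i / n
    return 1.0


n = 10_000
# Midpoint samples of the illustrative density l(t) = 2t on [0, 1].
values = [2.0 * (i + 0.5) / n for i in range(n)]

k = crossing(sorted(values))                # ascending order: Lorenz curve, L(k) = 1 - k
p = crossing(sorted(values, reverse=True))  # descending order: decreasing rearrangement

print(k, p, k + p)
```

For this density the Lorenz curve is L(q) = q², so k = (√5 − 1)/2 ≈ 0.618 and p ≈ 0.382; the two grid values sum to 1 up to the grid resolution.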




Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 3 internal anchors

  1. [1]

    20/80–rule

    INTRODUCTION The so-called Pareto principle or “20/80–rule” is among the most widely quoted heuristics in economics, management, and cognitive science. It states that 20% of causes result in 80% of effects. Originally formulated by Vilfredo Pareto [1], it was an empirical observation about the distribution of wealth, later generalized to domains as divers...

  2. [2]

    fraction p of inputs yields fraction 1−p of outputs

    FORMALIZATION We model bounded cumulative processes with a non-negative gain density $\ell : I_t \to [0, \infty)$, $\int_{I_t} dt\, \ell(t) = 1$, (1) where $I_t = [t_{\min}, t_{\max}]$ is a compact (closed and bounded) interval. Since total gains are assumed to be finite, we require $\ell \in L^1(I_t)$, and that $\ell$ is normalized to unity. In fact, if either the domain of input or the total output were not fin...

  3. [3]

    inequality

    EXAMPLE DISTRIBUTIONS AND EXISTENCE OF GENERALIZED PRINCIPLES We now examine gain density examples to illustrate how the generalized principle emerges in diverse functional forms. These cases demonstrate, in addition to the framework, the lim- 1 In fact, historically, Pareto himself was aware of the possibility of a probabilistic interpretation, but dec...

  4. [4]

    In such a simple case, the decreasing rearrangement is achieved by shifting the right-hand side of the distribution to start from zero and by sending $t \to t/2$, so together, $t \to \frac{t}{2} + \frac{1}{2}$

  5. [5]

    This can be thought of as the continuous version of doubling the length of every bin. Note that shifting the divergence to zero and re-normalizing is not in general equivalent to the decreasing rearrangement; as the rearrangement is done correctly, the density profile stays normalized. The rearranged distribution is $\ell^*(t) = \ell\left(\frac{t}{2} + \frac{1}{2}\right) = \frac{1}{2\sqrt{t}}$. (12) The ...

  6. [6]

    cut off its tail

    COMMON GENERALIZED PRINCIPLES AND SOCIAL DISCUSSION With the generalized Pareto principle formalized and explicit density functions analyzed, we now turn to two questions. First, given common distributions and realistic parameter ranges, which p/(1−p)-principles should one expect to observe in practice? Second, if such asymmetries are in large part structu...

  7. [7]

    Eureka moments

    CONCLUSIONS We have formalized and studied the existence of the (generalized) Pareto principle or the p/(1−p)-principle. The principle was formalized in Section 2 with the decreasing rearrangement of a density function ℓ(t) describing the density of gains on the unit interval. This allowed us to define the satisfied generalized principle unambiguously as ...

  8. [8]

    padding by zeros

    Padding by zeros Another limitation we must impose is on the length of intervals where ℓ(t) = 0, that is, on the support of ℓ(t). A continuous family of generalized principles is trivial to satisfy if one allows such “padding by zeros”. The possibility of padding by zeros does seem natural, since there can for example be periods of various lengths when no g...

  9. [9]

    negative learning

    Negative gains In principle, one could consider the possibility of negative gains as well. The question seems to be mostly semantic; take as an example the case of learning: is forgetting something you have learned “negative learning”, or is “forgetting” a separate process from learning? Conversely, is obtaining such forgotten information “learning again”...

  10. [10]

    Define the function evaluating the total mass inside an interval of length $p$ with $M(s) = \int_s^{s+p} dt\, \ell^*(t)$, (B7) with $s \in [0, 1-p]$. $M$ is continuous and $M(0) = p_1$, $M(1-p) = p_2$

    Since $\ell^*$ is decreasing and positive, $\int_0^p dt\, \ell^*(t) = p_1 \ge 1 - p^*$, $\int_{1-p}^1 dt\, \ell^*(t) = p_2 \le p^*$. (B5) Hence, we find that integrating over an interval of length $p$ will result in a total mass $p_1 \ge 1 - p^* > p^* \ge p_2$. (B6) Since $p > p^*$, we would like to find an interval with mass $1 - p$ such that $1 - p^* > 1 - p > p^*$. Define the function evaluating the total mass inside an ...

  11. [11]

    V. Pareto, Cours d’Économie Politique, The Economic Journal 7, 91 (1897)

  12. [12]

    J. Nielsen, The 90-9-1 Rule for Participation Inequality in Social Media and Online Communities (2006), accessed 2026-01-25

  13. [13]

    G. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Social Forces 28, 340 (1950)

  14. [14]

    A. Lotka, The frequency distribution of scientific productivity, Journal of the Washington Academy of Sciences 16, 317 (1926)

  15. [15]

    R. Merton, The Matthew Effect in Science, Science 159, 56 (1968)

  16. [16]

    M. Newman, Power laws, Pareto distributions and Zipf’s law, Contemporary Physics 46, 323 (2005), cond-mat/0412004

  17. [17]

    H. Simon, A behavioral model of rational choice, The Quarterly Journal of Economics 69, 99 (1955)

  18. [18]

    G. Tusset, Pareto and probability distributions, International Review of Economics 71, 521 (2024)

  19. [19]

    M. Hardy, Pareto’s Law, Math. Intell. 32, 38 (2010)

  20. [20]

    A. Clauset, C. Shalizi, and M. Newman, Power-Law Distributions in Empirical Data, SIAM Review 51, 661 (2009), 0706.1062

  21. [21]

    B. Arnold, N. Balakrishnan, and H. Nagaraja, A First Course in Order Statistics (Classics in Applied Mathematics) (Society for Industrial and Applied Mathematics, 2008)

  22. [22]

    E. Limpert, W. Stahel, and M. Abbt, Log-Normal Distributions Across the Sciences: Keys and Clues, BioScience 51, 341 (2001)

  23. [23]

    A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286, 509 (1999), cond-mat/9910332

  24. [24]

    A. Ghosh, N. Chattopadhyay, and B. Chakrabarti, Inequality in societies, academic institutions and science journals: Gini and k-indices, Physica A: Statistical Mechanics and its Applications 410, 30 (2014), 1401.6951

  25. [25]

    S. Banerjee, B. Chakrabarti, M. Mitra, and S. Mutuswami, Inequality Measures: The Kolkata Index in Comparison With Other Measures, Frontiers in Physics 8, 10.3389/fphy.2020.562182 (2020), 2005.08762

  26. [26]

    P. Halmos, Measure Theory (Springer, 1974)

  27. [27]

    A. Kechris, Classical Descriptive Set Theory (Springer New York, 1995)

  28. [28]

    E. Lieb and M. Loss, Analysis, Graduate Studies in Mathematics, Vol. 14 (American Mathematical Society, 2001)