Common TF-IDF variants arise as key components in the test statistic of a penalized likelihood-ratio test for word burstiness

Aitazaz A. Farooque; Michael McIsaac; Paul Sheridan; Zeyad Ahmed

arXiv: 2604.00672 · v2 · submitted 2026-04-01 · cs.CL · cs.IR · math.ST · stat.TH

Zeyad Ahmed, Paul Sheridan, Michael McIsaac, Aitazaz A. Farooque

Pith reviewed 2026-05-13 23:08 UTC · model grok-4.3

classification cs.CL · cs.IR · math.ST · stat.TH

keywords TF-IDF · word burstiness · likelihood ratio test · beta-binomial · term weighting · over-dispersion · document classification

The pith

TF-IDF scores emerge as components of a penalized likelihood-ratio test statistic for detecting word burstiness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that TF-IDF-like scores arise naturally from the test statistic of a penalized likelihood-ratio test for word burstiness. Documents are modeled with beta-binomial distributions under the alternative to capture over-dispersion, using a gamma penalty on the precision parameter. The null hypothesis uses binomial distributions that do not account for burstiness. The resulting scheme performs comparably to TF-IDF in classification tasks, offering a statistical basis for the classical formula.

Core claim

TF-IDF-like scores arise naturally from the test statistic of a penalized likelihood-ratio test where the alternative hypothesis models a collection of documents with beta-binomial distributions and a gamma penalty on the precision parameter to capture word burstiness, while the null hypothesis assumes binomial distributions.
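To make the setup concrete, here is a minimal sketch of a penalized likelihood-ratio statistic for a single term. It is not the authors' implementation: the mean/precision parameterization of the beta-binomial, the shape-rate form of the gamma penalty, the default `shape` and `rate` values, the grid search over the precision, and all function names are assumptions made for illustration.

```python
import math


def log_binom_coeff(n, k):
    """Log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_loglik(counts, lengths, p):
    """Null model: every document shares one occurrence probability p."""
    return sum(
        log_binom_coeff(n, k) + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k, n in zip(counts, lengths)
    )


def betabinomial_loglik(counts, lengths, alpha, beta):
    """Alternative model: per-document probabilities drawn from Beta(alpha, beta)."""
    return sum(
        log_binom_coeff(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
        for k, n in zip(counts, lengths)
    )


def gamma_log_penalty(s, shape, rate):
    """Log-density of Gamma(shape, rate) at precision s (assumed penalty form)."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1.0) * math.log(s) - rate * s


def penalized_lr(counts, lengths, shape=2.0, rate=0.1):
    """2 * (penalized beta-binomial log-likelihood - binomial log-likelihood).

    Assumption: the beta-binomial mean is pinned to the pooled rate and only
    the precision s = alpha + beta is searched, on a coarse log grid.
    """
    p_hat = sum(counts) / sum(lengths)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("the term must occur in some, but not all, token positions")
    null_ll = binomial_loglik(counts, lengths, p_hat)
    best_alt = -math.inf
    for exponent in range(-30, 31):  # s from 1e-3 to 1e3
        s = 10.0 ** (exponent / 10.0)
        alt = betabinomial_loglik(counts, lengths, p_hat * s, (1.0 - p_hat) * s)
        best_alt = max(best_alt, alt + gamma_log_penalty(s, shape, rate))
    return 2.0 * (best_alt - null_ll)
```

Under these assumptions, a term whose occurrences are concentrated in one document scores higher than a term with the same total count spread evenly across documents, which is the qualitative behaviour a burstiness test should have.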

What carries the argument

The penalized likelihood-ratio test statistic for word burstiness based on beta-binomial models with gamma penalty.
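For orientation, a generic penalized likelihood-ratio statistic for term $t_i$ can be written as below. This is a schematic form under assumed notation, not the paper's equation: $\mathbf{n}_i$ denotes the counts of $t_i$ across the $d$ documents, $\ell_{\mathrm{BB}}$ and $\ell_{\mathrm{Bin}}$ are the beta-binomial and binomial log-likelihoods, $g$ is the gamma penalty density, and the precision is taken to be $\alpha_i + \alpha_{\neg i}$.

```latex
\lambda_i \;=\; 2\left[\,
  \max_{\alpha_i,\,\alpha_{\neg i} > 0}
    \Big\{ \ell_{\mathrm{BB}}(\alpha_i, \alpha_{\neg i};\, \mathbf{n}_i)
           + \log g(\alpha_i + \alpha_{\neg i}) \Big\}
  \;-\; \max_{0 < p < 1} \ell_{\mathrm{Bin}}(p;\, \mathbf{n}_i)
\right]
```

The paper's claim is that, once the maximization is carried out under its specific parameterization, TF-IDF-like terms appear inside $\lambda_i$.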

Load-bearing premise

The beta-binomial family with gamma penalty on precision is an appropriate model for word burstiness.
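One way to see why a beta-binomial is a candidate model for burstiness is to simulate it. The sketch below draws term counts from a binomial and from a beta-binomial with the same mean, then compares their variances. The document length, occurrence rate, precision, and function names are illustrative assumptions, not values taken from the paper.

```python
import random
import statistics


def sample_binomial(rng, n, p):
    """Count successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_betabinomial(rng, n, alpha, beta):
    """Draw a per-document probability from Beta(alpha, beta), then a binomial count."""
    p = rng.betavariate(alpha, beta)
    return sample_binomial(rng, n, p)


def compare_variances(n=100, p=0.05, precision=1.0, draws=2000, seed=0):
    """Return (binomial variance, beta-binomial variance) for matched means."""
    rng = random.Random(seed)
    alpha, beta = p * precision, (1.0 - p) * precision
    binom = [sample_binomial(rng, n, p) for _ in range(draws)]
    bursty = [sample_betabinomial(rng, n, alpha, beta) for _ in range(draws)]
    return statistics.pvariance(binom), statistics.pvariance(bursty)
```

With a small precision, most simulated documents contain the term zero times and a few contain it many times, so the beta-binomial variance is far larger than the binomial one. Whether real word counts follow this family closely enough is exactly the premise in question.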

What would settle it

Observing that the derived weighting scheme fails to match TF-IDF performance on standard classification benchmarks or that altering the model family removes the TF-IDF components from the statistic would falsify the central connection.

Figures

Figures reproduced from arXiv: 2604.00672 by Aitazaz A. Farooque, Michael McIsaac, Paul Sheridan, Zeyad Ahmed.

**Figure 2.** [image: figures/full_fig_p012_2.png; no caption recovered]

**Figure 3.** Relationship between the PLR test statistic $\lambda_i$ and the total TF–IDF of each term $t_i$, defined as $\sum_{j=1}^{d} \text{TF–IDF}(i, j)$. Each point corresponds to a term in the vocabulary, with point diameter proportional to the term’s corresponding $\alpha_i$. The scatter plot shows a positive correlation where terms with higher TF–IDF weights also tend to have larger $\lambda_i$ values. In this simulation, the correlation coeffic…
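The quantity on the horizontal axis of Figure 3, the total TF–IDF of a term summed over documents, can be sketched as follows. The TF–IDF variant used here (raw term count times log of d over document frequency) is an assumption, since the page does not say which variant the figure uses. The helper `pearson` is likewise only an illustration of how a correlation with another per-term score could be computed.

```python
import math
from collections import Counter


def total_tfidf(docs):
    """Sum TF-IDF(i, j) over documents j for each term i.

    Assumed variant: tf = raw count, idf = log(d / document frequency).
    """
    d = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            totals[term] += tf * math.log(d / doc_freq[term])
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Under this variant, a term that appears in every document receives a total of zero, because its IDF factor is log(1).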

**Figure 4.** Empirical distributions of the fitted parameters $\alpha_i$ and $\alpha_{\neg i}$ on a natural logarithmic scale. As illustrative examples, the bursty and semantically specific term “ambulance” was assigned $(\alpha_i, \alpha_{\neg i}) = (0.0021, 128.30)$, while “baby” yielded (0.0041, 83.48). In contrast, high-frequency function words exhibit substantially larger target parameters; for instance, “the” was assigned (5.22, 93.38) and “f…

**Figure 5.** [image: figures/full_fig_p021_5.png; no caption recovered]

Original abstract

TF-IDF is a classical formula that is widely used for identifying important terms within documents. We show that TF-IDF-like scores arise naturally from the test statistic of a penalized likelihood-ratio test setup capturing word burstiness (also known as word over-dispersion). In our framework, the alternative hypothesis captures word burstiness by modeling a collection of documents according to a family of beta-binomial distributions with a gamma penalty term on the precision parameter. In contrast, the null hypothesis assumes that words are binomially distributed in collection documents, a modeling approach that fails to account for word burstiness. We find that a term-weighting scheme given rise to by this test statistic performs comparably to TF-IDF on document classification tasks. This paper provides insights into TF-IDF from a statistical perspective and underscores the potential of hypothesis testing frameworks for advancing term-weighting scheme development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TF-IDF variants come out of the test statistic in a penalized beta-binomial LR test for burstiness, but the gamma penalty looks picked to match the target formula.

The main thing to know is that the paper derives common TF-IDF components directly from the test statistic of a likelihood-ratio test. The alternative hypothesis models documents with beta-binomial distributions and adds a gamma penalty on the precision parameter to capture burstiness, while the null is plain binomial. The resulting weights perform about as well as TF-IDF on document classification tasks. That is the core claim and the practical check they report.

The connection itself is new; I have not seen this exact penalized setup tied to TF-IDF before. It gives a hypothesis-testing story for why the log and IDF terms appear, which is cleaner than pure heuristic justification. The comparable classification results add a bit of evidence that the derivation is not just algebraic sleight of hand.

The soft spot is the gamma penalty. The stress-test note is right that swapping it for another overdispersion penalty would generally break the exact recovery of TF-IDF form. The abstract presents the emergence as natural, but without seeing the full derivation or any sensitivity checks on the penalty choice, it is hard to tell whether the model was motivated first or tuned to hit the known formula. The binomial null is obviously too restrictive for burstiness, yet that does not automatically validate the specific alternative.

The paper is aimed at people who want statistical grounding for term weighting in NLP and information retrieval. A reader working on model-based feature design would get something useful from the perspective, even if the numbers are not dramatically better. It deserves peer review so referees can inspect the exact steps, the parameter selection, and whether the gamma term has motivation beyond recovering TF-IDF. The idea is worth the time even if it needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims that TF-IDF-like scores arise naturally as components of the test statistic from a penalized likelihood-ratio test for word burstiness. Documents are modeled under the alternative hypothesis as beta-binomial distributions with a gamma penalty on the precision parameter, while the null assumes binomial distributions that fail to capture over-dispersion; the resulting weighting scheme performs comparably to TF-IDF on document classification tasks.

Significance. If the derivation holds without the gamma penalty being reverse-engineered to match TF-IDF, the work supplies a statistical interpretation of a widely used heuristic and illustrates how hypothesis-testing frameworks can generate term-weighting schemes. The reported classification performance suggests the approach is practically viable, though stronger controls would be needed to establish it as a competitive alternative.

major comments (2)

§3 (derivation of the test statistic): the gamma penalty on the beta-binomial precision parameter is introduced to capture burstiness, yet the manuscript provides no independent empirical or theoretical motivation for this specific functional form over alternatives such as inverse-gamma or empirical-Bayes precision estimates; without such justification the emergence of the IDF log term appears constructed rather than natural.
§4.2 (classification experiments): the claim that the derived scheme 'performs comparably' to TF-IDF lacks error bars, cross-validation details, or controls for post-hoc hyperparameter choices; the reported results therefore do not yet support the robustness of the performance equivalence.

minor comments (2)

The exact parameterization of the beta-binomial family and the functional form of the gamma penalty should be stated explicitly (including any free parameters) to allow direct reproduction of the test statistic.
Notation for the penalized likelihood ratio should be introduced once and used consistently; several equations reuse symbols without redefinition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us improve the clarity and rigor of the manuscript. We address each major point below.

Referee: §3 (derivation of the test statistic): the gamma penalty on the beta-binomial precision parameter is introduced to capture burstiness, yet the manuscript provides no independent empirical or theoretical motivation for this specific functional form over alternatives such as inverse-gamma or empirical-Bayes precision estimates; without such justification the emergence of the IDF log term appears constructed rather than natural.

Authors: The gamma penalty is chosen because it preserves conjugacy with the beta-binomial likelihood, yielding a closed-form penalized likelihood-ratio statistic whose log term emerges directly as the IDF component without additional tuning parameters. We acknowledge that the original submission did not sufficiently articulate this modeling rationale or compare it to alternatives. In the revision we will add a short subsection explaining the conjugacy motivation and noting that other penalties (e.g., inverse-gamma) do not produce an equally simple closed-form IDF-like term.

revision: partial

Referee: §4.2 (classification experiments): the claim that the derived scheme 'performs comparably' to TF-IDF lacks error bars, cross-validation details, or controls for post-hoc hyperparameter choices; the reported results therefore do not yet support the robustness of the performance equivalence.

Authors: We agree that the experimental reporting is insufficient. The revised manuscript will include standard-error bars computed over repeated random splits, explicit k-fold cross-validation details, and a description of the hyperparameter selection protocol (including any grid search or default settings) to allow readers to assess the stability of the observed performance parity.

revision: yes
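The reporting protocol discussed in this exchange (k-fold splits plus standard-error bars) could look roughly like the sketch below. It is a generic illustration, not the authors' experimental code: the function names are hypothetical, and `evaluate` stands in for whatever routine trains a classifier on one weighting scheme and returns its score on the held-out fold.

```python
import math
import random
import statistics


def kfold_indices(n_items, k, seed=0):
    """Shuffle range(n_items) and split it into k nearly equal test folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def mean_and_standard_error(scores):
    """Mean and standard error of the mean (sample std / sqrt(n))."""
    return statistics.mean(scores), statistics.stdev(scores) / math.sqrt(len(scores))


def cross_validate(n_items, k, evaluate, seed=0):
    """Call evaluate(train_idx, test_idx) on each fold and summarise the scores."""
    folds = kfold_indices(n_items, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return mean_and_standard_error(scores)
```

Running `cross_validate` once per weighting scheme with the same `seed` keeps the folds identical across schemes, so differences in the reported means can be read against the standard errors.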

Circularity Check

0 steps flagged

No significant circularity; derivation follows from explicit modeling choices

The paper sets up a penalized LR test with beta-binomial alternative and gamma penalty on precision, then derives that the resulting test statistic yields TF-IDF-like weights. This is a direct algebraic consequence of the chosen family and penalty form, not a post-hoc fit or self-referential definition. The modeling assumptions (beta-binomial for burstiness, gamma penalty) are stated upfront as the framework; the TF-IDF emergence is shown to follow rather than being used to select the penalty. No self-citation load-bearing step, no renaming of known results, and no evidence that parameters were tuned on the target TF-IDF formula. The derivation is self-contained given the stated model.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the modeling choice that beta-binomial distributions plus a gamma penalty capture word burstiness; this is a domain assumption rather than a derived result. No new entities are postulated.

free parameters (1)

gamma penalty parameters
The gamma distribution parameters that penalize the precision of the beta-binomial are introduced to regularize burstiness modeling and may be set or fitted.
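As a reference point, the log-density of a gamma distribution in the shape-rate parameterization is given below. Whether the paper uses shape-rate or shape-scale, and which values it assigns, is not stated on this page, so the symbols $a$ (shape), $b$ (rate), and $s$ (precision) are assumptions.

```latex
\log g(s;\, a, b) \;=\; a \log b \;-\; \log\Gamma(a) \;+\; (a - 1)\log s \;-\; b\,s,
\qquad s > 0,\; a > 0,\; b > 0 .
```

For $a > 1$ this penalty is largest at $s = (a - 1)/b$ and decreases toward both very small and very large precision, which is how the two parameters control how strongly extreme burstiness estimates are discouraged.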

axioms (1)

domain assumption: Word counts across documents are adequately modeled by beta-binomial distributions under the alternative hypothesis for burstiness.
This replaces the binomial null to account for over-dispersion.
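The over-dispersion this axiom refers to can be read off the standard beta-binomial moments. The notation below ($n$ for document length, $\alpha, \beta$ for the beta parameters, $\pi$ for the mean occurrence probability) is generic and may differ from the paper's.

```latex
P(N = k) \;=\; \binom{n}{k}\,\frac{B(k + \alpha,\; n - k + \beta)}{B(\alpha, \beta)},
\qquad
\mathbb{E}[N] = n\pi, \quad \pi = \frac{\alpha}{\alpha + \beta},
\qquad
\operatorname{Var}[N] \;=\; n\pi(1 - \pi)\,\frac{\alpha + \beta + n}{\alpha + \beta + 1} .
```

The factor $(\alpha + \beta + n)/(\alpha + \beta + 1)$ exceeds 1 whenever $n > 1$ and tends to 1 as the precision $\alpha + \beta$ grows with $\pi$ fixed, so the binomial null is recovered as the limiting case with no burstiness.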

APPENDIX A: TECHNICAL DETAILS

A.1. First-order approximation of the gamma function. We derive the approximation Γ(x + a) = Γ(x) + O(a) for small values of a, using a Taylor expansion of the logarithm of the gamma ...
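The first-order approximation quoted from Appendix A.1 is truncated in the source. The standard expansion it appears to rely on is sketched here for fixed $x > 0$, with $\psi$ denoting the digamma function. This is textbook material and not necessarily the authors' exact sequence of steps.

```latex
\log\Gamma(x + a) \;=\; \log\Gamma(x) + a\,\psi(x) + O(a^2)
\quad\Longrightarrow\quad
\Gamma(x + a) \;=\; \Gamma(x)\,e^{a\psi(x) + O(a^2)}
\;=\; \Gamma(x)\bigl(1 + a\,\psi(x)\bigr) + O(a^2)
\;=\; \Gamma(x) + O(a) .
```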
