An Information-Geometric Justification for Composite Coherence in Event-Based Narrative Extraction
Pith reviewed 2026-06-30 02:59 UTC · model grok-4.3
The pith
The geometric mean is the unique combinator satisfying four axioms for combining angular and topic similarities in narrative coherence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the product manifold $\mathbb{S}^{d-1}\times\Delta^{K-1}$, the negative log-coherence decomposes additively into an angular cost from embedding similarity and a topic cost from the Jensen-Shannon distance. The topic cost is locally consistent with the Fisher-Rao metric because the Riemannian metric tensor induced by the Jensen-Shannon distance is proportional to the Fisher information matrix. Within the compensability spectrum of combinators, the geometric mean is the unique rule satisfying four axioms (a boundary or veto condition, symmetry, log-additivity, and normalization), and this choice motivates a proper product metric $d_\times$ on the manifold.
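The additive decomposition follows from taking logarithms of $C=\sqrt{A\cdot T}$ with $T=1-d_{\mathrm{JS}}$, as defined in the abstract. The worked form below is a sketch: the labels $c_{\mathrm{ang}}$ and $c_{\mathrm{top}}$ are introduced here for illustration, and the paper may place the factor $\tfrac12$ differently.

```latex
\begin{aligned}
C &= \sqrt{A\cdot T}, \qquad T = 1 - d_{\mathrm{JS}},\\
-\log C &= \underbrace{-\tfrac{1}{2}\log A}_{c_{\mathrm{ang}}}
        \;+\; \underbrace{-\tfrac{1}{2}\log\bigl(1-d_{\mathrm{JS}}\bigr)}_{c_{\mathrm{top}}}.
\end{aligned}
```

Both terms are nonnegative whenever $A$ and $T$ lie in $(0,1]$, and either one diverges as its channel approaches zero, which is the veto behavior in cost form.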
What carries the argument
The geometric-mean combinator $C=\sqrt{A\cdot T}$, the sole function on the compensability spectrum that obeys the boundary (veto), symmetry, log-additivity, and normalization axioms.
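To make the four axioms concrete, the sketch below spot-checks them on a few familiar means. It rests on two assumptions not confirmed by this page: that the "compensability spectrum" can be modeled as the power-mean family (the hypothetical `power_mean` with order `r`), and that log-additivity means $\log C$ separates into a function of $A$ plus a function of $T$. A numerical spot check can only rule candidates out; it does not prove uniqueness.

```python
import math

def power_mean(a, t, r):
    """Power mean of order r; r = 0 is the geometric mean, r = -inf is min."""
    if r == 0:
        return math.sqrt(a * t)
    if r == -math.inf:
        return min(a, t)
    if r < 0 and (a == 0 or t == 0):
        return 0.0  # limiting value; avoids division by zero
    return ((a ** r + t ** r) / 2) ** (1 / r)

def veto(r):
    # Boundary condition: a zero in either channel forces zero coherence.
    return power_mean(0.7, 0.0, r) == 0.0 and power_mean(0.0, 0.7, r) == 0.0

def symmetric(r):
    return math.isclose(power_mean(0.3, 0.8, r), power_mean(0.8, 0.3, r))

def log_additive(r):
    # If log C = f(A) + g(T), this rectangle identity must hold,
    # so a single violation rules the combinator out.
    a1, a2, t1, t2 = 0.2, 0.9, 0.4, 0.7
    lhs = math.log(power_mean(a1, t1, r)) + math.log(power_mean(a2, t2, r))
    rhs = math.log(power_mean(a1, t2, r)) + math.log(power_mean(a2, t1, r))
    return math.isclose(lhs, rhs, abs_tol=1e-12)

def normalized(r):
    return math.isclose(power_mean(1.0, 1.0, r), 1.0)

# Assumed sample of the spectrum, from least to most compensatory.
orders = {"min": -math.inf, "harmonic": -1, "geometric": 0, "arithmetic": 1}
report = {name: (veto(r), symmetric(r), log_additive(r), normalized(r))
          for name, r in orders.items()}

for name, checks in report.items():
    print(name, checks)
```

Among these four candidates, only the geometric mean passes every check: the arithmetic mean fails the veto condition, while min and the harmonic mean fail the separability test.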
Load-bearing premise
The Riemannian metric tensor induced by the Jensen-Shannon distance on the simplex is proportional to the Fisher information matrix.
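The premise can be probed numerically. The sketch below compares the Jensen-Shannon divergence (the square of the Jensen-Shannon distance) between a point `p` and a nearby point with the Fisher quadratic form $\sum_i \delta_i^2/p_i$. The point `p`, the perturbation direction, and the step sizes are arbitrary illustrative choices. With natural logarithms the ratio tends to 1/8; the constant in the paper may differ depending on its logarithm base and normalization.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats (square of the JS distance)."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def fisher_quadratic_form(p, delta):
    """Fisher information quadratic form on the simplex: sum_i delta_i^2 / p_i."""
    return sum(d * d / x for d, x in zip(delta, p))

p = [0.5, 0.3, 0.2]
direction = [1.0, -0.4, -0.6]  # sums to zero, so p + eps * direction stays on the simplex

ratios = []
for eps in (1e-1, 1e-2, 1e-3):
    delta = [eps * d for d in direction]
    q = [x + d for x, d in zip(p, delta)]
    ratios.append(js_divergence(p, q) / fisher_quadratic_form(p, delta))

print(ratios)  # the ratio approaches 1/8 as eps shrinks
```

The shrinking gap between the ratio and 1/8 is what "locally consistent with the Fisher-Rao metric" means in practice: the two agree to second order, up to a constant, for nearby distributions only.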
What would settle it
A calculation on the same topic models showing that the correlation between the Jensen-Shannon induced distances and the Fisher information matrix drops below 0.99, or an LLM-as-judge evaluation in which some other combinator or single-channel baseline outperforms the geometric mean on the extracted storylines.
Original abstract
Graph-based narrative extraction relies on a coherence function to score transitions between events, but the coherence metrics in current use are defined operationally and lack an information-theoretic foundation. We study the composite metric $C=\sqrt{A\cdot T}$, where $A$ is the angular similarity of document embeddings and $T=1-d_{\mathrm{JS}}$ is a topic proximity from the Jensen-Shannon distance of soft memberships, and give it an information-geometric reading together with an axiomatic characterization of the geometric-mean combinator. On the product manifold $\mathbb{S}^{d-1}\times\Delta^{K-1}$, the negative log-coherence decomposes additively into an angular and a topic cost. Because the Riemannian metric tensor induced by the Jensen-Shannon distance on the simplex is proportional to the Fisher information matrix, the topic component is locally consistent with the Fisher-Rao metric singled out by Chentsov's theorem. Within the compensability spectrum of combinators, the geometric mean is the unique one consistent with four natural axioms (a boundary/veto condition, symmetry, log-additivity, normalization), and the construction motivates a proper product metric $d_\times$. Experiments on four corpora, three embedding families, and three topic models are consistent with the framework: the Fisher identity holds ($R\ge0.99$), the geometric mean tracks $d_\times$ closely ($\rho=0.999$), and a downstream LLM-as-judge check finds it is not dominated by any alternative combinator or single-channel baseline. Sweeping the spectrum, the bottleneck-coherence gap between extracted and random storylines splits into a symmetric component, maximized at the geometric mean across five corpora, and a displacement term; a cross-modal image-narrative case study reproduces the effect. These results justify the composite coherence metric and articulate when the geometric mean is the natural choice.
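As a minimal sketch of the metric described in the abstract, the code below computes $C=\sqrt{A\cdot T}$ for one pair of documents. It assumes the common definition of angular similarity, $A = 1-\theta/\pi$, and a base-2 Jensen-Shannon distance so that $d_{\mathrm{JS}}\in[0,1]$. The paper's exact conventions are not given on this page, and the function names are hypothetical.

```python
import math

def angular_similarity(u, v):
    """Angular similarity 1 - theta/pi, where theta is the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return 1.0 - math.acos(cos) / math.pi

def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so the value lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))

def composite_coherence(u, v, p, q):
    """Geometric mean of angular similarity A and topic proximity T = 1 - d_JS."""
    A = angular_similarity(u, v)
    T = 1.0 - js_distance(p, q)
    return math.sqrt(A * T)

# Identical embeddings, identical topic memberships: full coherence.
same = composite_coherence([1.0, 0.0], [1.0, 0.0], [0.6, 0.4], [0.6, 0.4])
# Identical embeddings, disjoint topics: the topic channel vetoes the transition.
vetoed = composite_coherence([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(same, vetoed)
```

The second call illustrates the boundary condition: a perfect embedding match cannot compensate for a topic proximity of zero.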
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide an information-geometric justification for the composite coherence metric C=√(A·T) in event-based narrative extraction. It axiomatizes the geometric mean as the unique combinator within the compensability spectrum that satisfies four axioms (boundary/veto condition, symmetry, log-additivity, normalization). It shows that the JS distance induces a Riemannian metric on the simplex proportional to the Fisher information matrix, consistent with Chentsov's theorem, and it motivates a product metric d_× on the product manifold. Experiments on four corpora, three embedding families, and three topic models report R≥0.99 for the Fisher identity and ρ=0.999 for tracking d_×, supplemented by LLM-as-judge and cross-modal validations.
Significance. If the central claims hold, the work supplies a principled axiomatic and geometric foundation for composite coherence, uniquely characterizing the geometric mean without free parameters and linking it to the Fisher-Rao metric. Its strengths are the reliance on standard results such as Chentsov's theorem and the explicit four-axiom derivation, which together give a parameter-free justification. The empirical checks are supportive, but the reported correlations need careful verification.
minor comments (2)
- [Experiments] The reported values R≥0.99 (Fisher identity) and ρ=0.999 (geometric-mean tracking of d_×) lack accompanying details on sample sizes, data splits, exclusion criteria, or confidence intervals; supplying these would make the claimed local consistency with the Fisher-Rao metric easier to verify.
- [Abstract / product manifold section] The proportionality between the JS-induced metric tensor and the Fisher information matrix is asserted in the abstract and in the product manifold discussion, but the explicit constant of proportionality and its derivation (via the Hessian of JS) could be stated more clearly for readers.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report correctly identifies the core contributions: the axiomatic uniqueness of the geometric mean under the four stated axioms and the link to the Fisher-Rao metric via Chentsov's theorem on the product manifold. No major comments requiring point-by-point rebuttal were listed in the report.
Circularity Check
No significant circularity identified
The derivation rests on four explicitly listed axioms for the geometric-mean combinator and on the external Chentsov uniqueness theorem; the claimed proportionality between the JS-induced Riemannian tensor and the Fisher information matrix is a standard property of the Jensen-Shannon divergence, not obtained by fitting inside the paper. The additive decomposition on the product manifold follows directly from the definitions of angular and topic costs. No equation reduces by construction to a fitted parameter, no load-bearing premise is justified solely by self-citation, and no ansatz is smuggled via prior work by the same authors. Experiments supply consistency checks rather than the central justification.
Axiom & Free-Parameter Ledger
axioms (5)
- domain assumption: boundary/veto condition
- standard math: symmetry
- domain assumption: log-additivity
- domain assumption: normalization
- domain assumption: Jensen-Shannon metric tensor proportional to Fisher information