pith. machine review for the scientific record.

arxiv: 2605.05134 · v1 · submitted 2026-05-06 · 💻 cs.LG · math.DS


Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction


Pith reviewed 2026-05-08 16:45 UTC · model grok-4.3

classification 💻 cs.LG math.DS
keywords LLM hallucination detection · Koopman operator theory · dynamical systems · black-box methods · embedding space · state-space models · residual scoring · preference calibration

The pith

Projecting LLM responses into an embedding space and fitting Koopman operators to factual and hallucinated regimes enables low-cost single-pass hallucination detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats sequences of LLM outputs as observations from a latent dynamical system once projected by an embedding model. It fits two Koopman transition operators, one to factual responses and one to hallucinated ones, then uses the difference in their prediction errors as a score for whether a new response is hallucinated. A calibration procedure adjusts the decision threshold using a small number of user-provided examples to match different risk tolerances. This single-generation approach avoids the multiple samples or knowledge-base lookups required by earlier methods. Benchmark tests show it reaches state-of-the-art accuracy while lowering compute demands.

Core claim

LLM responses projected via an embedding model form observable realizations of latent state-space dynamics; the factual and hallucinated regimes admit distinct, learnable Koopman transition operators whose respective prediction errors yield a differential residual score that classifies hallucinations.

What carries the argument

The differential residual score computed from the one-step prediction errors of two separately fitted Koopman transition operators, one for factual regime trajectories and one for hallucinated regime trajectories in embedding space.
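
This score can be sketched with a plain least-squares (DMD-style) fit over embedding trajectories. The sketch below assumes identity observables and an unregularized pseudoinverse fit; the paper's exact lifting, regularization, and score normalization may differ.

```python
import numpy as np

def fit_koopman_operator(trajectories):
    """Least-squares (DMD-style) fit of a linear one-step transition
    operator K with x_{t+1} ~= K x_t, pooled over (d, T) trajectories."""
    X = np.hstack([traj[:, :-1] for traj in trajectories])  # states at step t
    Y = np.hstack([traj[:, 1:] for traj in trajectories])   # states at step t+1
    return Y @ np.linalg.pinv(X)  # K = Y X^+

def residual(K, traj):
    """Mean one-step prediction error of operator K on one trajectory."""
    return np.mean(np.linalg.norm(K @ traj[:, :-1] - traj[:, 1:], axis=0))

def differential_residual_score(K_fact, K_hall, traj):
    """Positive when the factual operator predicts the trajectory worse
    than the hallucinated operator, i.e. evidence for hallucination."""
    return residual(K_fact, traj) - residual(K_hall, traj)
```

A new response would be embedded token-by-token into a (d, T) trajectory and flagged as hallucinated when the score exceeds the calibrated threshold η.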

If this is right

  • Detection operates on a single LLM response without requiring additional sampling or external retrieval.
  • The classification threshold can be tuned to user preferences or domain needs using only a small demonstration set.
  • The method runs as a black box, requiring no internal access to the LLM.
  • Resource overhead is reduced compared to consistency-checking or retrieval-augmented baselines while matching their accuracy on standard benchmarks.
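
The threshold tuning in the second bullet can be sketched as a sweep over demonstration scores. The choice of metric and sweep granularity here is illustrative, not the paper's exact calibration procedure; it assumes per-response differential residual scores with binary labels.

```python
import numpy as np

def calibrate_threshold(scores, labels, target="f1"):
    """Sweep candidate thresholds over the demonstration scores and return
    the eta maximizing the target metric. labels: 1 = hallucinated.
    A strict user might optimize recall; a tolerant one F1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_eta, best_val = float(scores.min()), -1.0
    for eta in np.unique(scores):
        pred = (scores >= eta).astype(int)  # flag hallucination at/above eta
        tp = int(np.sum((pred == 1) & (labels == 1)))
        fp = int(np.sum((pred == 1) & (labels == 0)))
        fn = int(np.sum((pred == 0) & (labels == 1)))
        if target == "recall":
            val = tp / max(tp + fn, 1)
        else:  # default: F1
            val = 2 * tp / max(2 * tp + fp + fn, 1)
        if val > best_val:
            best_eta, best_val = float(eta), val
    return best_eta
```

Only the small demonstration set is needed; no operator refitting is involved, which is what keeps per-user calibration cheap.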

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the embedding space faithfully captures the dynamics, the same separation might allow early prediction of hallucinations within a single long generation.
  • Extending the operator fitting to online updates could support real-time monitoring of chatbot conversations.
  • Similar dynamical modeling might apply to detecting other LLM failure modes such as logical inconsistencies if they also produce distinct trajectory patterns.

Load-bearing premise

That the projected embedding sequences of factual responses and of hallucinated responses obey measurably different linear transition rules in the lifted space.

What would settle it

A test set in which the factual-operator prediction error is not reliably smaller than the hallucinated-operator error on verified factual responses, or in which the differential score shows no correlation with ground-truth hallucination labels.
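
One way to operationalize the second failure condition is a rank-based AUC of the differential score against ground-truth labels: a value near 0.5 on held-out data would falsify the separation claim. This is a generic sketch, not the paper's evaluation code.

```python
import numpy as np

def score_label_auc(scores, labels):
    """Probability that a randomly chosen hallucinated response receives a
    higher differential residual score than a randomly chosen factual one.
    AUC ~ 0.5 means the score carries no signal about the labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]  # hallucinated responses
    neg = scores[labels == 0]  # verified factual responses
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```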

Figures

Figures reproduced from arXiv: 2605.05134 by Dan Wilson, Mohamed Akrout.

Figure 1: Our proposed differential residual score …
Figure 2: For 300 factual ground-truth and hallucinated responses from the …
Figure 3: Hallucination Detection Dynamical System (DS): (a) Phase 1: Dynamical system fitting …
Figure 4: Hallucination detection performance using DS classification on 8K samples of the HaluEval …
Figure 5: ROC curves on the HaluEval dataset as a function of the sequence length L for (a) the average classification performance and (b)–(f) each embedding model, showing the variation of the true positive rate against the false positive rate at various prediction error thresholds. Notably, the magnitude of the improvement from shorter to longer sequence length correlates inversely with model size. Smaller emb…
Figure 6: Histograms of token-level residual scores from (6) for Wikibio test data. This effect can be leveraged to implement a calibration process that uses a small set of user-provided demonstrations to determine the classification threshold η. Whether the user is strict or tolerant of minor hallucinations, the method sets the classification threshold η to align with …
Figure 7: User-centric threshold calibration: (1) selection of calibration samples based on user preference (tolerant or strict to minor inaccuracies), (2) computation of per-token differential residual scores ∆E, and (3) calibration of the classification threshold η to maximize the target metric.
read the original abstract

Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit the transition operators for both factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection in a single-sample pass, avoiding the need for secondary sampling or external grounding. Extensive testing across three data benchmarks demonstrates that our method achieves state-of-the-art performance with reduced resource overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a black-box hallucination detection method that embeds LLM response sequences into a manifold and models them as observable realizations of latent state-space dynamics. Separate Koopman transition operators are fitted to factual and hallucinated regimes; a differential residual score derived from their one-step prediction errors is used for classification, with a preference-aware calibration step that tunes the threshold on a small set of demonstrations. The central claim is that this yields state-of-the-art detection performance across three benchmarks at substantially lower computational cost than sampling-based or retrieval-based alternatives.

Significance. If the core dynamical assumption holds, the approach would constitute a meaningful advance by replacing expensive multi-sample consistency checks or external grounding with a single-pass operator-based residual test. The low-cost, black-box framing and the introduction of preference-aware calibration are practically attractive. However, the significance is currently limited by the absence of quantitative evidence that the embedding trajectories actually admit distinct, learnable Koopman operators that separate the two regimes.

major comments (3)
  1. [Abstract] The assertion of 'state-of-the-art performance' is unsupported by any numerical results, error bars, baseline comparisons, or dataset statistics. The claim is load-bearing for the paper's contribution and cannot be evaluated without this quantitative evidence.
  2. [Method] Operator fitting and calibration: transition operators are fitted separately on the factual and hallucinated regimes, while the classification threshold is optimized on demonstrations drawn from the same data distribution. This construction makes the differential residual score depend on quantities derived from the evaluation data, a direct circularity risk that must be addressed with explicit train/test splits or held-out calibration sets.
  3. [§2–3] Core modeling assumption: the manuscript provides no empirical test or theoretical argument that standard embedding trajectories are Koopman-linearizable in a factuality-dependent way. If the fitted operators are not measurably distinct, or if residuals are dominated by embedding noise, the method reduces to a generic embedding classifier and the claimed dynamical advantage disappears.
minor comments (2)
  1. [Method] The notation for the differential residual score and the precise definition of the Koopman operator approximation should be stated explicitly with equations rather than described only in prose.
  2. [Experiments] Figure captions and experimental tables should include the exact embedding model, sequence length, and number of demonstrations used for calibration so that the low-cost claim can be reproduced.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment point-by-point below, clarifying aspects of the work and indicating revisions made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The assertion of 'state-of-the-art performance' is unsupported by any numerical results, error bars, baseline comparisons, or dataset statistics. The claim is load-bearing for the paper's contribution and cannot be evaluated without this quantitative evidence.

    Authors: We agree that the abstract would be stronger with explicit quantitative support. The full manuscript reports detailed results in Section 4, including accuracy/F1 scores on the three benchmarks, comparisons against sampling-based and retrieval baselines, error bars from repeated runs, and dataset statistics. We have revised the abstract to incorporate the key numerical findings (e.g., detection performance and cost reductions) while remaining concise. revision: yes

  2. Referee: [Method] Operator fitting and calibration: transition operators are fitted separately on the factual and hallucinated regimes, while the classification threshold is optimized on demonstrations drawn from the same data distribution. This construction makes the differential residual score depend on quantities derived from the evaluation data, a direct circularity risk that must be addressed with explicit train/test splits or held-out calibration sets.

    Authors: This is a valid concern about potential leakage. In the original experiments the operators were fit on training partitions and the preference-aware threshold was tuned on a held-out calibration subset disjoint from the test set. We have revised the Method section to explicitly document the data partitioning (train/calibration/test splits) and to confirm that all reported metrics use these held-out sets, thereby removing any circularity. revision: yes

  3. Referee: [§2–3] Core modeling assumption: the manuscript provides no empirical test or theoretical argument that standard embedding trajectories are Koopman-linearizable in a factuality-dependent way. If the fitted operators are not measurably distinct, or if residuals are dominated by embedding noise, the method reduces to a generic embedding classifier and the claimed dynamical advantage disappears.

    Authors: We acknowledge that the original submission did not foreground direct validation of the Koopman assumption. We have added a new subsection in §3 that (i) reports the Frobenius-norm difference between the factual and hallucinated operators, (ii) shows that the differential residual is not reproduced by noise-only ablations on the embeddings, and (iii) provides a brief theoretical motivation based on the approximation properties of Koopman operators for regime-dependent nonlinear dynamics. These additions demonstrate that the separation is not reducible to a generic embedding classifier. revision: yes
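
The operator-distinctness check described above could be implemented, assuming a plain least-squares operator fit, as a label-permutation test on the Frobenius gap between the two fitted operators. This is an editorial sketch of one such test, not the authors' code.

```python
import numpy as np

def fit_linear_operator(trajs):
    """Pooled least-squares fit of x_{t+1} ~= K x_t over (d, T) trajectories."""
    X = np.hstack([t[:, :-1] for t in trajs])
    Y = np.hstack([t[:, 1:] for t in trajs])
    return Y @ np.linalg.pinv(X)

def operator_gap_pvalue(fact_trajs, hall_trajs, n_perm=200, seed=0):
    """Permutation test: is the Frobenius-norm gap between the factual and
    hallucinated operators larger than label shuffling alone produces?
    A large p-value would suggest the two regimes share one set of dynamics."""
    rng = np.random.default_rng(seed)
    gap = lambda a, b: np.linalg.norm(
        fit_linear_operator(a) - fit_linear_operator(b), ord="fro")
    observed = gap(fact_trajs, hall_trajs)
    pool, n_fact = list(fact_trajs) + list(hall_trajs), len(fact_trajs)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pool))
        if gap([pool[i] for i in idx[:n_fact]],
               [pool[i] for i in idx[n_fact:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one smoothing
```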

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper projects LLM responses into embeddings, fits separate Koopman transition operators to labeled factual and hallucinated regimes, defines a differential residual score from one-step prediction errors, and calibrates a threshold on a small set of demonstrations before evaluating on separate benchmarks. This is a standard supervised modeling pipeline with offline fitting and held-out testing; no equation or step reduces the claimed detection performance or SOTA results to the inputs by construction. The Koopman fitting is an explicit modeling choice whose validity is tested empirically rather than assumed tautologically, and no self-citation chain or ansatz smuggling supports the central claims.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the applicability of Koopman theory to embedded LLM sequences and on the existence of separable factual/hallucinated regimes; these are domain assumptions rather than derived results. Free parameters include the embedding model choice and the calibrated threshold.

free parameters (2)
  • classification threshold
    Optimized via preference-aware calibration on a small set of demonstrations to accommodate user requirements.
  • embedding model
    Choice of model used to project responses into the high-dimensional manifold before fitting operators.
axioms (1)
  • domain assumption: Projected LLM response sequences behave as observable realizations of an underlying latent dynamical system to which Koopman operator theory applies.
    Invoked when the paper states that vector sequences are treated as realizations of the model's latent state-space dynamics.

pith-pipeline@v0.9.0 · 5468 in / 1395 out tokens · 41088 ms · 2026-05-08T16:45:40.075167+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

49 extracted references · 14 canonical work pages · 6 internal anchors

  1. [1]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020

  2. [2]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  3. [3]

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. A survey of hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023

  4. [4]

    A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity

    Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023, 2023

  5. [5]

    Ethical governance of artificial intelligence hallucinations in legal practice

    Muhammad Khurram Shahzad Warraich, Hazrat Usman, Sidra Zakir, and Mohaddas Mehboob. Ethical governance of artificial intelligence hallucinations in legal practice. Social Sciences Spectrum, 4(2):603–615, 2025

  6. [6]

    A survey on hallucination in large language models: Principles, taxonomy, and challenges

    Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli. A survey on hallucination in large language models: Principles, taxonomy, and challenges. arXiv preprint arXiv:2411.08009, 2024

  7. [7]

    Calibrated language models must hallucinate

    Adam Tauman Kalai and Santosh S. Vempala. Calibrated language models must hallucinate. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 160–171, 2024

  8. [8]

    On the dangers of stochastic parrots: Can language models be too big?

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623, 2021

  9. [9]

    Why Language Models Hallucinate

    Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. Why language models hallucinate. arXiv preprint arXiv:2509.04664, 2025

  10. [10]

    Language models cannot reliably distinguish belief from knowledge and fact

    Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E Ho, Thomas Icard, Dan Jurafsky, and James Zou. Language models cannot reliably distinguish belief from knowledge and fact. Nature Machine Intelligence, pages 1–11, 2025

  11. [11]

    Truthful: A benchmark for evaluating the truthfulness of large language models

    Amos Azaria and Tom Mitchell. Truthful: A benchmark for evaluating the truthfulness of large language models. arXiv preprint arXiv:2310.06689, 2023

  12. [12]

    Truthfulqa: Measuring how models mimic human falsehoods

    Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2201.08045, 2022

  13. [13]

    Benchmarking large language models for news summarization

    Junyi Li, Tianyi Tang, Wayne Xin Zhao, and Ji-Rong Wen. Benchmarking large language models for news summarization. arXiv preprint arXiv:2305.09034, 2023

  14. [14]

    The internal state of an llm knows when its lying

    Amos Azaria and Tom Mitchell. The internal state of an llm knows when its lying. In Findings of the Association for Computational Linguistics: EMNLP 2023 , pages 967–976, 2023

  15. [15]

    Unsupervised real-time hallucination detection based on the internal states of large language models

    Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, and Yiqun Liu. Unsupervised real-time hallucination detection based on the internal states of large language models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14379–14391, 2024

  16. [16]

    INSIDE: LLMs' internal states retain the power of hallucination detection

    Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, and Jieping Ye. INSIDE: LLMs' internal states retain the power of hallucination detection. arXiv preprint arXiv:2402.03744, 2024

  17. [17]

    Beyond the next token: Towards prompt-robust zero-shot classification via efficient multi-token prediction

    Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, and Kezhi Mao. Beyond the next token: Towards prompt-robust zero-shot classification via efficient multi-token prediction. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: ...

  18. [18]

    Learning on llm output signatures for gray-box behavior analysis

    Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, and Haggai Maron. Learning on LLM output signatures for gray-box behavior analysis. In ICML 2025 Workshop on Reliable and Responsible Foundation Models, 2025

  19. [19]

    Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models

    Potsawee Manakul, Adian Liusie, and Mark Gales. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9004–9017, 2023

  20. [20]

    FELM: Benchmarking factuality evaluation of large language models

    Yiran Zhao, Jinghan Zhang, I Chern, Siyang Gao, Pengfei Liu, Junxian He, et al. FELM: Benchmarking factuality evaluation of large language models. Advances in Neural Information Processing Systems, 36:44502–44523, 2023

  21. [21]

    HaluEval: A large-scale hallucination evaluation benchmark for large language models

    Junyi Li, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. HaluEval: A large-scale hallucination evaluation benchmark for large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6449–6464, 2023

  22. [22]

    A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2):1–55, 2025

  23. [23]

    Embedding and gradient say wrong: A white-box method for hallucination detection

    Xiaomeng Hu, Yiming Zhang, Ru Peng, Haozhe Zhang, Chenwei Wu, Gang Chen, and Junbo Zhao. Embedding and gradient say wrong: A white-box method for hallucination detection. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1950–1959, 2024

  24. [24]

    LLM-Check: Investigating detection of hallucinations in large language models

    Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan, Shoumik Saha, Priyatham Kattakinda, and Soheil Feizi. LLM-Check: Investigating detection of hallucinations in large language models. Advances in Neural Information Processing Systems, 37:34188–34216, 2024

  25. [25]

    Detecting hallucinations in large language model generation: A token probability approach

    Ernesto Quevedo, Jorge Yero Salazar, Rachel Koerner, Pablo Rivas, and Tomas Cerny. Detecting hallucinations in large language model generation: A token probability approach. In World Congress in Computer Science, Computer Engineering & Applied Computing, pages 154–173. Springer, 2024

  26. [26]

    Detecting hallucinations in large language models using semantic entropy

    Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017):625–630, 2024

  27. [27]

    Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation

    Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

  28. [28]

    Beyond next token probabilities: Learnable, fast detection of hallucinations and data contamination on LLM output distributions

    Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, and Haggai Maron. Beyond next token probabilities: Learnable, fast detection of hallucinations and data contamination on LLM output distributions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 30058–30066, 2026

  29. [29]

    Lowest span confidence: A zero-shot metric for efficient and black-box hallucination detection in LLMs

    Yitong Qiao, Licheng Pan, Yu Mi, Lei Liu, Yue Shen, Fei Sun, and Zhixuan Chu. Lowest span confidence: A zero-shot metric for efficient and black-box hallucination detection in LLMs. arXiv preprint arXiv:2601.19918, 2026

  30. [30]

    FactSelfCheck: Fact-level black-box hallucination detection for LLMs

    Albert Sawczyn, Jakub Binkowski, Denis Janiak, Bogdan Gabrys, and Tomasz Jan Kajdanowicz. FactSelfCheck: Fact-level black-box hallucination detection for LLMs. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5603–5621, 2026

  31. [31]

    SAC3: Reliable hallucination detection in black-box language models via semantic-aware cross-check consistency

    Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley Malin, and Sricharan Kumar. SAC3: Reliable hallucination detection in black-box language models via semantic-aware cross-check consistency. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15445–15458, 2023

  32. [32]

    Multi-perspective consistency checking for large language model hallucination detection: a black-box zero-resource approach

    Linggang Kong, Xiaofeng Zhong, Jie Chen, Haoran Fu, and Yongjie Wang. Multi-perspective consistency checking for large language model hallucination detection: a black-box zero-resource approach. Frontiers of Information Technology & Electronic Engineering, 26(11):2298–2309, 2025

  33. [33]

    Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency

    Aman Goel, Daniel Schwartz, and Yanjun Qi. Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1982–1999, 2025

  34. [34]

    Applied Koopmanism

    M. Budišić, R. Mohr, and I. Mezić. Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(4):047510, 2012

  35. [35]

    I. Mezić. Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics, 45:357–378, 2013

  36. [36]

    J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2016

  37. [37]

    P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5–28, 2010

  38. [38]

    C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641(1):115–127, 2009

  39. [39]

    M. O. Williams, I. G. Kevrekidis, and C. W. Rowley. A data–driven approximation of the koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science , 25(6):1307–1346, 2015

  40. [40]

    D. Wilson. Koopman operator inspired nonlinear system identification. SIAM Journal on Applied Dynamical Systems, 22(2):1445–1471, 2023

  41. [41]

    J. L. Proctor, S. L. Brunton, and J. N. Kutz. Dynamic mode decomposition with control. SIAM Journal on Applied Dynamical Systems , 15(1):142–161, 2016

  42. [42]

    jina-embeddings-v5-text: Task-Targeted Embedding Distillation

    Mohammad Kalim Akram, Saba Sturua, Nastia Havriushenko, Quentin Herreros, Michael Günther, Maximilian Werk, and Han Xiao. jina-embeddings-v5-text: Task-targeted embedding distillation. arXiv preprint arXiv:2602.15547, 2026

  43. [43]

    Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

    Mingxin Li, Yanzhao Zhang, Dingkun Long, Keqin Chen, Sibo Song, Shuai Bai, Zhibo Yang, Pengjun Xie, An Yang, Dayiheng Liu, et al. Qwen3-VL-Embedding and Qwen3-VL-Reranker: A unified framework for state-of-the-art multimodal retrieval and ranking. arXiv preprint arXiv:2601.04720, 2026

  44. [44]

    F2LLM-v2: Inclusive, performant, and efficient embeddings for a multilingual world

    Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, and Rui Wang. F2LLM-v2: Inclusive, performant, and efficient embeddings for a multilingual world. arXiv preprint arXiv:2603.19223, 2026

  45. [45]

    Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B, 2023

  46. [46]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  47. [47]

    A quantitative analysis of koopman operator methods for system identification and predictions

    Christophe Zhang and Enrique Zuazua. A quantitative analysis of koopman operator methods for system identification and predictions. Comptes Rendus. Mécanique, 351(S1):1–31, 2023

  48. [48]

    H. S. Hemati, C. W. Rowley, E. A. Deem, and L. N. Cattafesta. De-biasing the dynamic mode decomposition for applied koopman spectral analysis of noisy datasets. Theoretical and Computational Fluid Dynamics, 31(4):349–368, 2017

  49. [49]

    S. T. M. Dawson, H. S. Hemati, M. O. Williams, and C. W. Rowley. Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Experiments in Fluids, 57(3):42, 2016