The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management

Binqi Shen; Hanyu Cai; Lan Hu; Lier Jin; Yuting Xin

arxiv: 2605.23071 · v1 · pith:K7MARZMInew · submitted 2026-05-21 · 💻 cs.CL


Pith reviewed 2026-05-25 05:20 UTC · model grok-4.3

classification 💻 cs.CL

keywords LLM context management · efficiency frontier · amortized cost modeling · token usage optimization · retrieval versus compression · HotpotQA evaluation · deployment-aware optimization


The pith

A unified optimization framework for LLM context management cuts effective token use by about 25% at comparable performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents The Efficiency Frontier as a way to treat the choice of context strategy as a single optimization problem that balances task performance against token cost while folding in preprocessing reuse through amortization. This replaces isolated comparisons of retrieval or compression methods with a deployment-aware view that shows when each approach becomes preferable under different operating conditions. Evaluated on 5,000 HotpotQA examples, the framework locates distinct regimes and transition points. It reports an approximately 25% token reduction at F1 near 0.78, and more than 50% lower token cost for amortized memory compression versus full-context baselines in higher-performance settings. Readers would care because the same model supplies a concrete decision procedure rather than separate performance and efficiency scores.

Core claim

The Efficiency Frontier models context strategy selection as a deployment-aware optimization problem that jointly accounts for task performance, token cost, and preprocessing reuse through amortized cost modeling. Unlike prior evaluations that treat methods in isolation, the framework produces decision-oriented analysis of when retrieval-based versus preprocessing-based strategies become preferable, with distinct operational regimes and transition boundaries observed on HotpotQA.
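The abstract gives no formula for the amortized cost, so the sketch below is an editorial illustration, not the authors' model. It assumes that effective cost per query equals inference-time tokens plus one-time preprocessing tokens divided by the number of queries that reuse them, and that the frontier is the set of configurations not dominated in cost and F1. The `Config` fields, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One evaluated context-management configuration (hypothetical fields)."""
    name: str
    f1: float                 # task performance
    query_tokens: float       # tokens consumed per query at inference time
    preprocess_tokens: float  # one-time preprocessing tokens (0 if none)


def effective_cost(cfg: Config, reuse: int) -> float:
    """Assumed amortization: spread preprocessing tokens over `reuse` queries."""
    if reuse < 1:
        raise ValueError("reuse must be at least 1")
    return cfg.query_tokens + cfg.preprocess_tokens / reuse


def efficiency_frontier(configs, reuse):
    """Return configurations that are not dominated in (effective cost, F1)."""
    # Sort by ascending cost; on a cost tie, the higher-F1 configuration first.
    ranked = sorted(configs, key=lambda c: (effective_cost(c, reuse), -c.f1))
    frontier, best_f1 = [], float("-inf")
    for cfg in ranked:
        # A costlier configuration stays only if it improves on the best F1 so far.
        if cfg.f1 > best_f1:
            frontier.append(cfg)
            best_f1 = cfg.f1
    return frontier


# Illustrative numbers only; they are not taken from the paper.
CONFIGS = [
    Config("full_context", f1=0.80, query_tokens=1500, preprocess_tokens=0),
    Config("retrieval_k4", f1=0.78, query_tokens=600, preprocess_tokens=0),
    Config("retrieval_k8", f1=0.77, query_tokens=900, preprocess_tokens=0),
    Config("compressed_memory", f1=0.79, query_tokens=400, preprocess_tokens=2000),
]
```

With these made-up numbers, `efficiency_frontier(CONFIGS, reuse=1)` keeps `retrieval_k4` and `full_context`, while `reuse=20` replaces the retrieval point with `compressed_memory`. That is the kind of regime change the paper describes, although the real boundaries depend on its actual cost model.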

What carries the argument

The Efficiency Frontier, a unified framework that casts context strategy selection as deployment-aware optimization using amortized cost modeling to incorporate preprocessing reuse.

If this is right

Deployment-aware optimization yields roughly 25% lower effective token usage while holding F1 near 0.78.
Amortized memory compression delivers over 50% lower token cost than full-context prompting once higher performance targets are required.
The framework surfaces explicit transition boundaries that mark when retrieval overtakes compression or vice versa under changing cost or accuracy constraints.
Systematic comparison across strategies becomes possible because all are placed on the same cost-performance surface rather than evaluated separately.
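Under one simple reading of amortization (an assumption here, since the paper's equations are not in the abstract), a transition boundary between retrieval and a preprocessing-based strategy is a break-even reuse count. If preprocessing costs a one-time `preprocess_tokens` and lowers the per-query cost from `retrieval_query_tokens` to `preprocessed_query_tokens`, the preprocessing-based strategy becomes cheaper once the one-time cost has been spread over enough queries. The function name and arguments below are hypothetical.

```python
def break_even_reuse(preprocess_tokens: int,
                     preprocessed_query_tokens: int,
                     retrieval_query_tokens: int):
    """Smallest reuse count at which the preprocessing-based strategy is
    strictly cheaper per query than retrieval, or None if it never is.

    Assumes effective cost = query tokens + preprocessing tokens / reuse,
    with integer token counts.
    """
    saving = retrieval_query_tokens - preprocessed_query_tokens
    if saving <= 0:
        return None  # no per-query saving, so preprocessing never pays off
    # Need preprocess_tokens / reuse < saving, i.e. reuse > preprocess_tokens / saving.
    return preprocess_tokens // saving + 1
```

For example, `break_even_reuse(2000, 400, 600)` returns `11`: at ten reuses the two strategies cost the same 600 tokens per query, and from the eleventh onward the preprocessing-based strategy is cheaper. These inputs are illustrative, not figures from the paper.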

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same amortized lens could be applied to production logs to decide strategy switches on a per-query basis rather than at the dataset level.
Extending the frontier to include latency or energy metrics would let operators optimize for additional deployment constraints not modeled here.
If preprocessing reuse is lower than assumed, the advantage of memory-compression regimes would shrink, moving the transition points toward retrieval.
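The first and third extensions can be illustrated together. The sketch below estimates reuse per document from a query log and picks a strategy for each document by comparing observed reuse with a break-even threshold. The log format (pairs of query ID and document ID), the function names, and the threshold are all assumptions made for illustration; the paper does not describe such a procedure.

```python
from collections import Counter


def observed_reuse(log):
    """Count how many logged queries touched each document.

    `log` is an iterable of (query_id, doc_id) pairs (hypothetical format).
    """
    return Counter(doc_id for _query_id, doc_id in log)


def choose_strategy_per_document(log, break_even):
    """Prefer memory compression only where observed reuse reaches `break_even`."""
    counts = observed_reuse(log)
    return {
        doc_id: "memory_compression" if n >= break_even else "retrieval"
        for doc_id, n in counts.items()
    }


# Hypothetical log: doc_a is queried three times, doc_b once.
EXAMPLE_LOG = [("q1", "doc_a"), ("q2", "doc_a"), ("q3", "doc_a"), ("q4", "doc_b")]
```

If observed reuse is lower than the level assumed at design time, more documents fall below the threshold and the selection drifts toward retrieval, which is the shift described in the last point above.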

Load-bearing premise

The amortized cost model correctly captures real preprocessing reuse and the regimes found on HotpotQA extend to other tasks and deployments.

What would settle it

Repeating the full optimization and regime analysis on a second multi-hop QA dataset such as 2WikiMultiHopQA and checking whether the 25% token reduction, 50% compression saving, and transition boundaries remain stable or shift by more than 10%.
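A replication check of this kind could be scripted roughly as follows. The sketch assumes that "shift by more than 10%" means a relative change of more than 10% in each headline number; that reading, the function names, and the replicate values are assumptions, and only the reference values (25% and 50%) come from the paper's abstract.

```python
def relative_shift(reference: float, replicate: float) -> float:
    """Relative change of the replicate value with respect to the reference."""
    return abs(replicate - reference) / abs(reference)


def stability_report(reference: dict, replicate: dict, tolerance: float = 0.10) -> dict:
    """Map each metric to True if its relative shift stays within `tolerance`."""
    return {
        metric: relative_shift(reference[metric], replicate[metric]) <= tolerance
        for metric in reference
    }


# Reference values as reported in the abstract (HotpotQA).
REPORTED = {"token_reduction": 0.25, "compression_saving": 0.50}
# Placeholder values standing in for a second dataset; not real results.
HYPOTHETICAL_REPLICATE = {"token_reduction": 0.24, "compression_saving": 0.40}
```

With the placeholder replicate, `stability_report(REPORTED, HYPOTHETICAL_REPLICATE)` marks `token_reduction` as stable and `compression_saving` as shifted. Transition boundaries could be compared the same way once they are expressed as numbers, for instance as break-even reuse counts.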

Figures

Figures reproduced from arXiv: 2605.23071 by Binqi Shen, Hanyu Cai, Lan Hu, Lier Jin, Yuting Xin.

**Figure 1.** Strategy-level Efficiency Frontiers and decision paths. Each panel plots token cost versus task performance (F1). Faint points denote all evaluated …

**Figure 2.** Global Efficiency Frontier under different reuse regimes …

Original abstract

Large language models (LLMs) increasingly rely on long-context processing, but expanding context windows introduces substantial computational and financial costs. Existing context reduction approaches, including retrieval and memory compression methods, are typically evaluated using performance and efficiency metrics independently, limiting systematic comparison and deployment-aware decision-making. This paper introduces The Efficiency Frontier, a unified framework for cost-performance optimization in LLM context management. The framework models context strategy selection as a deployment-aware optimization problem that jointly accounts for task performance, token cost, and preprocessing reuse through amortized cost modeling. Unlike existing evaluations that compare methods in isolation, the proposed framework enables decision-oriented analysis of when different context management strategies become preferable under varying operational conditions. Evaluated on 5,000 HotpotQA instances, the framework reveals distinct operational regimes and transition boundaries between retrieval-based and preprocessing-based strategies. Results show that deployment-aware optimization reduces effective token usage by approximately 25% at comparable performance ($F1 \approx 0.78$), while amortized memory compression achieves over 50% lower token cost relative to full-context prompting in higher-performance settings. Overall, the proposed framework provides a principled and practical foundation for evaluating and deploying scalable, efficient, and sustainable LLM systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames context strategy choice as a joint cost-performance optimization problem and shows concrete token savings on HotpotQA, but the reported regimes rest entirely on that single task.


The core contribution is a deployment-aware optimization lens that treats context management as selecting among retrieval, compression, and full-context options while folding in amortized preprocessing cost. It identifies transition points where one strategy overtakes another under different token budgets and performance targets. That framing is cleaner than the usual side-by-side metric tables in the retrieval-augmented generation literature, and the 5,000-instance HotpotQA run produces usable numbers: roughly 25% lower effective tokens at F1 around 0.78, and more than 50% amortized cost reduction versus full context in the higher-performance band. Those are the practical takeaways an engineer could actually use tomorrow. The evaluation is at least large enough to be worth looking at, and the amortized-cost modeling is a reasonable way to capture reuse that most prior papers ignore.

The main limitation is that every regime boundary and every percentage is derived from HotpotQA alone. No other datasets, no sensitivity sweeps over different cost ratios, and no external benchmarks are described, so it is unclear whether the transition points travel. The abstract also gives no error bars, baseline definitions, or exclusion rules, which makes it hard to judge how much of the reported gain is stable versus post-hoc.

This is the kind of applied systems paper that production teams would read for the decision tool even if the exact frontiers need re-running on their own workload. It is not foundational, but the framing is honest and the scale is decent. I would send it to referees with a request for at least one additional task and a clearer statement of which assumptions are being tested versus assumed.

Referee Report

1 major / 1 minor

Summary. The paper introduces the Efficiency Frontier framework for unified cost-performance optimization in LLM context management. It models context strategy selection (retrieval vs. memory compression vs. full-context) as a deployment-aware optimization problem incorporating task performance, token cost, and amortized preprocessing reuse. On 5,000 HotpotQA instances, it identifies operational regimes and transition boundaries, claiming that deployment-aware optimization yields ~25% token reduction at F1≈0.78 while amortized memory compression yields >50% lower token cost than full-context prompting in higher-performance regimes.

Significance. If the empirical regimes and amortized model hold beyond the reported setting, the framework supplies a decision-oriented tool for choosing context strategies under varying operational conditions, addressing the common limitation of evaluating efficiency methods in isolation. The explicit incorporation of preprocessing reuse via amortization is a constructive modeling choice that could support more realistic deployment analysis.

major comments (1)

[Abstract, paragraph 3 and §4 (evaluation)] The reported 25% token reduction and >50% amortized cost savings, along with the identified transition boundaries, are derived exclusively from 5,000 HotpotQA instances. No cross-task evaluation, sensitivity analysis over different retrieval patterns or cost structures, or external benchmark is described. This gap is load-bearing for the central claim that the framework enables general deployment-aware optimization rather than task-specific observations.

minor comments (1)

[Abstract] Numerical claims (25%, 50%, F1≈0.78) are stated without reference to baseline definitions, error bars, or exclusion criteria; the full manuscript should make these explicit in the results section to allow assessment of post-hoc selection.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the Efficiency Frontier framework. We address the single major comment below.


Referee: [Abstract, paragraph 3 and §4 (evaluation)] The reported 25% token reduction and >50% amortized cost savings, along with the identified transition boundaries, are derived exclusively from 5,000 HotpotQA instances. No cross-task evaluation, sensitivity analysis over different retrieval patterns or cost structures, or external benchmark is described. This gap is load-bearing for the central claim that the framework enables general deployment-aware optimization rather than task-specific observations.

Authors: We agree that the reported quantitative results (25% token reduction at F1≈0.78 and >50% amortized savings) are derived solely from the 5,000 HotpotQA instances and that this constrains the strength of any generality claim. HotpotQA was selected as a standard multi-hop QA benchmark that stresses context management, but we acknowledge the absence of cross-task validation or external benchmarks. In the revised manuscript we will (1) revise the abstract and §4 to state explicitly that the numerical regimes and transition boundaries are demonstrated on HotpotQA while the framework itself is task-agnostic, (2) add a sensitivity analysis varying token-cost ratios and retrieval-pattern parameters within the existing HotpotQA setup, and (3) include a discussion of how the same optimization procedure can be applied to other tasks. These textual and analytical changes will be incorporated in the next version.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation of proposed framework on HotpotQA


The paper introduces the Efficiency Frontier as a modeling framework for context strategy selection and reports concrete performance numbers (25% token reduction at F1≈0.78; >50% amortized cost savings) as direct outcomes of running the framework on 5,000 HotpotQA instances. No equations, derivations, fitted parameters renamed as predictions, or self-citations appear in the provided text. The central claims are therefore experimental results rather than reductions to inputs by construction, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The framework is described at a high level without mathematical detail.




[1] [1]

Industrial applications of large language models,

M. Raza, Z. Jahangir, M. B. Riaz, M. J. Saeed, and M. A. Sattar, “Industrial applications of large language models,”Scientific Reports, vol. 15, no. 1, p. 13755, Apr. 2025

work page 2025

[2] [2]

Dissecting the runtime performance of the training, fine-tuning, and inference of large language models,

L. Zhang, X. Liu, Z. Li, X. Pan, P. Dong, R. Fan, R. Guo, X. Wang, Q. Luo, S. Shi, and X. Chu, “Dissecting the runtime performance of the training, fine-tuning, and inference of large language models,”

work page

[3] [3]

Available: https://arxiv.org/abs/2311.03687

[Online]. Available: https://arxiv.org/abs/2311.03687

work page arXiv

[4] [4]

Evaluation of tunnel rock mass integrity using multi-modal data and generative large model: Tunnel rip-gpt,

C. Wu, H. Huang, and Y .-Q. Ni, “Evaluation of tunnel rock mass integrity using multi-modal data and generative large model: Tunnel rip-gpt,”SSRN Electronic Journal, 2025. [Online]. Available: https://ssrn.com/abstract=5348429

work page 2025

[5] [5]

Sustainable ai: Environmental implications, challenges and opportunities,

C.-J. Wu, R. Raghavendra, U. Gupta, B. Acun, N. Ardalani, K. Maeng, G. Chang, F. Aga, J. Huang, C. Baiet al., “Sustainable ai: Environmental implications, challenges and opportunities,”Proceedings of machine learning and systems, vol. 4, pp. 795–813, 2022

work page 2022

[6] [6]

Environmental and economic costs behind llms,

P. L ´opez- ´Ubeda, T. Mart´ın-Noguerol, and A. Luna, “Environmental and economic costs behind llms,”Nature Reviews Electrical Engineering, vol. 21, no. 3, pp. 661–663, Mar. 2026

work page 2026

[7] [7]

Longllmlingua: Accelerating and enhancing llms in long context sce- narios via prompt compression,

H. Jiang, Q. Wu, X. Luo, D. Li, C.-Y . Lin, Y . Yang, and L. Qiu, “Longllmlingua: Accelerating and enhancing llms in long context sce- narios via prompt compression,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 1658–1677

work page 2024

[8] [8]

Retrieval meets long context large language models,

P. Xu, W. Ping, X. Wu, L. McAfee, C. Zhu, Z. Liu, S. Subramanian, E. Bakhturina, M. Shoeybi, and B. Catanzaro, “Retrieval meets long context large language models,” inInternational Conference on Learning Representations, vol. 2024, 2024, pp. 49 569–49 584

work page 2024

[9] [9]

MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

D. Jiang, Y . Li, G. Li, and B. Li, “Magma: A multi-graph based agentic memory architecture for ai agents,”arXiv preprint arXiv:2601.03236, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[10] [10]

Holistic Evaluation of Language Models

“Holistic evaluation of language models,” 2023. [Online]. Available: https://arxiv.org/abs/2211.09110

work page internal anchor Pith review Pith/arXiv arXiv 2023

[11] [11]

Beyond the context window: A cost-performance analysis of fact-based memory vs. long-context llms for persistent agents,

N. Pollertlam and W. Kornsuwannawit, “Beyond the context window: A cost-performance analysis of fact-based memory vs. long-context llms for persistent agents,” 2026. [Online]. Available: https://arxiv.org/ abs/2603.04814

work page arXiv 2026

[12] [12]

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

D. Jiang, Y . Li, S. Wei, J. Yang, A. Kishore, A. Zhao, D. Kang, X. Hu, F. Chen, Q. Liet al., “Anatomy of agentic memory: Taxonomy and empirical analysis of evaluation and system limitations,”arXiv preprint arXiv:2602.19320, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[13] [13]

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Z. Yang, P. Qi, S. Zhang, Y . Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “Hotpotqa: A dataset for diverse, explainable multi-hop question answering,” 2018. [Online]. Available: https: //arxiv.org/abs/1809.09600

work page internal anchor Pith review Pith/arXiv arXiv 2018

[14] [14]

Benchmark for evaluating initialization of visual-inertial odometry,

Z. Zhao and B. M. Chen, “Benchmark for evaluating initialization of visual-inertial odometry,” in2023 42nd Chinese Control Conference (CCC). IEEE, 2023, pp. 3935–3940

work page 2023

[15] [15]

A data-centric perspective on the lifecycle of large language models,

J. Rao, X. Liu, H. Yan, J. Shen, H. Mo, Y . Dong, Z. Yan, Z. Wang, Z. Lin, X. Meng, Z. Yu, L. Deng, J. Wei, Y . Wang, and M. Zhang, “A data-centric perspective on the lifecycle of large language models,” TechRxiv, vol. 2025, no. 1220, 2025. [Online]. Available: https: //www.techrxiv.org/doi/abs/10.36227/techrxiv.176620610.03288677/v1

work page doi:10.36227/techrxiv.176620610.03288677/v1 2025

[16] [16]

Green ai,

R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, “Green ai,” Communications of the ACM, vol. 63, no. 12, pp. 54–63, 2020

work page 2020

[17] [17]

Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios

J. Zang, Y . Wei, R. Bai, S. Jiang, N. Mo, B. Li, Q. Sun, and H. Liu, “Reward auditor: Inference on reward modeling suitability in real-world perturbed scenarios,”arXiv preprint arXiv:2512.00920, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[18] W. Sun, Z. Qi, and Q. Shen, “High-recall deep learning: A gated recurrent unit approach to bank account fraud detection on imbalanced data,” in 2025 5th International Conference on Digital Society and Intelligent Systems (DSInS), 2025, pp. 207–212.

[19] J. Cao, Y. Ma, X. Li, Q. Ren, and X. Chen, “Task-specific efficiency analysis: When small language models outperform large language models,” 2026. [Online]. Available: https://arxiv.org/abs/2603.21389

[20] N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang, “Lost in the middle: How language models use long contexts,” Transactions of the Association for Computational Linguistics, vol. 12, pp. 157–173, 2024.

[21] Y. Du, M. Tian, S. Ronanki, S. Rongali, S. Bodapati, A. Galstyan, A. Wells, R. Schwartz, E. A. Huerta, and H. Peng, “Context length alone hurts LLM performance despite perfect retrieval,” 2025. [Online]. Available: https://arxiv.org/abs/2510.05381

[22] R. Bansal, A. Zhang, R. Tiwari, L. Madaan, S. S. Duvvuri, D. Khatri, D. Brandfonbrener, D. Alvarez-Melis, P. Bhargava, M. S. Kale et al., “Let’s (not) just put things in context: Test-time training for long-context LLMs,” arXiv preprint arXiv:2512.13898, 2025.

[23] S. Gu, “Long context, less focus: A scaling gap in LLMs revealed through privacy and personalization,” arXiv preprint arXiv:2602.15028, 2026.

[24] Z. Chen, X. Wu, J. Jia, C. Gao, Q. Fu, D. Zhang, and S. Hu, “LongBench Pro: A more realistic and comprehensive bilingual long-context evaluation benchmark,” arXiv preprint arXiv:2601.02872, 2026.

[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[26] Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou et al., “LongBench: A bilingual, multitask benchmark for long context understanding,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 3119–3137.

[27] L. Lai, Z. Cheng, K. Cheng, and X. Qi, “Do transformers always win? An empirical study of semantic embeddings for short-text e-commerce reviews,” in 2026 9th International Symposium on Big Data and Applied Statistics (ISBDAS), 2026, pp. 525–529.

[28] T. Ge, J. Hu, L. Wang, X. Wang, S.-Q. Chen, and F. Wei, “In-context autoencoder for context compression in a large language model,” arXiv preprint arXiv:2307.06945, 2023.

[29] W. Li, R. Zhang, R. Shao, J. He, and L. Nie, “CogVLA: Cognition-aligned vision-language-action model via instruction-driven routing & sparsification,” in Advances in Neural Information Processing Systems, 2025.

[30] Z. Wang, Y. Sun, H. Wang, B. Jing, X. Shen, X. Dong, Z. Hao, H. Xiong, and Y. Song, “Reasoning-enhanced domain-adaptive pretraining of multimodal large language models for short video content governance,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, S. Potdar, L. Rojas-Barahona, and S. Montell...

[31] Y. Sun, Y. Li, R. Sun, C. Liu, F. Zhou, Z. Jin, L. Wang, X. Shen, Z. Hao, and H. Xiong, “Audio-enhanced vision-language modeling with latent space broadening for high quality data expansion,” in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, ser. KDD ’25. New York, NY, USA: Association for Computing Machinery... doi: 10.1145/3711896.3737195

[32] L. Li, S. Jia, J. Wang, Z. Jiang, F. Zhou, J. Dai, T. Zhang, Z. Wu, and J.-N. Hwang, “Human Motion Instruction Tuning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[33] Z. Zhao, “BALF: Simple and efficient blur aware local feature detector,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 3362–3372.

[34] W. Li, R. Zhang, R. Shao, Z. Fang, K. Zhou, Z. Tian, and L. Nie, “SemanticVLA: Semantic-aligned sparsification and enhancement for efficient robotic manipulation,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2026.

[35] J. Rao, X. Liu, H. Deng, Z. Lin, Z. Yu, J. Wei, X. Meng, and M. Zhang, “Dynamic sampling that adapts: Iterative DPO for self-aware mathematical reasoning,” 2025. [Online]. Available: https://arxiv.org/abs/2505.16176

[36] Z. Xue, S. Zhao, Y. Qi, X. Zeng, and Z. Yu, “Resilient routing: Risk-aware dynamic routing in smart logistics via spatiotemporal graph learning,” 2026. [Online]. Available: https://arxiv.org/abs/2601.13632

[37] Z. Cheng, L. Lai, and Y. Liu, “Resolving the robustness-precision trade-off in financial RAG through hybrid document-routed retrieval,” [Online]. Available: https://arxiv.org/abs/2603.26815

[39] OpenAI, “GPT-5.4 mini,” OpenAI, Technical Report, 2026. [Online]. Available: https://platform.openai.com/docs/models

[40] W. Yan, E. Wu, A. G. Schwing, and E. Rosenbaum, “Semantic autoencoder for modeling BEOL and MOL dielectric lifetime distributions,” in 2023 IEEE International Reliability Physics Symposium (IRPS). IEEE, 2023, pp. 1–9.

[41] W. Yan, E. Wu, and E. Rosenbaum, “New loss function for learning dielectric thickness distributions and generative modeling of breakdown lifetime,” in 2025 IEEE International Reliability Physics Symposium (IRPS). IEEE, 2025, pp. 1–9.