Cross-Lingual Token Arbitrage: Optimizing Code Agent Context Windows via Local LLM Preprocessing

Mehmet Utku Colak

arxiv: 2606.03618 · v1 · pith:I4PPOL3Gnew · submitted 2026-06-02 · 💻 cs.AI

Pith reviewed 2026-06-28 09:55 UTC · model grok-4.3

classification 💻 cs.AI

keywords prompt optimization · token reduction · local LLM middleware · cross-lingual translation · code agents · multilingual coding · context window · structural rewriting

The pith

A small local model rewrites multilingual coding prompts into compact English before they reach cloud agents, cutting prompt tokens 34-47 percent while holding or raising accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a pre-flight middleware that sits between a developer and a cloud coding agent. A local 3B-parameter model first translates non-English specifications into English and then rewrites the prompt into a shorter, task-oriented structure, with a safeguard that prevents the output from ever exceeding the original length. When tested on a benchmark covering Turkish, Arabic, Chinese, and mixed-language coding tasks, the approach lowers prompt tokens by 34-47 percent and total tokens by up to 18.8 percent across three commercial backends, with no loss in task success rate. Ablations indicate that the structural rewrite, rather than simple name extraction, drives most of the saving, and the method beats an existing compression baseline at matched rates.

Core claim

The central claim is that proactive edge-side rewriting by a small local model can arbitrage tokenization differences across languages and reduce structural entropy in prompts, thereby shrinking context windows for downstream code agents without degrading the quality of the generated solutions.

What carries the argument

The middleware that runs cross-lingual translation into English followed by structural rewriting into a compact task-oriented format, protected by regex-validated rewrite-with-fallback.
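
The guard described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the paper does not publish its regex patterns or fallback logic, so the `IDENTIFIER` pattern, the whitespace-based `count_tokens` stand-in, and the `rewrite` callable (which would wrap the local model) are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical validation pattern (not from the paper): call-style names such
# as parse_config() and snake_case identifiers must survive the rewrite.
IDENTIFIER = re.compile(r"\b\w+\(\)|\b[a-z]+(?:_[a-z0-9]+)+\b")


def count_tokens(text: str) -> int:
    """Stand-in token counter; a real setup would use the backend tokenizer."""
    return len(text.split())


def rewrite_with_fallback(
    prompt: str,
    rewrite: Callable[[str], str],
    count: Callable[[str], int] = count_tokens,
) -> str:
    """Return the rewritten prompt, or the original if any check fails."""
    candidate = rewrite(prompt).strip()
    if not candidate:
        return prompt  # empty output: fall back
    required = set(IDENTIFIER.findall(prompt))
    if not required.issubset(IDENTIFIER.findall(candidate)):
        return prompt  # an identifier was dropped: fall back
    if count(candidate) > count(prompt):
        return prompt  # rewrite grew the prompt: fall back
    return candidate


original = "Hi, could you please write parse_config() that reads a file, thanks a lot"
print(rewrite_with_fallback(original, lambda p: "Write parse_config() to read a file."))
print(rewrite_with_fallback(original, lambda p: "Write a parser."))
```

The first call accepts the shorter rewrite; the second returns the original prompt because the rewrite lost `parse_config()`. Falling back to the untouched prompt is what makes the "never larger than the original" property hold by construction.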

If this is right

Token spend for AI coding agents can be reduced at the input stage rather than after bloat occurs.
The same middleware works with multiple commercial backends without requiring changes to the agent itself.
Most of the token saving comes from the rewriting step, not from language translation alone.
The method remains effective even when compared against other compression techniques at the same compression ratio.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-model rewrite pattern could be applied to non-coding agent prompts that also mix languages or contain conversational noise.
If the rewrite rules were made explicit rather than learned, the approach might run on even smaller or non-LLM edge devices.
Accuracy preservation on one benchmark leaves open whether the same middleware would hold for longer, multi-turn coding sessions.

Load-bearing premise

The local 3B model can translate and rewrite prompts without changing their meaning in ways that would lower accuracy on the multilingual coding tasks.

What would settle it

Running the same OMH-Polyglot tasks with and without the middleware and observing that accuracy falls for any of the three commercial backends would falsify the preservation claim.
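
A minimal sketch of that check, assuming per-task pass/fail outcomes are available for each backend with and without the middleware. The backend names and outcomes below are invented for illustration and are not results from the paper; a real comparison would also need a significance test rather than a raw difference.

```python
def accuracy(outcomes: list[bool]) -> float:
    """Fraction of tasks solved."""
    return sum(outcomes) / len(outcomes)


def backends_with_accuracy_drop(
    raw: dict[str, list[bool]], rewritten: dict[str, list[bool]]
) -> list[str]:
    """Backends whose accuracy is lower with the middleware than without."""
    return sorted(
        name for name in raw if accuracy(rewritten[name]) < accuracy(raw[name])
    )


# Hypothetical outcomes on the same four tasks (illustration only).
raw = {
    "backend_a": [True, True, False, True],
    "backend_b": [True, False, False, True],
}
rewritten = {
    "backend_a": [True, True, True, True],
    "backend_b": [True, False, False, False],
}
print(backends_with_accuracy_drop(raw, rewritten))  # ['backend_b']
```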

Original abstract

AI-assisted coding agents are bottlenecked by input-token cost. Two pathologies of raw human input drive much of this overhead: tokenization inefficiency for non-English text and structural entropy in conversational prompts. Existing approaches act reactively by compressing already-bloated contexts or intervening after failures occur. We introduce a pre-flight, edge-side prompt-rewriting middleware that operates between the developer and the cloud agent. A local Llama 3.2 (3B) model performs cross-lingual translation into English, structural rewriting into a compact task-oriented format, and regex-validated rewrite-with-fallback safeguards to ensure the optimized prompt is never larger than the original. We evaluate on OMH-Polyglot, a multilingual coding benchmark spanning Turkish, Arabic, Chinese, and code-switched specifications. Across three commercial LLM backends, the middleware reduces prompt tokens by 34-47 percent and total tokens by up to 18.8 percent while preserving or improving task accuracy. Ablation studies show that gains arise primarily from the rewriting stage rather than simple function-name extraction. Compared with LLMLingua-2 at matched compression rates, our method consistently achieves superior OckScore performance across all evaluated backends. These results demonstrate that proactive prompt optimization can substantially reduce inference costs without sacrificing coding quality.
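
The gap between the prompt-token and total-token figures follows from simple arithmetic: total tokens include completion tokens, which a prompt rewrite does not directly shrink. The sketch below uses hypothetical token counts (not figures from the paper) and assumes the completion length is unchanged by the middleware, which the abstract does not state.

```python
def reduction(before: int, after: int) -> float:
    """Fractional reduction from `before` to `after`."""
    return 1.0 - after / before


# Hypothetical counts for a single request (illustration only).
prompt_before, prompt_after = 400, 240
completion = 600  # assumed unchanged by the rewrite

prompt_cut = reduction(prompt_before, prompt_after)
total_cut = reduction(prompt_before + completion, prompt_after + completion)

print(f"prompt reduction: {prompt_cut:.1%}")  # 40.0%
print(f"total reduction:  {total_cut:.1%}")   # 16.0%
```

Under these assumptions the total saving equals the prompt saving scaled by the prompt's share of all tokens, so a large prompt reduction can coexist with a much smaller total reduction.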

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A local 3B model does cross-lingual translation and structural rewriting on coding prompts to cut tokens 34-47 percent on three backends while accuracy holds on their new benchmark, but the abstract gives no direct check on whether the rewrites keep meaning intact.

The main thing here is a practical edge-side middleware that runs Llama 3.2 3B to translate non-English specs into English and then rewrite them into compact task format before they hit the cloud agent. They add regex size guards so the output never grows, and they test on a new OMH-Polyglot set that mixes Turkish, Arabic, Chinese, and code-switched prompts.

The concrete results are the useful part: prompt token reductions of 34-47 percent and total token savings up to 18.8 percent across three commercial backends, with accuracy preserved or better. The ablation credits the rewriting step more than simple extraction, and the method beats LLMLingua-2 at matched compression on their OckScore.

The soft spot is exactly what the stress test flags. Nothing in the abstract shows a semantic-equivalence check, human review of the rewrites, or error cases where the 3B model might drop constraints or mangle domain terms. Accuracy on the downstream task could stay flat for reasons unrelated to the middleware if the benchmark is lenient.

This is aimed at teams shipping coding agents that see international inputs and want to trim inference spend. The benchmark and the side-by-side numbers give something concrete to test against.

Send it for review. The empirical claims are large enough to be worth checking the full methods and any additional validation they ran.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a pre-flight, edge-side middleware that uses a local Llama 3.2 (3B) model to translate non-English coding prompts into English and rewrite them into a compact task-oriented format, with regex-validated safeguards to prevent size increases. Evaluated on the OMH-Polyglot benchmark (Turkish, Arabic, Chinese, and code-switched specifications), the approach is claimed to reduce prompt tokens by 34-47% and total tokens by up to 18.8% across three commercial LLM backends while preserving or improving task accuracy. Ablations attribute gains primarily to the rewriting stage rather than simple extraction, and the method outperforms LLMLingua-2 at matched compression rates on OckScore.

Significance. If the accuracy preservation holds, the work offers a practical proactive method for lowering token costs in multilingual code-agent workflows, distinct from post-hoc compression techniques. The reported empirical savings, multi-backend evaluation, and ablation isolating the rewriting contribution provide concrete evidence of utility; the edge-local deployment and fallback safeguards are pragmatic strengths.

major comments (2)

[Abstract] The central claim that task accuracy is preserved or improved rests on the assumption that cross-lingual translation and structural rewriting by the 3B model introduce no semantic drift (e.g., altered requirements or dropped constraints). No semantic-equivalence metric, human validation of rewrites, or error analysis on failure cases is described, leaving the accuracy result vulnerable to benchmark tolerance or prompt-style artifacts rather than true fidelity.
[Evaluation / Ablation studies] The statement that gains arise primarily from the rewriting stage (rather than function-name extraction) is load-bearing for the method's novelty, yet the manuscript supplies no details on baseline construction, statistical tests for the reported differences, or how OckScore was computed at matched compression rates versus LLMLingua-2.

minor comments (2)

[Abstract] The acronym 'OckScore' appears without definition or reference; add a brief explanation or citation in the abstract and results.
[Method] The regex safeguard mechanism is mentioned but not specified (e.g., exact patterns or fallback behavior); a short pseudocode or description would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address the two major comments below and will revise the manuscript to incorporate additional validation and methodological details.

Referee: [Abstract] The central claim that task accuracy is preserved or improved rests on the assumption that cross-lingual translation and structural rewriting by the 3B model introduce no semantic drift (e.g., altered requirements or dropped constraints). No semantic-equivalence metric, human validation of rewrites, or error analysis on failure cases is described, leaving the accuracy result vulnerable to benchmark tolerance or prompt-style artifacts rather than true fidelity.

Authors: We agree that the manuscript does not provide direct semantic-equivalence metrics or human validation, relying instead on downstream task accuracy on OMH-Polyglot as the primary fidelity signal. This leaves open the possibility of compensated drift. In revision we will add a dedicated error-analysis subsection that reports (1) manual semantic-fidelity ratings on a stratified sample of 100 rewrites by two annotators (with Cohen’s κ), (2) counts of dropped constraints or altered requirements, and (3) a per-language breakdown of cases where accuracy declined. These additions will be placed in Section 4.3 and referenced from the abstract. Revision: yes.

Referee: [Evaluation / Ablation studies] The statement that gains arise primarily from the rewriting stage (rather than function-name extraction) is load-bearing for the method's novelty, yet the manuscript supplies no details on baseline construction, statistical tests for the reported differences, or how OckScore was computed at matched compression rates versus LLMLingua-2.

Authors: We acknowledge that the current text omits explicit baseline-construction details, statistical tests, and the precise OckScore protocol. The ablation variants were produced by applying (a) translation-only, (b) extraction-only, and (c) full-rewrite pipelines to the same source prompts; token counts were measured with the respective backend tokenizers. In the revised version we will (1) list the exact prompt templates used for each ablation arm, (2) report paired t-test p-values for all token-reduction and accuracy deltas, and (3) append a paragraph in Section 5.2 that reproduces the OckScore formula together with the compression-rate matching procedure used against LLMLingua-2. These changes will be marked as new material. Revision: yes.
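
For readers unfamiliar with the agreement statistic promised in the first response, the sketch below computes Cohen's κ for two annotators from its standard definition, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. The label names and ratings are hypothetical, chosen only to illustrate the calculation; they are not data from the paper.

```python
from collections import Counter


def cohens_kappa(ratings_a: list[str], ratings_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical fidelity ratings for ten rewrites (illustration only).
annotator_1 = ["ok", "ok", "drift", "ok", "drift", "ok", "ok", "drift", "ok", "ok"]
annotator_2 = ["ok", "ok", "drift", "ok", "ok", "ok", "ok", "drift", "drift", "ok"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.524
```

Here the annotators agree on 8 of 10 items (p_o = 0.80) while chance agreement is 0.58, which gives κ ≈ 0.52; raw agreement alone would overstate how consistent the ratings are.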

Circularity Check

0 steps flagged

No circularity; results are direct empirical measurements on external benchmarks and backends

The paper reports measured token reductions (34-47% prompt, up to 18.8% total) and accuracy preservation on OMH-Polyglot using three commercial LLM backends after local preprocessing. No equations, fitted parameters, self-citations, or derivations are described that would reduce any reported quantity to a quantity defined by the method itself. The central claims rest on external falsifiable measurements rather than internal redefinitions or self-referential steps. This matches the default expectation of a non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a 3B local model can translate and rewrite prompts without semantic loss; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: A 3B-parameter local LLM can reliably translate non-English coding prompts to English and rewrite them structurally without introducing errors that affect downstream task accuracy.
This assumption is required for the claim that accuracy is preserved while tokens are reduced.

Reference graph

Works this paper leans on

26 extracted references · 9 canonical work pages · 2 internal anchors

[1] Sanchit Ahuja, Praneetha Vaddamanu, and Barun Patra. 2025. Efficientxlang: Towards improving token efficiency through cross-lingual reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15612-15624.

[2] Meta AI. 2024. Llama 3.2: Lightweight and multimodal edge models. https://ai.azure.com/catalog/models/Llama-3.2-1B-Instruct

[3] Black Duck Software. 2026. 2026 open source security and risk analysis report. Technical report, Synopsys. https://www.blackduck.com/content/dam/black-duck/en-us/reports/rep-ossra.pdf

[4] Boyuan Chen, Mingzhi Zhu, Brendan Dolan-Gavitt, Muhammad Shafique, and Siddharth Garg. 2024. Model cascading for code: A cascaded black-box multi-model framework for cost-efficient code completion with self-testing. arXiv preprint arXiv:2405.15842.

[5] Zheng Du, Hao Kang, Song Han, Tushar Krishna, and Ligeng Zhu. 2026. Ockbench: Measuring the efficiency of llm reasoning. arXiv preprint arXiv:2511.05722.

[6] Weizhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, and Wei Han. 2025. Efficient prompt compression with evaluator heads for long-context transformer inference. In Advances in Neural Information Processing Systems. https://arxiv.org/abs/2501.12959

[7] Le Hai and 1 others. 2026. Repository-level code generation: A survey. arXiv preprint arXiv:2602.11671.

[8] Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yu Yang, and Lili Qiu. 2023. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

[9] Carlos E Jimenez, John Yang Murphy, Paul Xia, Aida Wilbur MacMillan, and 1 others. 2024. SWE-bench: Can language models resolve real-world github issues? In The Twelfth International Conference on Learning Representations.

[10] Yucheng Li, Bo Dong, Frank Guerin, and Chenghua Lin. 2023. Compressing context to enhance inference efficiency of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6342-6353. Association for Computational Linguistics. https://aclanthology.org/2023.emnlp-main.391

[11] Wei Liu and 1 others. 2024. Graphcoder: Enhancing repository-level code completion via code context graph-based retrieval and language model. arXiv preprint arXiv:2406.07003.

[12] Hanzhen Lu, Lishui Fan, Jiachi Chen, and Zhongxin Liu. 2026. Balancing latency and accuracy of code completion via local-cloud model cascading. Preprint.

[13] Yuanchi Ma and 1 others. 2025. Sketch-of-thought (sot): A prompting framework for reducing token usage via linguistic constraints. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.

[14] Sergey Mechtaev and 1 others. 2026. Compressing code context for llm-based issue resolution. arXiv preprint arXiv:2603.28119.

[15] MorphLLM. 2026. Context engineering: The key to efficient code agents. https://www.morphllm.com/context-engineering

[16] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, and Dongmei Zhang. 2024. LLMLingua-2: Data distillation for efficient and faithful task-agnostic prompt compression. In Findings of the Association f... https://aclanthology.org/2024.findings-acl.57

[17] Yun Peng, Jun Wan, Yichen Li, and Xiaoxue Ren. 2025. COFFE: A code efficiency benchmark for code generation. Proceedings of the ACM on Software Engineering, 2(FSE):FSE012.

[18] Aleksandar Petrov, Emanuele La Malfa, Adel Bibi, and Philip HS Torr. 2023. Language model tokenizers introduce unfairness between languages. In Advances in Neural Information Processing Systems.

[19] Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, and Michael Zeng. 2023. Automatic prompt optimization with "gradient descent" and beam search. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7957-7968. Association for Computational Linguistics. https://aclanthology.org/2023.emnlp-main.494

[20] Yuling Shi and 1 others. 2025. Longcodezip: Compress long context for code language models. In Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering. https://arxiv.org/abs/2510.00446

[21] Hamed Taherkhani, Melika Sepidband, Hung Viet Pham, Song Wang, and Hadi Hemmati. 2025. Automated prompt engineering for cost-effective code generation using evolutionary algorithm. Proceedings of the ACM on Software Engineering, 1(1).

[22] F. Teklehaymanot and A. Petrov. 2025. Tokenization disparities: Systematic differences in segmenting linguistic input. Emergent Mind: AI Research Index.

[23] Yuan-An Xiao, Pengfei Gao, Chao Peng, and Yingfei Xiong. 2026. Reducing cost of llm agents with trajectory reduction. In Proceedings of the ACM on Software Engineering. https://arxiv.org/abs/2509.23586

[24] Tom Zehle, Moritz Schlager, Timo Heiss, and Matthias Feurer. 2025. CAPO: Cost-aware prompt optimization. In 4th International Conference on Automated Machine Learning. https://openreview.net/forum?id=UweaRrg9D0

[25] Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. 2023. RepoCoder: Repository-level code completion through iterative retrieval and generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2471-... https://aclanthology.org/2023.emnlp-main.151

[26] Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Porges, Harris Chan, Stella Biderman, Lillian Weng, and Timnit Gebru. 2023. Large language models are human-level prompt engineers. In The Eleventh International Conference on Learning Representations. https://arxiv.org/abs/2211.01910
