Budgeted Dynamic Trace Structures for Token-Efficient Sequential Computation

Faruk Alpay; Levent Sarioglu

arxiv: 2605.22879 · v1 · pith:XVMZFKRQnew · submitted 2026-05-20 · 💻 cs.DC

Budgeted Dynamic Trace Structures for Token-Efficient Sequential Computation

Faruk Alpay , Levent Sarioglu This is my paper

Pith reviewed 2026-05-25 06:05 UTC · model grok-4.3

classification 💻 cs.DC

keywords trace structurestoken budgetsequential computationgraph compactiondynamic graphshistory maintenancereachability queriessummary compaction

0 comments

The pith

Budgeted dynamic trace structures maintain rooted graphs and append-only histories under an explicit token budget.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces budgeted dynamic trace structures to keep long sequential computation traces within a fixed byte or token limit. It shows that status-filtered reachability, cursor pagination, reference-counted keys, delta overlays, bounded caches, and summary-plus-suffix compaction can be combined to preserve necessary graph structure and history information. A sympathetic reader would care because sequential processes increasingly generate traces that exceed practical context or storage limits. The work supplies formal invariants, asymptotic bounds, and benchmarks on synthetic traces and model outputs.

Core claim

Budgeted dynamic trace structures (BDTS) maintain rooted trace graphs and append-only histories under an explicit budget by combining status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction, with formal invariants and asymptotic bounds; across synthetic traces of 10,000-40,000 vertices the structures build in 0.58-2.72 ms, enumerate descendants in 0.24-1.42 ms, and compact 350k-2.71M approximate tokens to 1,048-4,120 approximate tokens while reducing 3,359-3,360 trace tokens to 432-433 tokens on three public model targets.

What carries the argument

Budgeted dynamic trace structures (BDTS), which enforce an explicit token budget on rooted trace graphs through status-filtered reachability and summary-plus-suffix compaction.

If this is right

Graphs with 10,000-40,000 vertices are built in 0.58-2.72 ms.
All descendants are enumerated in 0.24-1.42 ms.
Histories of 350k-2.71M approximate tokens compact to 1,048-4,120 approximate tokens.
Trace tokens of 3,359-3,360 reduce to 432-433 tokens on three public model targets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support longer sequential agent runs without exceeding language-model context windows.
The compaction methods might apply to history maintenance in other distributed or logging systems.
Empirical tests on production agent execution traces would check whether the reported ratios persist outside synthetic cases.

Load-bearing premise

The listed techniques can be combined while preserving the stated formal invariants and asymptotic bounds without hidden conflicts or budget violations under realistic trace workloads.

What would settle it

A realistic trace workload in which the combined techniques produce either a budget overrun or a violation of reachability or history invariants.

Figures

Figures reproduced from arXiv: 2605.22879 by Faruk Alpay, Levent Sarioglu.

**Figure 1.** Figure 1: A status-filtered trace graph. With predicate P(σ) ≡ (σ = active), the descendants of r are {1, 3}, while the descendants of 2 are {4}. 2.5 Problem Variants The default problem charges only the retained suffix against the user-facing budget. This is the most convenient policy when the summary has its own reserved allowance. Three variants are useful in other settings. Charged-summary compaction charges the… view at source ↗

read the original abstract

Sequential computation increasingly produces long traces containing nested branches, status transitions, textual payloads, and compact summaries of earlier execution. This paper introduces budgeted dynamic trace structures (BDTS), a data-structural framework for maintaining rooted trace graphs and append-only histories under an explicit byte or token budget. BDTS combines status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction. We give formal invariants, asymptotic bounds, and an ancillary Rust implementation with reproducible benchmarks. Across synthetic traces with 10,000-40,000 vertices, the prototype builds graphs in 0.58-2.72 ms, enumerates all descendants in 0.24-1.42 ms, and compacts histories of 350k-2.71M approximate tokens to 1,048-4,120 approximate tokens. Tokenizer and forward measurements with three public model targets reduce 3,359-3,360 trace tokens to 432-433 tokens.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BDTS packages standard trace techniques into a budgeted manager with a working Rust prototype and concrete compaction numbers, but the formal invariants are asserted without visible derivations.

read the letter

BDTS is a data-structural framework that keeps rooted computation traces under an explicit token or byte budget by combining status-filtered reachability, cursor pagination, reference-counted keys, delta overlays, bounded caches, and summary-plus-suffix compaction. The paper ships a Rust implementation and reports reproducible benchmarks on synthetic traces of 10k-40k vertices. Build times sit at 0.58-2.72 ms, descendant enumeration at 0.24-1.42 ms, and compaction reduces 350k-2.71M tokens down to roughly 1k-4k tokens, with even stronger ratios on three public model tokenizers. Those numbers are the clearest contribution. The work is aimed at engineers who need to log long sequential traces without blowing context limits. A reader building logging layers for model reasoning or similar append-only histories would find the listed components and the measured savings directly usable. The main soft spot is the formal side. The abstract claims invariants and asymptotic bounds but shows no derivations or interaction analysis, so it is impossible to confirm that the full stack preserves the budget and append-only properties without hidden conflicts on realistic workloads. The benchmarks are on synthetic data, which is reasonable for a first prototype but leaves real-world behavior open. This is solid enough engineering work to send out for review; the implementation gives referees something concrete to check even if the proofs need more detail.

Referee Report

2 major / 1 minor

Summary. The paper introduces budgeted dynamic trace structures (BDTS) for maintaining rooted trace graphs and append-only histories under an explicit byte or token budget. BDTS combines status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction. It claims to provide formal invariants, asymptotic bounds, and reports results from a Rust implementation with benchmarks on synthetic traces showing compaction from 350k-2.71M tokens to 1,048-4,120 tokens, and further reductions on model targets.

Significance. If the invariants hold and the techniques integrate without conflicts, this could be significant for efficient trace management in sequential computation, particularly for AI model interactions where token budgets are critical. The reproducible Rust implementation and concrete benchmark numbers are strengths that allow verification of the performance claims.

major comments (2)

[Abstract] Abstract: the central claim that status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction can be combined while preserving formal invariants and asymptotic bounds is asserted without any derivations, proofs, or interaction analysis visible in the text; this is load-bearing because the performance results rest on the invariants holding under realistic workloads.
[Abstract] Abstract: the reported compaction ratios (350k-2.71M to 1,048-4,120 approximate tokens; 3,359-3,360 to 432-433 tokens) and timing numbers (0.58-2.72 ms build, 0.24-1.42 ms enumeration) are presented without error analysis, measurement methodology, or confirmation that the implementation respects the budget invariants.

minor comments (1)

[Abstract] Abstract: the qualifier 'approximate tokens' is used for both input and output sizes without definition of the approximation or tokenizer details.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for recognizing the potential significance of BDTS along with the value of the reproducible Rust implementation. We address the two major comments point by point below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction can be combined while preserving formal invariants and asymptotic bounds is asserted without any derivations, proofs, or interaction analysis visible in the text; this is load-bearing because the performance results rest on the invariants holding under realistic workloads.

Authors: The manuscript body (Sections 3–5) supplies the formal invariants for each component, the asymptotic bounds on space and time, and the interaction analysis showing that the composition preserves the budget invariants. The abstract is intentionally concise; we will revise it to add explicit forward references to these sections and a one-sentence statement that the invariants are shown to be preserved under the stated composition rules. The benchmark results are presented only after the invariants are established, and the open-source implementation allows independent verification. revision: yes
Referee: [Abstract] Abstract: the reported compaction ratios (350k-2.71M to 1,048-4,120 approximate tokens; 3,359-3,360 to 432-433 tokens) and timing numbers (0.58-2.72 ms build, 0.24-1.42 ms enumeration) are presented without error analysis, measurement methodology, or confirmation that the implementation respects the budget invariants.

Authors: We agree that the abstract omits these details. The benchmark section of the full manuscript describes the synthetic trace generator, the measurement harness, and the reproducibility instructions; the implementation source is provided. We will expand the abstract (or add a short methods paragraph) to include (i) standard-deviation error bars across repeated runs, (ii) explicit confirmation that every reported compaction and timing result was obtained under an active token/byte budget, and (iii) a pointer to the open-source code for full methodology. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained with external implementation

full rationale

The provided abstract and text introduce BDTS via a list of techniques and assert formal invariants plus asymptotic bounds, supported by an ancillary Rust implementation with reproducible benchmarks on synthetic traces. No equations, fitted parameters, self-citations, or derivations are exhibited that reduce any claimed result to its own inputs by construction. The work positions the invariants as independent and externally verifiable, satisfying the criteria for a non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review supplies almost no explicit free parameters, axioms, or invented entities beyond the framework name itself.

axioms (1)

domain assumption Formal invariants hold for the combined trace-graph operations under budget constraints
Referenced in abstract as given but not stated or proved here.

invented entities (1)

Budgeted Dynamic Trace Structures (BDTS) no independent evidence
purpose: Data structure for maintaining rooted trace graphs and histories under explicit token/byte budget
New framework introduced by the paper; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5709 in / 1378 out tokens · 30537 ms · 2026-05-25T06:05:34.675153+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

BDTS combines status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We give formal invariants, asymptotic bounds...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 2 internal anchors

[1]

Aho and Margaret J

Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search.Communications of the ACM, 18(6):333–340, 1975. doi: 10.1145/360825.360855

work page doi:10.1145/360825.360855 1975
[2]

Jon Louis Bentley and James B. Saxe. Decomposable searching problems i: Static-to-dynamic transformation.Journal of Algorithms, 1(4):301–358, 1980. doi: 10.1016/0196-6774(80)90015-2

work page doi:10.1016/0196-6774(80)90015-2 1980
[3]

Cormen, Charles E

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.Introduction to Algorithms. MIT Press, 3 edition, 2009

work page 2009
[4]

Muthukrishnan

Graham Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications.Journal of Algorithms, 55(1):58–75, 2005. doi: 10.1016/j.jalgor. 2003.12.001

work page doi:10.1016/j.jalgor 2005
[5]

Maintaining stream statistics over sliding windows.SIAM Journal on Computing, 31(6):1794–1813, 2002

Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows.SIAM Journal on Computing, 31(6):1794–1813, 2002. doi: 10.1137/ S0097539701398363. 20

work page 2002
[6]

Italiano

Camil Demetrescu and Giuseppe F. Italiano. A new approach to dynamic all pairs shortest paths.Journal of the ACM, 51(6):968–992, 2004. doi: 10.1145/1039488.1039492

work page doi:10.1145/1039488.1039492 2004
[7]

An on-line edge-deleti on problem

Shimon Even and Yossi Shiloach. An on-line edge-deletion problem.Journal of the ACM, 28(1): 1–4, 1981. doi: 10.1145/322234.322235

work page doi:10.1145/322234.322235 1981
[8]

A new algorithm for data compression.C Users Journal, 12(2):23–38, 1994

Philip Gage. A new algorithm for data compression.C Users Journal, 12(2):23–38, 1994

work page 1994
[9]

Randomized fully dynamic graph algorithms with polylogarithmic time per operation

Monika Rauch Henzinger and Valerie King. Randomized fully dynamic graph algorithms with polylogarithmic time per operation. InProceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, pages 519–527, 1999. doi: 10.1145/301250.301383

work page doi:10.1145/301250.301383 1999
[10]

Distilling the knowledge in a neural network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015

work page 2015
[11]

distilbert/distilgpt2 model card

Hugging Face. distilbert/distilgpt2 model card. https://huggingface.co/distilbert/ distilgpt2, 2026. Accessed 20 May 2026

work page 2026
[12]

Space-efficient static trees and graphs

Guy Jacobson. Space-efficient static trees and graphs. In30th Annual Symposium on Foundations of Computer Science, pages 549–554, 1989. doi: 10.1109/SFCS.1989.63533

work page doi:10.1109/sfcs.1989.63533 1989
[13]

Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, 2018. doi: 10.18653/v1/D18-2012

work page internal anchor Pith review doi:10.18653/v1/d18-2012 2018
[14]

Nimrod Megiddo and Dharmendra S. Modha. Arc: A self-tuning, low overhead replacement cache. InProceedings of the 2nd USENIX Conference on File and Storage Technologies, pages 115–130, 2003

work page 2003
[15]

Muthukrishnan.Data Streams: Algorithms and Applications

S. Muthukrishnan.Data Streams: Algorithms and Applications. Now Publishers, 2005. doi: 10.1561/0400000002

work page doi:10.1561/0400000002 2005
[16]

Cambridge University Press,

Gonzalo Navarro.Compact Data Structures: A Practical Approach. Cambridge University Press,

work page
[17]

doi: 10.1017/CBO9781316588284

work page doi:10.1017/cbo9781316588284
[18]

Language models are unsupervised multitask learners, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners, 2019. Technical report

work page 2019
[19]

Srinivasa Rao

Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. InProceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 233–242, 2002

work page 2002
[20]

Improved dynamic reachabil ity algorithms for directed graphs

Liam Roditty and Uri Zwick. Improved dynamic reachability algorithms for directed graphs. SIAM Journal on Computing, 37(5):1455–1471, 2008. doi: 10.1137/060650271

work page doi:10.1137/060650271 2008
[21]

Distilbert, a distilled version of bert: Smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. Distilbert, a distilled version of bert: Smaller, faster, cheaper and lighter. InNeurIPS Workshop on Energy Efficient Machine Learning and Cognitive Computing, 2019

work page 2019
[22]

Dynamic transitive closure via dynamic matrix inverse

Piotr Sankowski. Dynamic transitive closure via dynamic matrix inverse. In45th Annual IEEE Symposium on Foundations of Computer Science, pages 509–517, 2004. doi: 10.1109/FOCS. 2004.25. 21

work page doi:10.1109/focs 2004
[23]

Neural machine translation of rare words with subword units

Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. InProceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725, 2016. doi: 10.18653/v1/P16-1162

work page doi:10.18653/v1/p16-1162 2016
[24]

Sleator and Robert E

Daniel D. Sleator and Robert E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28(2):202–208, 1985. doi: 10.1145/2786.2793

work page doi:10.1145/2786.2793 1985
[25]

Robert E. Tarjan. Depth-first search and linear graph algorithms.SIAM Journal on Computing, 1(2):146–160, 1972. doi: 10.1137/0201010

work page doi:10.1137/0201010 1972
[26]

Terry A. Welch. A technique for high-performance data compression.Computer, 17(6):8–19,

work page
[27]

doi: 10.1109/MC.1984.1659158

work page doi:10.1109/mc.1984.1659158 1984
[28]

OPT: Open Pre-trained Transformer Language Models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. Opt: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2205.01068 2022
[29]

A universal algorithm for sequential data compression.IEEE Transactions on Information Theory, 23(3):337–343, 1977

Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression.IEEE Transactions on Information Theory, 23(3):337–343, 1977. doi: 10.1109/TIT.1977.1055714

work page doi:10.1109/tit.1977.1055714 1977
[30]

Compression of individual sequences via variable-rate coding

Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, 1978. doi: 10.1109/TIT.1978.1055934. 22

work page doi:10.1109/tit.1978.1055934 1978

[1] [1]

Aho and Margaret J

Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search.Communications of the ACM, 18(6):333–340, 1975. doi: 10.1145/360825.360855

work page doi:10.1145/360825.360855 1975

[2] [2]

Jon Louis Bentley and James B. Saxe. Decomposable searching problems i: Static-to-dynamic transformation.Journal of Algorithms, 1(4):301–358, 1980. doi: 10.1016/0196-6774(80)90015-2

work page doi:10.1016/0196-6774(80)90015-2 1980

[3] [3]

Cormen, Charles E

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.Introduction to Algorithms. MIT Press, 3 edition, 2009

work page 2009

[4] [4]

Muthukrishnan

Graham Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications.Journal of Algorithms, 55(1):58–75, 2005. doi: 10.1016/j.jalgor. 2003.12.001

work page doi:10.1016/j.jalgor 2005

[5] [5]

Maintaining stream statistics over sliding windows.SIAM Journal on Computing, 31(6):1794–1813, 2002

Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows.SIAM Journal on Computing, 31(6):1794–1813, 2002. doi: 10.1137/ S0097539701398363. 20

work page 2002

[6] [6]

Italiano

Camil Demetrescu and Giuseppe F. Italiano. A new approach to dynamic all pairs shortest paths.Journal of the ACM, 51(6):968–992, 2004. doi: 10.1145/1039488.1039492

work page doi:10.1145/1039488.1039492 2004

[7] [7]

An on-line edge-deleti on problem

Shimon Even and Yossi Shiloach. An on-line edge-deletion problem.Journal of the ACM, 28(1): 1–4, 1981. doi: 10.1145/322234.322235

work page doi:10.1145/322234.322235 1981

[8] [8]

A new algorithm for data compression.C Users Journal, 12(2):23–38, 1994

Philip Gage. A new algorithm for data compression.C Users Journal, 12(2):23–38, 1994

work page 1994

[9] [9]

Randomized fully dynamic graph algorithms with polylogarithmic time per operation

Monika Rauch Henzinger and Valerie King. Randomized fully dynamic graph algorithms with polylogarithmic time per operation. InProceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, pages 519–527, 1999. doi: 10.1145/301250.301383

work page doi:10.1145/301250.301383 1999

[10] [10]

Distilling the knowledge in a neural network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015

work page 2015

[11] [11]

distilbert/distilgpt2 model card

Hugging Face. distilbert/distilgpt2 model card. https://huggingface.co/distilbert/ distilgpt2, 2026. Accessed 20 May 2026

work page 2026

[12] [12]

Space-efficient static trees and graphs

Guy Jacobson. Space-efficient static trees and graphs. In30th Annual Symposium on Foundations of Computer Science, pages 549–554, 1989. doi: 10.1109/SFCS.1989.63533

work page doi:10.1109/sfcs.1989.63533 1989

[13] [13]

Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, 2018. doi: 10.18653/v1/D18-2012

work page internal anchor Pith review doi:10.18653/v1/d18-2012 2018

[14] [14]

Nimrod Megiddo and Dharmendra S. Modha. Arc: A self-tuning, low overhead replacement cache. InProceedings of the 2nd USENIX Conference on File and Storage Technologies, pages 115–130, 2003

work page 2003

[15] [15]

Muthukrishnan.Data Streams: Algorithms and Applications

S. Muthukrishnan.Data Streams: Algorithms and Applications. Now Publishers, 2005. doi: 10.1561/0400000002

work page doi:10.1561/0400000002 2005

[16] [16]

Cambridge University Press,

Gonzalo Navarro.Compact Data Structures: A Practical Approach. Cambridge University Press,

work page

[17] [17]

doi: 10.1017/CBO9781316588284

work page doi:10.1017/cbo9781316588284

[18] [18]

Language models are unsupervised multitask learners, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners, 2019. Technical report

work page 2019

[19] [19]

Srinivasa Rao

Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. InProceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 233–242, 2002

work page 2002

[20] [20]

Improved dynamic reachabil ity algorithms for directed graphs

Liam Roditty and Uri Zwick. Improved dynamic reachability algorithms for directed graphs. SIAM Journal on Computing, 37(5):1455–1471, 2008. doi: 10.1137/060650271

work page doi:10.1137/060650271 2008

[21] [21]

Distilbert, a distilled version of bert: Smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. Distilbert, a distilled version of bert: Smaller, faster, cheaper and lighter. InNeurIPS Workshop on Energy Efficient Machine Learning and Cognitive Computing, 2019

work page 2019

[22] [22]

Dynamic transitive closure via dynamic matrix inverse

Piotr Sankowski. Dynamic transitive closure via dynamic matrix inverse. In45th Annual IEEE Symposium on Foundations of Computer Science, pages 509–517, 2004. doi: 10.1109/FOCS. 2004.25. 21

work page doi:10.1109/focs 2004

[23] [23]

Neural machine translation of rare words with subword units

Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. InProceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725, 2016. doi: 10.18653/v1/P16-1162

work page doi:10.18653/v1/p16-1162 2016

[24] [24]

Sleator and Robert E

Daniel D. Sleator and Robert E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28(2):202–208, 1985. doi: 10.1145/2786.2793

work page doi:10.1145/2786.2793 1985

[25] [25]

Robert E. Tarjan. Depth-first search and linear graph algorithms.SIAM Journal on Computing, 1(2):146–160, 1972. doi: 10.1137/0201010

work page doi:10.1137/0201010 1972

[26] [26]

Terry A. Welch. A technique for high-performance data compression.Computer, 17(6):8–19,

work page

[27] [27]

doi: 10.1109/MC.1984.1659158

work page doi:10.1109/mc.1984.1659158 1984

[28] [28]

OPT: Open Pre-trained Transformer Language Models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. Opt: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2205.01068 2022

[29] [29]

A universal algorithm for sequential data compression.IEEE Transactions on Information Theory, 23(3):337–343, 1977

Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression.IEEE Transactions on Information Theory, 23(3):337–343, 1977. doi: 10.1109/TIT.1977.1055714

work page doi:10.1109/tit.1977.1055714 1977

[30] [30]

Compression of individual sequences via variable-rate coding

Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, 1978. doi: 10.1109/TIT.1978.1055934. 22

work page doi:10.1109/tit.1978.1055934 1978