Simplicity Scales

Andrew Sampson (6OVER3 Institute); Ronny Chan (6OVER3 Institute); Yuta Saito (GoodNotes)

arxiv: 2604.09591 · v1 · submitted 2026-03-04 · cs.DC · cs.PF · cs.PL

Pith reviewed 2026-05-15 17:14 UTC · model grok-4.3

classification: cs.DC · cs.PF · cs.PL

keywords: serialization format · fixed-size encoding · decoding speed · RPC protocol · Protocol Buffers comparison · memory bandwidth · CPU pipeline stalls · batch pipelining

The pith

Bebop's fixed-size encoding turns every decode into a single unconditional memory read, decoding 9 to 213 times faster than Protocol Buffers across 19 workloads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Bebop, a data serialization format that assigns every type a fixed byte width instead of using variable-length encodings. A 32-bit integer is always exactly four bytes, and decoding reduces to reading that block from memory without inspecting continuation bits or parsing strings. This design eliminates data-dependent branches that stall CPU pipelines in formats like Protocol Buffers and JSON. Measurements across 19 workloads show speedups from 9 to 213 times, with a 1536-dimensional vector decoding in 2.8 nanoseconds compared to 111 nanoseconds for Protocol Buffers. The same wire format also supports an RPC protocol that pipelines dependent calls across services in one round trip over standard transports.
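To make the contrast concrete, the sketch below decodes the same integer two ways: a Protocol Buffers style varint, whose loop exit depends on the continuation bit of each byte, and a fixed 4-byte read. The function names and the little-endian byte order are assumptions chosen for illustration, not details taken from the paper, and Python shows only the shape of the control flow, not the machine-level cost.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a varint; the number of loop iterations depends on the data itself."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # data-dependent exit condition
            return result, offset
        shift += 7


def decode_fixed32(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a fixed-width unsigned 32-bit integer: one read, no per-byte inspection.

    Assumption: little-endian byte order.
    """
    (value,) = struct.unpack_from("<I", buf, offset)
    return value, offset + 4


varint_bytes = encode_varint(300)
fixed_bytes = struct.pack("<I", 300)
varint_result = decode_varint(varint_bytes)
fixed_result = decode_fixed32(fixed_bytes)
```

The varint form of 300 takes two bytes and the fixed form takes four, so the fixed-width path trades size for a decode whose control flow does not depend on the value.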

Core claim

Bebop encodes every data type using a fixed number of bytes so that decoding requires only a direct memory load with no conditionals. Across 19 decode workloads this produces speedups of 9 to 213 times over Protocol Buffers, while a 1536-dimension embedding vector decodes in 2.8 nanoseconds versus 111 nanoseconds for Protocol Buffers. On records larger than 64 KB the decoder reaches 86 percent of peak memory bandwidth, showing that the CPU is no longer the limiting factor.
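The headline figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the ratios from the quoted latencies (the simdjson figure comes from the abstract) and derives an implied byte rate for the embedding vector. The 4-byte element width is an assumption, since this page does not state the element type.

```python
# Latencies quoted for the 1536-dimension embedding vector.
BEBOP_NS = 2.8
PROTOBUF_NS = 111.0
SIMDJSON_NS = 4.69 * 1000  # 4.69 microseconds, from the abstract


def speedup(baseline_ns: float, candidate_ns: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ns / candidate_ns


protobuf_ratio = speedup(PROTOBUF_NS, BEBOP_NS)   # about 39.6
simdjson_ratio = speedup(SIMDJSON_NS, BEBOP_NS)   # about 1675

# Assumption: each of the 1536 elements is a 4-byte float.
DIMENSIONS = 1536
BYTES_PER_ELEMENT = 4
payload_bytes = DIMENSIONS * BYTES_PER_ELEMENT

# Bytes per nanosecond is numerically equal to GB/s (10^9 bytes per second).
implied_gb_per_s = payload_bytes / BEBOP_NS
```

Under that assumption the vector result sits near the low end of the 9 to 213 range, and the implied rate of roughly 2,200 GB/s is well above typical main-memory bandwidth. That suggests the 2.8 ns figure reflects cache-resident data or a decode that returns a view instead of copying; this page does not say which.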

What carries the argument

The fixed-byte-width encoding scheme, in which each primitive type and field occupies a predetermined byte count without variable prefixes or tags that require inspection.

Load-bearing premise

Fixed-size encoding stays practical for the range of data types and values encountered in real applications without causing excessive message bloat.

What would settle it

Run the same benchmarks on workloads dominated by small integers or sparse data and check whether Bebop still shows large speedups or if padding costs erase the advantage.
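The size side of that experiment is easy to estimate ahead of time. The following sketch compares the payload bytes needed for a list of unsigned integers under a base-128 varint and under a fixed 4-byte width. It ignores field tags, length prefixes, and any other framing, and the helper names are hypothetical.

```python
def varint_size(value: int) -> int:
    """Bytes needed for a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch covers unsigned values only")
    return max(1, (value.bit_length() + 6) // 7)


def fixed32_size(_value: int) -> int:
    """A fixed-width 32-bit integer always takes four bytes."""
    return 4


def payload_sizes(values: list[int]) -> tuple[int, int]:
    """Return (varint_total, fixed_total) payload bytes for a list of integers."""
    varint_total = sum(varint_size(v) for v in values)
    fixed_total = sum(fixed32_size(v) for v in values)
    return varint_total, fixed_total


small_values = list(range(100))   # every value fits in one varint byte
large_values = [2**31] * 100      # every value needs five varint bytes

small_varint, small_fixed = payload_sizes(small_values)
large_varint, large_fixed = payload_sizes(large_values)
```

For the small-value list the fixed encoding is four times larger, while for values near the top of the 32-bit range the varint form is the larger one. Whether the extra bytes erase the decode advantage depends on the workload, which is the question the proposed experiment would answer.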

Figures

Figures reproduced from arXiv: 2604.09591 by Andrew Sampson (6OVER3 Institute), Ronny Chan (6OVER3 Institute), Yuta Saito (GoodNotes).

**Figure 2.** Wire encoding of a small embedding. Bebop’s native UUID saves 20 bytes versus …

**Figure 3.** Bandwidth utilization vs record size. Larger records amortize per-record overhead …

**Figure 4.** Encode vs decode latency across binary formats. Bebop’s decode advantage (dark …

Original abstract

The dominant data interchange formats encode integers using a variable number of bytes or represent floating-point numbers as variable-length UTF-8 strings. The decoder must inspect each byte for a continuation bit or parse each character individually, producing data-dependent branches that stall modern CPU pipelines. Protocol Buffers pays this cost on every integer, field tag, and length prefix. JSON pays it on every value. We present Bebop, a serialization format where every data type uses a fixed number of bytes. A 32-bit integer is always four bytes. Decoding becomes a single memory read with no conditionals. Across 19 decode workloads, Bebop decodes 9–213× faster than Protocol Buffers. On a 1536-dimension embedding vector, Bebop decodes in 2.8 nanoseconds versus 111 nanoseconds for Protocol Buffers and 4.69 microseconds for simdjson, a 1,675× gap. On records above 64 KB, the decoder achieves 86% of peak memory bandwidth. The CPU is no longer the bottleneck. We also present a transport-agnostic RPC protocol built on the same wire format. The protocol introduces batch pipelining, where dependent cross-service calls execute in a single round trip with server-side dependency resolution. It deploys over HTTP/1.1, HTTP/2, and binary transports without proxies, removing the HTTP/2 requirement that limits gRPC on serverless platforms and in browsers.
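Batch pipelining with server-side dependency resolution can be pictured as follows. This is a minimal sketch of the general idea only: the `Call` and `Ref` types, the handler table, and the rule that a call may reference only earlier results are all hypothetical, since this page does not describe how the paper's protocol represents dependencies on the wire.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Ref:
    """Placeholder argument: 'use the result of call number call_index'."""
    call_index: int


@dataclass(frozen=True)
class Call:
    method: str
    args: tuple


def execute_batch(handlers: dict[str, Callable[..., Any]], calls: list[Call]) -> list[Any]:
    """Run a batch in order on the server, substituting earlier results for Refs.

    The client sends the whole batch once, so a dependent call costs no extra
    round trip. Assumption: a Ref may only point at an earlier call.
    """
    results: list[Any] = []
    for call in calls:
        resolved = []
        for arg in call.args:
            if isinstance(arg, Ref):
                if not 0 <= arg.call_index < len(results):
                    raise ValueError("Ref must point at an earlier call in the batch")
                resolved.append(results[arg.call_index])
            else:
                resolved.append(arg)
        results.append(handlers[call.method](*resolved))
    return results


# Hypothetical services: look up a user, then look up that user's team.
handlers = {
    "get_user": lambda user_id: {"id": user_id, "team": "infra"},
    "get_team": lambda user: "team:" + user["team"],
}
batch = [Call("get_user", (7,)), Call("get_team", (Ref(0),))]
results = execute_batch(handlers, batch)
```

Without pipelining the client would wait for `get_user` to return before it could issue `get_team`; here the second call names the first result by index and the server fills it in.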

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Bebop's fixed-length format delivers clear decode speedups on primitives and vectors, but the handling of strings and arrays remains the key open question for whether the no-branch promise actually holds in real messages.

The core of this paper is a serialization format called Bebop that assigns every data type a fixed byte width so the decoder can just read memory without testing continuation bits or lengths. They pair it with an RPC layer that batches dependent calls and resolves them on the server in one round trip. That combination is what is actually new here relative to existing fixed-length encodings and to gRPC-style protocols.

The reported numbers on a 1536-element vector (2.8 ns decode versus 111 ns for protobuf) are the strongest concrete result, and the claim that the decoder hits 86% of memory bandwidth on larger records shows they measured against real hardware limits rather than just micro-benchmarks. Those parts are worth taking seriously if the workloads are representative.

The variable-length case is the soft spot. The abstract insists every type is fixed bytes, yet strings, bytes, and repeated fields are not naturally fixed. If the format pads to a maximum size it risks message bloat; if it still encodes a length it likely reintroduces a data-dependent read or branch. The paper needs to show the exact encoding rule and benchmark it on mixed records, not just primitives or embedding vectors. The 9–213× range is also hard to assess without seeing the full workload definitions, measurement harness, and raw timings.

This work is aimed at people who care about shaving nanoseconds off hot serialization paths in distributed or serverless systems. A reader who already thinks about cache effects and pipeline stalls will get usable design ideas even if they end up adapting only pieces of it. I would send it to peer review. The central mechanism is simple enough to verify and the performance claims are specific enough that referees can check them against code or additional experiments.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Bebop, a serialization format in which every data type is encoded using a fixed number of bytes, enabling branch-free decoding via single memory reads. It reports substantial speedups over Protocol Buffers (9-213× on 19 decode workloads) and simdjson (1675× on a 1536-dimension embedding vector), with the decoder reaching 86% of peak memory bandwidth on large records. The paper also describes a batch-pipelining RPC protocol built on this format that supports dependent cross-service calls in one round trip over HTTP/1.1, HTTP/2, and binary transports.

Significance. If the empirical results are robust, Bebop could eliminate serialization as a performance bottleneck in data-intensive distributed systems, particularly for embedding vectors and high-volume RPCs. The fixed-size approach is a clean departure from variable-length encodings like varints in Protocol Buffers. The RPC extension addresses practical deployment constraints in serverless and browser environments. Strengths include the direct timing measurements and the bandwidth utilization result.

major comments (2)

[Abstract, paragraph 2] The reported speedups (9–213× vs Protocol Buffers across 19 workloads, 2.8 ns vs 111 ns for the 1536-dimension embedding vector) are presented without any description of the workloads, measurement methodology, hardware platform, error bars, or raw data. This makes it impossible to evaluate reproducibility or rule out post-hoc selection and measurement artifacts.
[Abstract, paragraph 1] The claim that 'every data type uses a fixed number of bytes' and that decoding reduces to 'a single memory read with no conditionals' is load-bearing for the performance results. The manuscript must explicitly show how variable-length types (strings, bytes, repeated fields) are handled without reintroducing data-dependent branches or unacceptable padding; otherwise the no-branch property does not survive for typical production messages.

minor comments (2)

[Abstract] Clarify whether the simdjson comparison (4.69 microseconds) uses the identical embedding-vector workload and optimal configuration.
The manuscript should define all acronyms on first use (e.g., RPC) and provide a brief overview of the 19 workloads in the evaluation section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important issues of reproducibility and clarity around the core claims. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation.

Referee: [Abstract, paragraph 2] The reported speedups (9–213× vs Protocol Buffers across 19 workloads, 2.8 ns vs 111 ns for the 1536-dimension embedding vector) are presented without any description of the workloads, measurement methodology, hardware platform, error bars, or raw data. This makes it impossible to evaluate reproducibility or rule out post-hoc selection and measurement artifacts.

Authors: We agree that the abstract would benefit from additional context to support reproducibility. In the revised version we will add a concise clause describing the 19 workloads at a high level (integer arrays, embedding vectors, and nested messages) and explicitly direct readers to the evaluation section for the full methodology, hardware platform, timing approach, error bars on all figures, and availability of raw data. This keeps the abstract within length limits while addressing the concern directly.
Revision: yes

Referee: [Abstract, paragraph 1] The claim that 'every data type uses a fixed number of bytes' and that decoding reduces to 'a single memory read with no conditionals' is load-bearing for the performance results. The manuscript must explicitly show how variable-length types (strings, bytes, repeated fields) are handled without reintroducing data-dependent branches or unacceptable padding; otherwise the no-branch property does not survive for typical production messages.

Authors: The manuscript already describes the encoding in Section 3, but we accept that the abstract claim requires an explicit supporting explanation for variable-length types. Bebop encodes strings, bytes, and repeated fields using a fixed-size (4-byte) length or count prefix followed immediately by the payload; the decoder issues an unconditional 32-bit read for the prefix and then computes the payload address from that value. Because the prefix is read as a single integer rather than inspected byte-by-byte, no data-dependent branches appear in the hot path. We will add a dedicated paragraph plus a small diagram in the revised Section 3 that walks through the string, bytes, and repeated-field cases, confirming that padding is limited to natural alignment and does not affect the reported speedups or bandwidth utilization.
Revision: yes
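The length-prefix scheme described in the second response can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: the little-endian byte order, the UTF-8 payload, and the function names are all chosen here for the example.

```python
import struct

PREFIX_SIZE = 4  # fixed-size length prefix, as described in the response


def encode_string(text: str) -> bytes:
    """Encode a string as a 4-byte length prefix followed by its UTF-8 payload."""
    payload = text.encode("utf-8")
    return struct.pack("<I", len(payload)) + payload


def decode_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode one length-prefixed string and return (text, next_offset).

    The prefix is read as one 32-bit integer, never byte by byte, and the
    payload bounds are computed directly from it.
    """
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + PREFIX_SIZE
    end = start + length
    if end > len(buf):  # a safe decoder still needs this single bounds check
        raise ValueError("buffer is shorter than the declared length")
    return buf[start:end].decode("utf-8"), end


buffer = encode_string("héllo") + encode_string("x")
first, next_offset = decode_string(buffer)
second, end_offset = decode_string(buffer, next_offset)
```

One caveat the sketch makes visible: a decoder that handles untrusted input still compares the declared length against the buffer size. That is one comparison per field rather than one per byte, which is a narrower claim than having no conditionals at all.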

Circularity Check

0 steps flagged

No circularity: performance claims are direct empirical measurements, not derived quantities

The paper introduces a fixed-size serialization format (Bebop) and reports measured decode latencies across 19 workloads plus an embedding-vector microbenchmark. No equations, fitted parameters, or derivations appear in the abstract or described claims. The central results (9–213× speedups, 2.8 ns vs. 111 ns) are presented as observed timings rather than quantities defined in terms of themselves or obtained by fitting to the same data. No self-citations are invoked as load-bearing uniqueness theorems, and the design choice of fixed byte widths is stated directly rather than smuggled via prior work. The skeptic concern about variable-length fields is a question of engineering practicality and workload representativeness, not a circularity in any derivation chain. The paper is therefore self-contained against external benchmarks with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the introduction of the Bebop wire format and the empirical timing results; no free parameters are mentioned. The only explicit background assumption is the well-known behavior of CPU pipelines on data-dependent branches.

axioms (1)

domain assumption: Modern CPU pipelines stall on data-dependent branches during decoding.
Invoked to explain why variable-length formats are slow.

invented entities (2)

Bebop serialization format (no independent evidence)
purpose: Fixed-byte encoding that removes all continuation bits and conditionals
New format introduced by the paper.

batch pipelining RPC protocol (no independent evidence)
purpose: Server-side resolution of dependent cross-service calls in one round trip
New protocol feature built on the Bebop wire format.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel (relation: unclear)
Paper passage: "every data type uses a fixed number of bytes. A 32-bit integer is always four bytes. Decoding becomes a single memory read with no conditionals."

IndisputableMonolith/Foundation/DimensionForcing alexander_duality_circle_linking (relation: unclear)
Paper passage: "On records above 64 KB, the decoder achieves 86% of peak memory bandwidth."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

