From Indexing to Coding: A New Paradigm for Data Availability Sampling
Pith reviewed 2026-05-18 13:29 UTC · model grok-4.3
The pith
Committing to uncoded data and sampling via on-the-fly random linear coding lets light nodes verify data availability with up to orders of magnitude stronger statistical assurances than fixed-rate codes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes a new approach to DAS that modularizes the coding and commitment process by committing to the uncoded data while performing sampling through on-the-fly coding. The protocol realizes this using random linear network coding. The resulting samples are significantly more expressive, enabling light nodes to obtain, in concrete implementations, up to multiple orders of magnitude stronger assurances of data availability than from sampling pre-committed symbols from a fixed-rate redundancy code as done in established DAS schemes using Reed Solomon or low density parity check codes.
What carries the argument
On-the-fly random linear network coding (RLNC) performed after committing to uncoded data, replacing sampling from a predetermined set of pre-coded symbols.
Load-bearing premise
The on-the-fly random linear combinations can be generated and verified with communication and computation costs that remain practical for light nodes while still delivering the claimed statistical advantage over fixed-rate codes.
What would settle it
A prototype implementation or simulation in which light-node verification of on-the-fly samples requires more samples or higher costs than Reed-Solomon DAS to reach equivalent assurance levels, or where costs exceed practical limits for resource-constrained nodes.
Figures
read the original abstract
The data availability problem is a central challenge in blockchain systems and lies at the core of the accessibility and scalability issues faced by platforms such as Ethereum. Modern solutions employ several approaches, with data availability sampling (DAS) being the most self-sufficient and minimalistic in its security assumptions. Existing DAS methods typically form cryptographic commitments on codewords of fixed-rate erasure codes, which restrict light nodes to sampling from a predetermined set of coded symbols. In this paper, we introduce a new approach to DAS that modularizes the coding and commitment process by committing to the uncoded data while performing sampling through on-the-fly coding. The resulting samples are significantly more expressive, enabling light nodes to obtain, in concrete implementations, up to multiple orders of magnitude stronger assurances of data availability than from sampling pre-committed symbols from a fixed-rate redundancy code as done in established DAS schemes using Reed Solomon or low density parity check codes. We present a concrete protocol that realizes this paradigm using random linear network coding (RLNC).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a new paradigm for data availability sampling (DAS) in blockchain systems. Rather than forming cryptographic commitments on codewords of fixed-rate erasure codes (e.g., Reed-Solomon or LDPC), the approach commits directly to the uncoded data and performs sampling via on-the-fly random linear network coding (RLNC). The resulting samples are claimed to be significantly more expressive, enabling light nodes to obtain up to multiple orders of magnitude stronger statistical assurances of data availability than in established DAS schemes.
Significance. If the concrete RLNC protocol can be shown to deliver the claimed per-sample detection probability advantage while keeping verification and communication costs independent of block size and practical for light nodes, the work would represent a meaningful improvement in the security-efficiency tradeoff for blockchain light clients. The modular separation of commitment and coding is a clean conceptual contribution that could influence future DAS designs.
major comments (3)
- [Abstract, §3] Abstract and §3 (protocol description): the central claim that on-the-fly RLNC yields 'up to multiple orders of magnitude stronger assurances' is stated without any quantitative comparison, detection-probability calculation, or security reduction. No table or equation shows the per-sample advantage over fixed-rate RS/LDPC symbols under realistic adversarial models.
- [§4] §4 (verification and cost analysis): the manuscript does not demonstrate that verification of a random linear combination (coefficients + result + proof) has communication and computation cost independent of the number of summed symbols or block dimension. If verification cost grows with the support size of the linear combination, the statistical gain is offset and the headline improvement does not materialize.
- [§5] §5 (security analysis): no concrete parameter choices, field size, or security proof are provided that would allow a reader to verify that the RLNC scheme achieves the stated advantage while remaining sound against a malicious block producer who controls the uncoded data.
minor comments (2)
- [§3] Notation for the commitment scheme and the RLNC coefficients should be introduced earlier and used consistently; several symbols appear without prior definition in the protocol section.
- [§2] The related-work discussion would benefit from explicit comparison of communication complexity per sample with existing DAS constructions (e.g., KZG-based or Merkle-tree DAS).
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive suggestions. We address each of the major comments point by point below, indicating the revisions we will make to the manuscript.
read point-by-point responses
-
Referee: [Abstract, §3] Abstract and §3 (protocol description): the central claim that on-the-fly RLNC yields 'up to multiple orders of magnitude stronger assurances' is stated without any quantitative comparison, detection-probability calculation, or security reduction. No table or equation shows the per-sample advantage over fixed-rate RS/LDPC symbols under realistic adversarial models.
Authors: We agree with this observation. The current manuscript states the advantage qualitatively but lacks explicit calculations. In the revised version, we will add to §3 a quantitative analysis including equations for the detection probability of a single RLNC sample versus a fixed-rate symbol. Specifically, under an adversarial model where the adversary erases a fraction f of the data, the probability that a random linear combination detects the erasure is 1 - (1-f)^k where k is the number of symbols combined, leading to significantly higher detection rates for larger supports. We will include a table comparing these probabilities for typical DAS parameters (e.g., 1% vs 50% erasure rates) and discuss the orders-of-magnitude improvement in statistical assurance for light nodes. revision: yes
-
Referee: [§4] §4 (verification and cost analysis): the manuscript does not demonstrate that verification of a random linear combination (coefficients + result + proof) has communication and computation cost independent of the number of summed symbols or block dimension. If verification cost grows with the support size of the linear combination, the statistical gain is offset and the headline improvement does not materialize.
Authors: We appreciate this important point regarding practicality. In our design, the commitment is to the raw data using a vector commitment scheme that supports homomorphic linear combinations. Verification involves checking the proof for the combined value, which has constant size independent of the support size of the linear combination. The coefficients are generated pseudorandomly from a seed, so only the seed needs to be communicated, keeping communication cost low. We will expand §4 with a formal cost analysis table showing that both computation and communication remain independent of block dimension and support size, relying on the properties of the underlying commitment scheme. revision: yes
-
Referee: [§5] §5 (security analysis): no concrete parameter choices, field size, or security proof are provided that would allow a reader to verify that the RLNC scheme achieves the stated advantage while remaining sound against a malicious block producer who controls the uncoded data.
Authors: We acknowledge that the security analysis in §5 is currently at a high level. To address this, the revised manuscript will include concrete parameter recommendations, such as using a finite field of size 2^32 or larger to minimize collision probabilities, and a detailed security argument. We will provide a reduction showing that if the commitment scheme is binding and the RLNC is performed over a sufficiently large field, the probability of a malicious producer fooling the sampler is negligible. A full formal proof will be outlined in an appendix, with the main text providing the key steps and parameter choices that achieve the claimed advantage. revision: yes
Circularity Check
No circularity: paradigm shift rests on independent protocol construction
full rationale
The paper proposes committing to uncoded data and performing on-the-fly RLNC sampling, claiming stronger statistical assurances than fixed-rate RS/LDPC schemes. No equations, fitted parameters, or self-citations are exhibited that reduce the claimed orders-of-magnitude gain to a quantity defined by the authors' own prior work or by construction. The central advantage is presented as arising directly from the modular separation of commitment and coding, which is externally verifiable against standard erasure-code baselines without requiring the present paper's fitted values or uniqueness theorems. This is the expected non-finding for a conceptual protocol paper whose load-bearing step is the concrete RLNC verification cost, not a self-referential derivation.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
sampling through on-the-fly coding... using random linear network coding (RLNC)... undecodability ratio α_RLNC = 1−1/|F|
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Pedersen vector commitment... inner product argument
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Abdrashitov, V., M´ edard, M.: Durable network coded distributed storage. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). pp. 851–856. IEEE (2015)
work page 2015
-
[2]
Al-Bassam, M., Sonnino, A., Buterin, V.: Fraud and data availability proofs: Max- imising light client security and scaling blockchains with dishonest majorities. arXiv preprint arXiv:1809.09044 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[3]
Cryptology ePrint Archive (2025)
Boneh, D., Neu, J., Nikolaenko, V., Partap, A.: Data availability sampling with repair. Cryptology ePrint Archive (2025)
work page 2025
-
[4]
In: 2018 IEEE symposium on security and privacy (SP)
B¨ unz, B., Bootle, J., Boneh, D., Poelstra, A., Wuille, P., Maxwell, G.: Bulletproofs: Short proofs for confidential transactions and more. In: 2018 IEEE symposium on security and privacy (SP). pp. 315–334. IEEE (2018)
work page 2018
-
[5]
In: International conference on the theory and applica- tion of cryptology and information security
B¨ unz, B., Maller, M., Mishra, P., Tyagi, N., Vesely, P.: Proofs for inner pairing products and applications. In: International conference on the theory and applica- tion of cryptology and information security. pp. 65–97. Springer (2021)
work page 2021
-
[6]
Deb, S.: How good is random linear coding based distributed networked storage. In: Workshop on Network Coding, Theory and Applications (2005) From Indexing to Coding: A New Paradigm for Data Availability Sampling 15
work page 2005
-
[7]
Cryptology ePrint Archive (2023)
Hall-Andersen, M., Simkin, M., Wagner, B.: Foundations of data availability sam- pling. Cryptology ePrint Archive (2023)
work page 2023
-
[8]
IEEE Transactions on information theory52(10), 4413–4430 (2006)
Ho, T., M´ edard, M., Koetter, R., Karger, D.R., Effros, M., Shi, J., Leong, B.: A random linear network coding approach to multicast. IEEE Transactions on information theory52(10), 4413–4430 (2006)
work page 2006
-
[9]
In: International conference on the theory and appli- cation of cryptology and information security
Kate, A., Zaverucha, G.M., Goldberg, I.: Constant-size commitments to polyno- mials and their applications. In: International conference on the theory and appli- cation of cryptology and information security. pp. 177–194. Springer (2010)
work page 2010
-
[10]
M´ edard, M., Vasudevan, V.A., Pedersen, M.V., Duffy, K.R.: Network Coding for Engineers. John Wiley & Sons (2025)
work page 2025
-
[11]
IEEE Transac- tions on Communications70(9), 5742–5759 (2022)
Mitra, D., Tauz, L., Dolecek, L.: Overcoming data availability attacks in blockchain systems: Short code-length ldpc code design for coded merkle tree. IEEE Transac- tions on Communications70(9), 5742–5759 (2022)
work page 2022
-
[12]
Technical re- port, Paradigm (2022)
Neu, J.: Data availability sampling: From basics to open problems. Technical re- port, Paradigm (2022)
work page 2022
-
[13]
Nikolaenko, V., Boneh, D.: Data availability sampling and danksharding: An overview and a proposal for improvements (2023)
work page 2023
-
[14]
Cambridge University Press (2006)
Roth, R.: Introduction to coding theory. Cambridge University Press (2006)
work page 2006
-
[15]
Cryptology ePrint Archive (2024)
Wagner, B., Zapico, A.: A documentation of ethereum’s peerdas. Cryptology ePrint Archive (2024)
work page 2024
-
[16]
Yu, M., Kannan, S., Viswanath, P., Li, S., Avestimehr, A.S.: Determining data availability,https://patents.google.com/patent/US11533127B1/en
-
[17]
Yu, M., Kannan, S., Viswanath, P., Li, S., Avestimehr, A.S.: Verifying a set of remotely stored data,https://patentimages.storage.googleapis.com/f4/3f/ e9/6fb900187dc974/US11356274.pdf
-
[18]
In: International Con- ference on Financial Cryptography and Data Security
Yu, M., Sahraei, S., Li, S., Avestimehr, S., Kannan, S., Viswanath, P.: Coded merkle tree: Solving data availability attacks in blockchains. In: International Con- ference on Financial Cryptography and Data Security. pp. 114–134. Springer (2020) A Proof of Soundness Theorem For soundness, some data entity must be reconstructible from the emitted coded vec...
work page 2020
-
[19]
These are usually denoted byc(variable-node degree) andd(check-node degree)
In a regular LDPC code, the variable node degree (the number of parity- check equations in which each coded symbol, or variable node, participates) is constant across all variable nodes, and the check node degree (the number of coded symbols connected to each parity-check equation) is likewise constant. These are usually denoted byc(variable-node degree) ...
-
[20]
Such codes are typically characterized by their average variable-node degree and check-node degree
In an irregular LDPC code, the variable-node and check-node degrees are not constant but instead follow a degree distribution. Such codes are typically characterized by their average variable-node degree and check-node degree. The LDPC codes used in [11, 18] are a special type of systematic, irregular LDPC codes with rater= 0.25, meaning that the firstkco...
-
[21]
first introduced RS codes as component codes of a 2D tensor code for DAS. Different approaches exist for chunking the original data into shares, encod- ing it, and batching it again. In what follows, we describe a generalized version of 2D-RS-based DAS schemes that captures the key properties of established methods, particularly with respect to sampling c...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.