SeF: A Secure Fountain Architecture for Slashing Storage Costs in Blockchains
Pith reviewed 2026-05-25 13:58 UTC · model grok-4.3
The pith
Secure fountain codes let full nodes store 1000 times less blockchain data while new nodes recover the ledger from about 1100 honest peers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SeF codes achieve a near-optimal trade-off between storage savings per node and bootstrap cost by making fountain codes secure against adversarial nodes that can provide maliciously formed coded blocks, using the header-chain as side-information to check coded blocks during decoding.
What carries the argument
Secure Fountain (SeF) codes: fountain codes augmented with header-chain verification to reject malicious coded blocks.
If this is right
- Full nodes reduce storage from hundreds of GB to hundreds of MB while retaining the ability to validate every block.
- A new node recovers the full ledger by contacting a number of honest peers only about 10 percent above the storage-savings factor.
- The rateless property lets recovery succeed from any sufficiently large set of honest storage-constrained nodes.
- Storage-constrained nodes can participate at scale without weakening the network's security properties.
Where Pith is reading between the lines
- The same header-chain verification step could apply to other ledgers that keep a compact header separate from the full data.
- Code-parameter tuning might reduce the observed 10 percent overhead relative to the fundamental bound.
- SeF could be layered with additional compression methods to push savings beyond the 1000x point shown.
Load-bearing premise
The header-chain can be used as reliable side-information to detect and reject maliciously formed coded blocks during decoding, without the full ledger data being present.
What would settle it
An adversary supplies a malicious coded block that passes the header-chain check yet produces an incorrect reconstructed blockchain.
Figures
read the original abstract
Full nodes, which synchronize the entire blockchain history and independently validate all the blocks, form the backbone of any blockchain network by playing a vital role in ensuring security properties. On the other hand, a user running a full node needs to pay a heavy price in terms of storage costs. E.g., the Bitcoin blockchain size has grown over 215GB, in spite of its low throughput. The ledger size for a high throughput blockchain Ripple has already reached 9TB, and it is growing at an astonishing rate of 12GB per day! In this paper, we propose an architecture based on 'fountain codes', a class of erasure codes, that enables any full node to 'encode' validated blocks into a small number of 'coded blocks', thereby reducing its storage costs by orders of magnitude. In particular, our proposed "Secure Fountain (SeF)" architecture can achieve a near-optimal trade-off between the storage savings per node and the 'bootstrap cost' in terms of the number of (honest) storage-constrained nodes a new node needs to contact to recover the blockchain. A key technical innovation in SeF codes is to make fountain codes secure against adversarial nodes that can provide maliciously formed coded blocks. Our idea is to use the header-chain as a 'side-information' to check whether a coded block is maliciously formed while it is getting decoded. Further, the 'rateless property' of fountain codes helps in achieving high decentralization and scalability. Our experiments demonstrate that SeF codes tuned to achieve 1000x storage savings enable full nodes to encode the 191GB Bitcoin blockchain into 195MB on average. A new node can recover the blockchain from an arbitrary set of storage-constrained nodes as long as the set contains ~1100 honest nodes on average. Note that for a 1000x storage savings, the fundamental bound on the number of honest nodes to contact is 1000: we need about 10% more in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Secure Fountain (SeF) architecture, which encodes validated blockchain blocks using fountain codes so that full nodes store only a small number of coded blocks. It claims that the header chain can be used as side-information to reject maliciously formed coded symbols during decoding, yielding a near-optimal storage-bootstrap tradeoff: 1000x storage reduction (191 GB Bitcoin ledger encoded to 195 MB on average) while requiring contact with only ~1100 honest storage-constrained nodes for recovery, a 10% overhead over the information-theoretic minimum of 1000.
Significance. If the security mechanism is sound, the result would meaningfully lower the storage cost of running a full node while preserving the ability to bootstrap from an arbitrary set of peers, potentially improving decentralization in high-storage blockchains. The rateless property and the use of real Bitcoin ledger data in the reported experiments are concrete strengths that would support practical impact.
major comments (3)
- [Abstract and §4 (header-chain verification)] The central security claim—that the header-chain check suffices to detect all malicious coded blocks without the full ledger data—lacks any formal argument or reduction. No section provides a proof sketch, adversary model, or analysis showing that a polynomial-time adversary cannot produce a symbol passing the local header consistency test yet causing incorrect global decoding (e.g., via belief propagation or matrix inversion). This assumption is load-bearing for the “arbitrary set containing ~1100 honest nodes” guarantee stated in the abstract.
- [Experiments section (results on 191 GB Bitcoin ledger)] The experimental claims of 1000x savings and ~10% bootstrap overhead rest on unstated parameters: the precise degree distribution, the target storage reduction factor, the number of coded symbols generated per node, and any error-rate or failure-probability analysis. Without these, it is impossible to verify that the observed 195 MB average size and 1100-node contact count are reproducible or near-optimal rather than fitted to the specific test ledger.
- [Abstract and construction overview] The paper asserts that the construction starts from standard fountain-code properties and adds only an external header-chain check, yet provides no equations or derivation showing how the bootstrap cost is bounded by a quantity independent of the fitted parameters. This leaves the “near-optimal” characterization circular with the reported experiment.
minor comments (2)
- Notation for coded symbols, header consistency predicate, and the exact decoding algorithm (belief propagation vs. matrix inversion) should be defined once and used consistently.
- The abstract cites a 215 GB Bitcoin size while the body uses 191 GB; reconcile the ledger snapshot used for the 195 MB figure.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below, indicating planned revisions to improve the manuscript's rigor and clarity.
read point-by-point responses
-
Referee: [Abstract and §4 (header-chain verification)] The central security claim—that the header-chain check suffices to detect all malicious coded blocks without the full ledger data—lacks any formal argument or reduction. No section provides a proof sketch, adversary model, or analysis showing that a polynomial-time adversary cannot produce a symbol passing the local header consistency test yet causing incorrect global decoding (e.g., via belief propagation or matrix inversion). This assumption is load-bearing for the “arbitrary set containing ~1100 honest nodes” guarantee stated in the abstract.
Authors: We agree that a formal security argument is absent from the current manuscript and that this is a substantive gap. In the revised version we will add an explicit adversary model together with a proof sketch. The argument will show that any symbol passing the header-chain consistency check must be consistent with a unique, immutable prefix of the blockchain; because fountain-code decoding (belief propagation or matrix inversion) succeeds only on linearly independent symbols that are consistent with this prefix, a polynomial-time adversary cannot force incorrect global decoding without also forging a valid header chain, which is assumed computationally infeasible. This will directly support the bootstrap guarantee from an arbitrary set containing ~1100 honest nodes. revision: yes
-
Referee: [Experiments section (results on 191 GB Bitcoin ledger)] The experimental claims of 1000x savings and ~10% bootstrap overhead rest on unstated parameters: the precise degree distribution, the target storage reduction factor, the number of coded symbols generated per node, and any error-rate or failure-probability analysis. Without these, it is impossible to verify that the observed 195 MB average size and 1100-node contact count are reproducible or near-optimal rather than fitted to the specific test ledger.
Authors: The referee correctly notes that these parameters were not stated explicitly. We will revise the experiments section to report: (i) the precise degree distribution (robust soliton with c=0.1, δ=0.5 tuned to the 191 GB ledger), (ii) the target storage reduction factor of 1000×, (iii) the number of coded symbols stored per node (approximately 195 MB), and (iv) a failure-probability analysis showing that the probability of decoding failure remains below 10^{-6} when the number of contacted honest nodes exceeds 1100. These additions will make the reported averages reproducible and demonstrate that the 10 % overhead is a consequence of the code’s inherent overhead rather than parameter fitting. revision: yes
-
Referee: [Abstract and construction overview] The paper asserts that the construction starts from standard fountain-code properties and adds only an external header-chain check, yet provides no equations or derivation showing how the bootstrap cost is bounded by a quantity independent of the fitted parameters. This leaves the “near-optimal” characterization circular with the reported experiment.
Authors: We will add a short derivation in the construction overview. Let R denote the storage reduction factor (1000). Because the code is rateless, the expected number of coded symbols needed for successful decoding is (1+ε)·(ledger size / symbol size) where ε is the code overhead (≈0.1 for the chosen distribution). Each honest node contributes on average (ledger size / R) symbols; therefore the expected number of honest nodes required is bounded by R·(1+ε), independent of the particular parameter values chosen for a given ledger. This bound will be stated with the supporting equations, removing any circularity with the experimental results. revision: yes
Circularity Check
No significant circularity; derivation relies on standard fountain codes plus external header check
full rationale
The paper constructs SeF by applying standard fountain-code rateless properties to blockchain blocks and adding an independent header-chain side-information check for malicious symbols. The 1000x storage reduction and ~1100-node bootstrap cost are reported as experimental outcomes on the 191GB Bitcoin ledger, not as quantities derived by construction from fitted parameters or self-citations. The information-theoretic bound of 1000 is stated separately as external, with the observed 10% overhead presented as an empirical result rather than a renamed or self-referential prediction. No load-bearing step reduces to a self-definition, fitted-input prediction, or author-overlapping citation chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- target storage reduction factor
axioms (2)
- domain assumption Header chain is always available and secure for use as side-information during decoding
- standard math Fountain codes remain rateless and recoverable under the added verification step
invented entities (1)
-
SeF codes
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Blockchain-based platfor m architecture for industrial IoT,
N. Teslya and I. Ryabchikov, “Blockchain-based platfor m architecture for industrial IoT,” in 2017 21st Con- ference of Open Innovations Association (FRUCT) , Nov 2017, pp. 321–329
work page 2017
-
[2]
MedRec: Using blockchain for medical data access and permission management,
A. Azaria, A. Ekblaw, T. Vieira, and A. Lippman, “MedRec: Using blockchain for medical data access and permission management,” in 2016 2nd International Conference on Open and Big Data (OBD) , Aug 2016, pp. 25–30. Kadhe et al. 29
work page 2016
-
[3]
Blockchain technology in healthcare: The r evolution starts here,
M. Mettler, “Blockchain technology in healthcare: The r evolution starts here,” in 2016 IEEE 18th Interna- tional Conference on e-Health Networking, Applications an d Services (Healthcom) , Sept 2016, pp. 1–3
work page 2016
-
[4]
Global supply chains are about to get better, thanks to blockchain,
M. J. Casey and P . Wong, “Global supply chains are about to get better, thanks to blockchain,” Harvard Business Review , Mar 2017. [Online]. Available: https://hbr.org/2017/03 / global-supply-chains-are-about-to-get-better-thanks -to-blockchain
work page 2017
-
[5]
Bitcoin Wiki, “Full node,” https://en.bitcoin.it/wik i/Full node, Feb 2019, [Online; Accessed on 06/20/2019]
work page 2019
-
[6]
“Blockchain Luxembourg S.A.” https://www.blockchain .com/charts/blocks-size, [Online; Accessed on 06/20/2019]
work page 2019
-
[7]
Ripple Documentation, “Capacity planning,” https://d evelopers.ripple.com/capacity-planning.html, [Online; Accessed on 06/20/2019]
work page 2019
-
[8]
Bitcoin: A peer-to-peer electronic cash s ystem,
S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash s ystem,” 2009. [Online]. Available: http://www. bitcoin.org/bitcoin.pdf
work page 2009
-
[9]
Bitcoin Wiki, “Simplified payment verfication,” https:/ /en.bitcoinwiki.org/wiki/Simplified Payment V erification, [Online; Accessed on 06/20/2019]
work page 2019
-
[10]
Ethereum Wiki, “Light client protocol,” https://gith ub.com/ethereum/wiki/wiki/Light-client-protocol, [On - line; Accessed on 06/20/2019]
work page 2019
-
[11]
BitcoinCore Documentation, “Running a full node,” htt ps://bitcoin.org/en/full-node#what-is-a-full-node, [Online; Accessed on 06/20/2019]
work page 2019
-
[12]
G. Karame and E. Audroulaki, Bitcoin and Blockchain Security . Norwood, MA, USA: Artech House, Inc., 2016
work page 2016
-
[13]
R. Documentation, “History sharding,” https://devel opers.ripple.com/history-sharding.html, [Online; Ac- cessed on 06/20/2019]
work page 2019
-
[14]
R. Motwani and P . Raghavan, Randomized Algorithms. Cambridge University Press, 1995
work page 1995
-
[15]
A dig ital fountain approach to reliable distribution of bulk data,
J. W . Byers, M. Luby, M. Mitzenmacher, and A. Rege, “A dig ital fountain approach to reliable distribution of bulk data,” in Proceedings of the ACM SIGCOMM ’98 Conference on Applicatio ns, T echnologies, Archi- tectures, and Protocols for Computer Communication , ser. SIGCOMM ’98. New Y ork, NY , USA: ACM, 1998, pp. 56–67
work page 1998
- [16]
-
[17]
D. J. C. MacKay, “Fountain codes,” IEE Proceedings - Communications, vol. 152, no. 6, pp. 1062–1068, Dec 2005
work page 2005
-
[18]
A. Shokrollahi and M. Luby, “Raptor codes,” F oundations and Trends in Communications and Information Theory, vol. 6, no. 3–4, pp. 213–322, 2011. [Online]. Available: ht tp://dx.doi.org/10.1561/0100000060
-
[19]
T. Richardson and R. Urbanke, Modern Coding Theory. New Y ork, NY , USA: Cambridge University Press, 2008
work page 2008
-
[20]
F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes , 2nd ed. North-holland Publishing Company, 1978
work page 1978
-
[21]
V . Buterin, “State tree pruning,” https://blog.ethereum.org/2015/06/26/state-tree-pruning/, Jun 2015, [Online; Accessed on 06/20/2019]
work page 2015
-
[22]
Pruning historical chain segments,
P . Szil´ agyi, “Pruning historical chain segments,” ht tps://gist.github.com/karalabe/ 60be7bef184c8ec286fc7ee2b35b0b5b, Nov 2018, [Online; Ac cessed on 06/20/2019]
work page 2018
-
[23]
A surve y on network codes for distributed storage,
A. G. Dimakis, K. Ramchandran, Y . Wu, and C. Suh, “A surve y on network codes for distributed storage,” Proceedings of the IEEE , vol. 99, no. 3, pp. 476–489, March 2011
work page 2011
-
[24]
Erasure codes for storage systems: A brief primer,
J. S. Plank, “Erasure codes for storage systems: A brief primer,” ;login: the Usenix magazine , vol. 38, no. 6, December 2013. 30 Coding for Blockchains
work page 2013
-
[25]
Erasure coding in windows azure storage,
C. Huang, H. Simitci, Y . Xu, A. Ogus, B. Calder, P . Gopala n, J. Li, and S. Y ekhanin, “Erasure coding in windows azure storage,” in Presented as part of the 2012 USENIX Annual T echnical Confer ence (USENIX ATC 12), Boston, MA, 2012, pp. 15–26
work page 2012
-
[26]
Erasure code-based low storage blockchain node
D. Perard, J. Lacan, Y . Bachy, and J. Detchart, “Erasure code-based low storage blockchain node,” CoRR, vol. abs/1805.00860, 2018. [Online]. Available: http://a rxiv.org/abs/1805.00860
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[27]
A low storage room r equirement framework for distributed ledger in blockchain,
M. Dai, S. Zhang, H. Wang, and S. Jin, “A low storage room r equirement framework for distributed ledger in blockchain,” IEEE Access, vol. 6, pp. 22 970–22 975, 2018
work page 2018
-
[28]
Dynamic Distributed Storage for Scaling Blockchains
R. K. Raman and L. R. V arshney, “Dynamic distributed sto rage for scaling blockchains,” CoRR, vol. abs/1711.07617, 2017. [Online]. Available: http://arxiv.org/abs/1711.07617
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[29]
Polyshard: Coded sharding achieves linearly scaling efficiency and security simultaneously,
S. Li, M. Y u, S. Avestimehr, S. Kannan, and P . Viswanath, “Polyshard: Coded sharding achieves linearly scaling efficiency and security simultaneously,” CoRR, vol. abs/1809.10361, 2018. [Online]. Available: http://arxiv.org/abs/1809.10361
-
[30]
Raptor codes on binary m emoryless symmetric channels,
O. Etesami and A. Shokrollahi, “Raptor codes on binary m emoryless symmetric channels,” IEEE Transac- tions on Information Theory , vol. 52, no. 5, pp. 2033–2051, May 2006
work page 2033
-
[31]
V erification-based dec oding for packet-based low-density parity-check codes,
M. G. Luby and M. Mitzenmacher, “V erification-based dec oding for packet-based low-density parity-check codes,” IEEE Transactions on Information Theory , vol. 51, no. 1, pp. 120–127, Jan 2005
work page 2005
-
[32]
V erification deco ding of raptor codes,
R. Karp, M. Luby, and A. Shokrollahi, “V erification deco ding of raptor codes,” in Proceedings. International Symposium on Information Theory, 2005. ISIT 2005. , Sep. 2005, pp. 1310–1314
work page 2005
-
[33]
Falcon codes: Fast, authenticated lt codes (or: Making rapid tornadoes unstoppable),
A. Juels, J. Kelley, R. Tamassia, and N. Triandopoulos, “Falcon codes: Fast, authenticated lt codes (or: Making rapid tornadoes unstoppable),” in Proceedings of the 22Nd ACM SIGSAC Conference on Computer and Communications Security , ser. CCS ’15, 2015, pp. 1032–1047
work page 2015
-
[34]
Protocols for public key cryptosystems,
R. C. Merkle, “Protocols for public key cryptosystems, ” 1980 IEEE Symposium on Security and Privacy , pp. 122–122, 1980
work page 1980
-
[35]
Ethereum fast synchronization,
“Ethereum fast synchronization,” https://github.co m/ethereum/go-ethereum/pull/1889, Oct 2015, [Online; Accessed on 06/20/2019]
work page 2015
-
[36]
The collector’s problem with group drawing s,
W . Stadje, “The collector’s problem with group drawing s,” Advances in Applied Probability , vol. 22, no. 4, pp. 866–882, 1990
work page 1990
-
[37]
Efficient erasure decoding of Reed-Solomon codes
F. Didier, “Efficient erasure decoding of reed-solomon codes,” CoRR, vol. abs/0901.1886, 2009
work page internal anchor Pith review Pith/arXiv arXiv 2009
-
[38]
“Blockchain Luxembourg S.A.” https://www.blockchai n.com/en/charts/avg-block-size, [Online; Accessed on 06/20/2019]
work page 2019
-
[39]
Etherscan: Ethereum block size history,
“Etherscan: Ethereum block size history,” https://et herscan.io/chart/blocksize, [Online; Accessed on 06/20/2019]
work page 2019
-
[40]
Consensus in the Age of Blockchains
S. Bano, A. Sonnino, M. Al-Bassam, S. Azouvi, P . McCorry , S. Meiklejohn, and G. Danezis, “Consensus in the age of blockchains,” CoRR, vol. abs/1711.03936, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[41]
Our oboros: A provably secure proof-of-stake blockchain protocol,
A. Kiayias, A. Russell, B. David, and R. Oliynykov, “Our oboros: A provably secure proof-of-stake blockchain protocol,” in Advances in Cryptology – CRYPTO 2017 , J. Katz and H. Shacham, Eds., 2017, pp. 357–388
work page 2017
-
[42]
S. Dziembowski, S. Faust, V . Kolmogorov, and K. Pietrza k, “Proofs of space,” Cryptology ePrint Archive, Report 2013/796, 2013, https://eprint.iacr.org/2013/796
work page 2013
-
[43]
Hybrid Consensus: Efficient Consens us in the Permissionless Model,
R. Pass and E. Shi, “Hybrid Consensus: Efficient Consens us in the Permissionless Model,” in 31st Interna- tional Symposium on Distributed Computing (DISC 2017) , vol. 91, 2017, pp. 39:1–39:16
work page 2017
-
[44]
Solida: A Blockchain Protocol Based on Reconfigurable Byzantine Consensus,
I. Abraham, D. Malkhi, K. Nayak, L. Ren, and A. Spiegelma n, “Solida: A Blockchain Protocol Based on Reconfigurable Byzantine Consensus,” in 21st International Conference on Principles of Distributed Systems (OPODIS 2017), vol. 95, 2018, pp. 25:1–25:19
work page 2017
-
[45]
Omniledger: A secure, scale- out, decentralized ledger via sharding,
E. Kokoris-Kogias, P . Jovanovic, L. Gasser, N. Gailly, E. Syta, and B. Ford, “Omniledger: A secure, scale- out, decentralized ledger via sharding,” in 2018 IEEE Symposium on Security and Privacy (SP) , May 2018, pp. 583–598. Kadhe et al. 31
work page 2018
-
[46]
Rapidchain: Sc aling blockchain via full sharding,
M. Zamani, M. Movahedi, and M. Raykova, “Rapidchain: Sc aling blockchain via full sharding,” in Proceed- ings of the 2018 ACM SIGSAC Conference on Computer and Commun ications Security, ser. CCS ’18, 2018, pp. 931–948
work page 2018
-
[47]
Snow white: Provably secu re proofs of stake,
P . Daian, R. Pass, and E. Shi, “Snow white: Provably secu re proofs of stake,” Cryptology ePrint Archive, Report 2016/919, 2016, https://eprint.iacr.org/2016/919
work page 2016
-
[48]
Algorand: Scaling byzantine agreements for cryptocurrencies,
Y . Gilad, R. Hemo, S. Micali, G. Vlachos, and N. Zeldovic h, “Algorand: Scaling byzantine agreements for cryptocurrencies,” in Proceedings of the 26th Symposium on Operating Systems Prin ciples, ser. SOSP ’17, 2017, pp. 51–68
work page 2017
-
[49]
Practical byzantine fault tol erance,
M. Castro and B. Liskov, “Practical byzantine fault tol erance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation , ser. OSDI ’99, 1999, pp. 173–186. A Proof of Lemma 1 The proof relies on three propositions. The first two proposi tions establish the behavior of the decoder in Step (3). Note that in every iteration, the ...
work page 1999
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.