Coded State Machine -- Scaling State Machine Execution under Byzantine Faults
Pith reviewed 2026-05-25 15:38 UTC · model grok-4.3
The pith
Coded State Machine achieves linear scaling in storage, throughput, and security simultaneously with network size for state machine execution under Byzantine faults.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose CSM, which achieves the optimal linear scaling in storage efficiency, throughput, and security simultaneously with the size of the network. The storage efficiency is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest. Using INTERMIX, the network nodes securely delegate their coding operations to a single worker node, and a small group of randomly selected auditor nodes verify its correctness, so that the per
What carries the argument
Lagrange coded states together with the INTERMIX verifiable delegation algorithm for matrix-vector multiplication.
If this is right
- Storage per state machine stays constant rather than growing with replication factor.
- Throughput grows linearly with the number of nodes while security level is preserved.
- A single worker can perform the coding operations for the entire network with verification by a small random auditor set.
- The same INTERMIX primitive can be reused for other verifiable linear computations beyond state-machine updates.
Where Pith is reading between the lines
- The same coding-plus-delegation pattern could be applied to other linear operations that dominate distributed ledgers or consensus protocols.
- If the storage claim holds, existing replication-based systems could be replaced by CSM without increasing total memory footprint.
- Real-world deployment would require checking whether the random auditor selection remains secure when the network size is moderate rather than asymptotic.
Load-bearing premise
Lagrange coded states and coded input commands occupy the same storage as their uncoded originals while still allowing secure recovery and verification in the presence of Byzantine faults.
What would settle it
A demonstration that recovering an original state or command from its Lagrange coded version requires strictly more than the original storage amount, or that the INTERMIX verification procedure fails to detect an incorrect result produced by a Byzantine worker.
Figures
read the original abstract
We introduce an information-theoretic framework, named Coded State Machine (CSM), to securely and efficiently execute multiple state machines on untrusted network nodes, some of which are Byzantine. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. We propose CSM, which achieves the optimal linear scaling in storage efficiency, throughput, and security simultaneously with the size of the network. The storage efficiency is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest. Using INTERMIX, the network nodes securely delegate their coding operations to a single worker node, and a small group of randomly selected auditor nodes verify its correctness, so that computational efficiency can scale almost linearly with the network size, without compromising on security.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Coded State Machine (CSM) framework for securely executing multiple state machines across untrusted nodes subject to Byzantine faults. It claims to achieve optimal linear scaling simultaneously in storage efficiency, throughput, and security with network size, via Lagrange coded states and coded input commands that match the storage size of their uncoded origins, together with the INTERMIX algorithm for information-theoretically verifiable delegation of matrix-vector multiplications.
Significance. If the central constructions hold, the result would be significant for distributed systems and coding theory, as it proposes a route to linear scaling in all three metrics without the usual redundancy overhead of classical state-machine replication. The introduction of INTERMIX as a standalone verifiable computation primitive is noted as potentially of independent interest.
major comments (2)
- [Abstract] Abstract: The headline claim of linear scaling in storage efficiency rests on the assertion that 'Lagrange coded states and coded input commands that require the same storage size as their origins' while still enabling information-theoretic recovery and verification under Byzantine faults. Standard information-theoretic Byzantine tolerance requires explicit redundancy; the manuscript must supply the explicit encoding, recovery threshold, and security proof showing that dimension or metadata overhead is avoided.
- [Abstract] Abstract: The subsequent claim that INTERMIX enables 'computational efficiency [to] scale almost linearly with the network size, without compromising on security' by delegating to a single worker and a small auditor group is load-bearing for the throughput claim. The security reduction from the matrix-vector verification to the overall CSM security must be stated explicitly.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on the abstract. We address the points below and will revise the manuscript to make the claims more explicit while preserving the technical content.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline claim of linear scaling in storage efficiency rests on the assertion that 'Lagrange coded states and coded input commands that require the same storage size as their origins' while still enabling information-theoretic recovery and verification under Byzantine faults. Standard information-theoretic Byzantine tolerance requires explicit redundancy; the manuscript must supply the explicit encoding, recovery threshold, and security proof showing that dimension or metadata overhead is avoided.
Authors: The full paper defines the Lagrange coded state as the evaluation of a polynomial of degree at most m-1 (m state machines) at a distinct point per node; each coded state is therefore a single vector of the same dimension as an uncoded state, with no auxiliary metadata. Recovery succeeds from any set of at least m + f + 1 honest nodes (f Byzantine), which is the standard threshold for information-theoretic Byzantine tolerance under this coding. The security argument appears in the main body via the properties of Reed-Solomon-like codes over a sufficiently large field. We will add a single sentence to the abstract that states the recovery threshold and points to the encoding definition and proof. revision: yes
-
Referee: [Abstract] Abstract: The subsequent claim that INTERMIX enables 'computational efficiency [to] scale almost linearly with the network size, without compromising on security' by delegating to a single worker and a small auditor group is load-bearing for the throughput claim. The security reduction from the matrix-vector verification to the overall CSM security must be stated explicitly.
Authors: INTERMIX provides an information-theoretic soundness guarantee for each matrix-vector multiplication: any incorrect result is detected by the auditor set except with probability exponentially small in the field size. Because every state update in CSM is realized by such a multiplication, the per-operation soundness composes directly with the existing Byzantine tolerance of the coded states; the random selection of auditors ensures that the probability of an undetected fault remains negligible over polynomially many steps. We will insert an explicit one-paragraph reduction statement in both the abstract and the security section of the revision. revision: yes
Circularity Check
No circularity; construction-based claims are self-contained
full rationale
The paper introduces CSM as a novel framework whose linear scaling claims follow directly from the explicit design of Lagrange coded states (defined to match original storage size) and the INTERMIX verifiable delegation algorithm. No load-bearing step reduces a claimed result to a fitted parameter, self-citation chain, or input by construction; the storage and verification properties are asserted as properties of the new coding scheme itself rather than derived from prior fitted quantities. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Information-theoretic security model for Byzantine faults
invented entities (2)
-
Coded State Machine (CSM)
no independent evidence
-
INTERMIX
no independent evidence
Reference graph
Works this paper leans on
-
[1]
K., Janakiraman, R., and Xu, L
Aguilera, M. K., Janakiraman, R., and Xu, L. Using erasure codes efficiently for storage in a distributed system. In 2005 International Conference on Dependable Systems and Networks (DSN’05) (2005), IEEE, pp. 336–345
work page 2005
-
[2]
Alsberg, P. A., and Day, J. D. A principle for resilient sharing of distributed resources . In Proceedings of the 2nd international conference on Softwar e engineering (1976), IEEE Com- puter Society Press, pp. 562–570
work page 1976
-
[3]
Balasubramanian, B., and Garg, V. K. Fault tolerance in distributed systems using fused data structures. IEEE transactions on parallel and distributed systems 24 , 4 (2013), 701–715
work page 2013
-
[4]
Balasubramanian, B., and Garg, V. K. Fault tolerance in distributed systems using fused state machines. Distributed computing 27 , 4 (2014), 287–311
work page 2014
-
[5]
Scalable, transparent, and post-quantum secure computational integrity
Ben-Sasson, E., Bentov, I., Horesh, Y., and Riabzev, M. Scalable, transparent, and post-quantum secure computational integrity. Cryptol. ePrint Arch., Tech. Rep 46 (2018), 2018
work page 2018
-
[6]
Succinct non-interactive zero knowledge for a von neumann architecture
Ben-Sasson, E., Chiesa, A., Tromer, E., and Virza, M. Succinct non-interactive zero knowledge for a von neumann architecture. In USENIX Security Symposium (2014), pp. 781– 796
work page 2014
-
[7]
Bitansky, N., Canetti, R., Chiesa, A., and Tromer, E. From extractable collision resistance to succinct non-interactive arguments of knowl edge, and back again. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conf erence (2012), ACM, pp. 326–349
work page 2012
-
[8]
Optimal resilience for erasure-coded byzantine distribut ed storage
Cachin, C., and Tessaro, S. Optimal resilience for erasure-coded byzantine distribut ed storage. In International Conference on Dependable Systems and Networ ks (DSN’06) (2006), IEEE, pp. 115–124
work page 2006
-
[9]
Cadambe, V. R., W ang, Z., and Lynch, N. Information-theoretic lower bounds on the storage cost of shared memory emulation. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (2016), ACM, pp. 305–313
work page 2016
-
[10]
Practical byzantine fault tolerance and proactive recover y
Castro, M., and Liskov, B. Practical byzantine fault tolerance and proactive recover y. ACM Transactions on Computer Systems (TOCS) 20 , 4 (2002), 398–461
work page 2002
-
[11]
Practical byzantine fault tolerance
Castro, M., Liskov, B., et al. Practical byzantine fault tolerance. In OSDI (1999), vol. 99, pp. 173–186
work page 1999
-
[12]
G., Ramchandran, K., Wu, Y., and Suh, C
Dimakis, A. G., Ramchandran, K., Wu, Y., and Suh, C. A survey on network codes for distributed storage. Proceedings of the IEEE 99 , 3 (2011), 476–489
work page 2011
-
[13]
Pow- erstore: proofs of writing for efficient and robust storage
Dobre, D., Karame, G., Li, W., Majuntke, M., Suri, N., and Vuk olić, M. Pow- erstore: proofs of writing for efficient and robust storage. I n Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security (2013), ACM, pp. 285–298
work page 2013
-
[14]
A verifiable random function with short proofs and keys
Dodis, Y., and Yampolskiy, A. A verifiable random function with short proofs and keys. In International Workshop on Public Key Cryptography (2005), Springer, pp. 416–431. 16
work page 2005
-
[15]
On the minimal synchronism needed for distributed consensus
Dolev, D., Dwork, C., and Stockmeyer, L. On the minimal synchronism needed for distributed consensus. Journal of the ACM (JACM) 34 , 1 (1987), 77–97
work page 1987
-
[16]
Short-dot: Computing large linear transforms distributedly using coded short dot products
Dutta, S., Cadambe, V., and Grover, P. Short-dot: Computing large linear transforms distributedly using coded short dot products. In NIPS (2016), pp. 2100–2108
work page 2016
-
[17]
Consensus in the presence of partial syn- chrony
Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the presence of partial syn- chrony. Journal of the ACM (JACM) 35 , 2 (1988), 288–323
work page 1988
-
[18]
Efficient techniques for publicly verifiable delegation of computation
Elkhiyaoui, K., Önen, M., Azraoui, M., and Mol v a, R. Efficient techniques for publicly verifiable delegation of computation. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security (2016), ACM, pp. 119–128
work page 2016
-
[19]
Publicly verifiable delegation of large polynomials and mat rix computations, with applications
Fiore, D., and Gennaro, R. Publicly verifiable delegation of large polynomials and mat rix computations, with applications. In Proceedings of the 2012 ACM conference on Computer and communications security (2012), ACM, pp. 501–512
work page 2012
-
[20]
Garg, V. K. Implementing fault-tolerant services using state machine s: Beyond replication. In International Symposium on Distributed Computing (2010), Springer, pp. 450–464
work page 2010
-
[21]
Non-interactive verifiable computing: Out- sourcing computation to untrusted workers
Gennaro, R., Gentry, C., and Parno, B. Non-interactive verifiable computing: Out- sourcing computation to untrusted workers. In Annual Cryptology Conference (2010), Springer, pp. 465–482
work page 2010
-
[22]
Hartmanis, J. Algebraic Structure Theory of Sequential Machines (Prenti ce-Hall Interna- tional Series in Applied Mathematics) . Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1966
work page 1966
-
[23]
Hendricks, J., Ganger, G. R., and Reiter, M. K. Low-overhead byzantine fault-tolerant storage. In ACM SIGOPS Operating Systems Review (2007), vol. 41, ACM, pp. 73–86
work page 2007
-
[24]
Kedlaya, K. S., and Umans, C. Fast polynomial factorization and modular composition. SIAM Journal on Computing 40 , 6 (2011), 1767–1802
work page 2011
-
[25]
Omniledger: A secure, scale-out, decentralized ledger
Kokoris-Kogias, E., Jov anovic, P., Gasser, L., Gailly, N., and Ford, B. Omniledger: A secure, scale-out, decentralized ledger. IACR Cryptology ePrint Archive 2017 (2017), 406
work page 2017
-
[26]
The implementation of reliable distributed multiprocess s ystems
Lamport, L. The implementation of reliable distributed multiprocess s ystems. Computer networks 2 , 2 (1978), 95–114
work page 1978
-
[27]
Lamport, L. The part-time parliament. ACM Transactions on Computer Systems (TOCS) 16, 2 (1998), 133–169
work page 1998
-
[28]
The byzantine generals problem
Lamport, L., Shostak, R., and Pease, M. The byzantine generals problem. ACM Trans- actions on Programming Languages and Systems (TOPLAS) 4 , 3 (1982), 382–401
work page 1982
-
[29]
Closed partition lattice and machine decomposition
Lee, D., and Yannakakis, M. Closed partition lattice and machine decomposition. IEEE Transactions on Computers 51 , 2 (2002), 216–228
work page 2002
-
[30]
Speed- ing up distributed machine learning using codes
Lee, K., Lam, M., Pedarsani, R., Papailiopoulos, D., and Ram chandran, K. Speed- ing up distributed machine learning using codes. IEEE Transactions on Information Theory 64, 3 (2018), 1514–1529. 17
work page 2018
-
[31]
Li, S., Maddah-Ali, M. A., and A vestimehr, A. S. A unified coding framework for distributed computing with straggling servers. IEEE Workshop on Network Coding and Ap- plications (Sept. 2016)
work page 2016
-
[32]
A., Yu, Q., and A vestimehr, A
Li, S., Maddah-Ali, M. A., Yu, Q., and A vestimehr, A. S. A fundamental tradeoff between computation and communication in distributed comp uting. IEEE Transactions on Information Theory 64 , 1 (Jan. 2018)
work page 2018
-
[33]
A secure sharding protocol for open blockchains
Luu, L., Narayanan, V., Zheng, C., Ba weja, K., Gilbert, S., a nd Saxena, P. A secure sharding protocol for open blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016), ACM, pp. 17–30
work page 2016
-
[34]
MacWilliams, F. J., and Sloane, N. J. A. The theory of error-correcting codes . Elsevier, 1977
work page 1977
-
[35]
Micali, S., Rabin, M., and V adhan, S. Verifiable random functions. In 40th Annual Sympo- sium on Foundations of Computer Science (Cat. No. 99CB37039 ) (1999), IEEE, pp. 120–130
work page 1999
-
[36]
A simple fault tolerant distributed hash table
Naor, M., and Wieder, U. A simple fault tolerant distributed hash table. In International Workshop on Peer-to-Peer Systems (2003), Springer, pp. 88–97
work page 2003
-
[37]
Oki, B. M., and Liskov, B. H. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of the seventh annual ACM Symposium on Principles of distributed computing (1988), ACM, pp. 8–17
work page 1988
-
[38]
Pinocchio: Nearly practical verifiable computation
Parno, B., Howell, J., Gentry, C., and Raykov a, M. Pinocchio: Nearly practical verifiable computation. Communications of the ACM 59 , 2 (2016), 103–112
work page 2016
-
[39]
On a decentralized trustless pseudo-random number generat ion algorithm
Popov, S. On a decentralized trustless pseudo-random number generat ion algorithm. Journal of Mathematical Cryptology 11 , 1 (2017), 37–43
work page 2017
-
[40]
Rashmi, K. V., Shah, N. B., and Kumar, P. V. Optimal exact-regenerating codes for distributed storage at the msr and mbr points via a product-m atrix construction. IEEE Transactions on Information Theory 57 , 8 (2011), 5227–5239
work page 2011
-
[41]
Roth, R. Introduction to coding theory . Cambridge University Press, 2006
work page 2006
-
[42]
Sahraei, S., and A vestimehr, A. S. Interpol: Information theoretically verifiable polyno- mial evaluation. arXiv preprint arXiv:1901.03379 (2019)
work page internal anchor Pith review Pith/arXiv arXiv 1901
-
[43]
Schneider, F. B. Implementing fault-tolerant services using the state mach ine approach: A tutorial. ACM Computing Surveys (CSUR) 22 , 4 (1990), 299–319
work page 1990
-
[44]
Skidanov, A. Unsolved problems in sharding. https: // medium. com/ nearprotocol/ unsolved-problems-in-blockchain-sharding-2327d6517f43
-
[45]
K., Gailly, N., Gasser, L ., Khoffi, I., Fischer, M
Syta, E., Jov anovic, P., Kogias, E. K., Gailly, N., Gasser, L ., Khoffi, I., Fischer, M. J., and Ford, B. Scalable bias-resistant distributed randomness. In IEEE Symposium on Security and Privacy (SP) (2017), IEEE, pp. 444–460
work page 2017
-
[46]
Tandon, R., Lei, Q., Dimakis, A. G., and Karampatziakis, N. Gradient coding: A void- ing stragglers in distributed learning. In Proceedings of the 34th International Conference on Machine Learning (Aug. 2017), pp. 3368–3376. 18
work page 2017
-
[47]
W ang, Z., and Cadambe, V. R. Multi-version coding—an information-theoretic perspect ive of consistent distributed storage. IEEE Transactions on Information Theory 64 , 6 (2018), 4540–4561
work page 2018
-
[48]
Yu, Q., Li, S., Ra viv, N., Kalan, S. M. M., Soltanolkotabi, M. , and A vestimehr, A. S. Lagrange coded computing: Optimal design for resiliency, s ecurity, and privacy. In NIPS Systems for ML Workshop (2018)
work page 2018
-
[49]
Yu, Q., Maddah-Ali, M. A., and A vestimehr, A. S. Polynomial codes: an optimal design for high-dimensional coded matrix multiplication. In NIPS (2017), pp. 4406–4416
work page 2017
-
[50]
New publicly verifiable computation for batch matrix multiplication
Zhang, X., Jiang, T., Li, K.-C., Castiglione, A., and Chen, X . New publicly verifiable computation for batch matrix multiplication. Information Sciences (2017)
work page 2017
-
[51]
Efficient secure and verifiable outsourcing of matrix multipl i- cations
Zhang, Y., and Blanton, M. Efficient secure and verifiable outsourcing of matrix multipl i- cations. In International Conference on Information Security (2014), Springer, pp. 158–178
work page 2014
-
[52]
Zou, Y. M. Representing boolean functions using polynomials: more ca n offer less. In International Symposium on Neural Networks (2011), Springer, pp. 290–296. A Field Extension for General Boolean Functions Using the construction of [52, Theorem 2], we can represent a ny arbitrary Boolean function f : {0, 1}n → { 0, 1} whose inputs are n binary variables ...
work page 2011
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.