pith. sign in

arxiv: 2601.16857 · v2 · submitted 2026-01-23 · 💻 cs.IT · math.IT

Perfect Privacy and Strong Stationary Times for Markovian Sources

Pith reviewed 2026-05-16 11:47 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords perfect privacyMarkov chainredaction mechanismsstrong stationary timeinformation-theoretic privacyerasure schemesoptimal distortionsequential redaction
0
0 comments X

The pith

Erasing data up to a strong stationary time achieves perfect privacy for Markov chain initial states while redacting only a constant average number of points independent of sequence length.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to share data from a finite time-homogeneous Markov chain while keeping the initial state perfectly private under an information-theoretic constraint. It shows that redaction mechanisms, which simply erase or release each point unchanged, can meet this privacy goal by stopping erasure at a strong stationary time. An optimal sequential redaction policy turns out to be equivalent to such a window-based scheme. Both approaches maximize the number of released points and do so with an average redaction count that stays fixed even as the data sequence grows arbitrarily long.

Core claim

For data generated by a finite time-homogeneous Markov chain, erasing observations up to a strong stationary time preserves perfect privacy of the initial state under suitable conditions on the chain. An optimal sequential redaction mechanism admits an equivalent window interpretation, and both schemes achieve the optimal expected Hamming distortion by redacting only a constant average number of data points independent of the data length N.

What carries the argument

Strong stationary time: the first time at which the Markov chain reaches its stationary distribution exactly, used as the stopping point for a window-based redaction prefix that severs dependence on the initial state.

If this is right

  • Perfect privacy is achievable with utility loss that does not grow with sequence length N.
  • Sequential redaction policies can be realized equivalently as fixed-window erasures tied to stationary times.
  • For long Markov sequences, nearly the entire data set can be released while satisfying the privacy constraint.
  • The average number of redactions remains bounded by a constant determined only by the chain, not by N.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may extend to streaming or infinite-length Markov processes where only a finite average prefix needs redaction.
  • Explicit computation of the average redaction count could follow from known bounds on mixing or hitting times of the chain.
  • Similar window-based redaction ideas might apply to other chain-dependent privacy problems such as protecting a hidden state at a fixed time.
  • The equivalence between sequential and window mechanisms suggests that online privacy enforcement can be made offline-equivalent for this setting.

Load-bearing premise

Erasing data up to a strong stationary time is sufficient to preserve privacy of the initial state for the finite time-homogeneous Markov chain under consideration.

What would settle it

A concrete counterexample consisting of a small finite Markov chain in which an observer can still infer the initial state with positive probability from the data released after redaction up to the strong stationary time would disprove the privacy claim.

read the original abstract

We consider the problem of sharing correlated data under a perfect information-theoretic privacy constraint. We focus on redaction (erasure) mechanisms, in which data are either withheld or released unchanged, and measure utility by the average cardinality of the released set, equivalently, the expected Hamming distortion. Assuming the data are generated by a finite time-homogeneous Markov chain, we study the protection of the initial state while maximizing the amount of shared data. We establish a connection between perfect privacy and window-based redaction schemes, showing that erasing data up to a strong stationary time preserves privacy under suitable conditions. We further study an optimal sequential redaction mechanism and prove that it admits an equivalent window interpretation. Interestingly, we show that both mechanisms achieve the optimal distortion while redacting only a constant average number of data points, independent of the data length~$N$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies redaction mechanisms for releasing data generated by a finite time-homogeneous Markov chain while protecting the initial state under perfect information-theoretic privacy. It connects such privacy to strong stationary times, showing that erasure up to a strong stationary time preserves privacy under suitable conditions, establishes equivalence between an optimal sequential redaction mechanism and a window-based scheme, and claims both achieve optimal expected Hamming distortion by redacting only a constant average number of symbols independent of sequence length N.

Significance. If the central claims on privacy preservation and constant-redaction optimality hold, the results would establish a precise link between strong stationary times and perfect privacy for Markov sources, showing that utility loss need not grow with N. This would be of interest in information-theoretic privacy and data release for dependent sources.

major comments (2)
  1. [§3] §3 (privacy via strong stationary times): the argument that the released suffix after τ is independent of X_0 does not automatically extend to the marginal distribution of the observed redaction length, since the law of τ depends on the initial distribution μ; the skeptic concern that P(observed length) can leak information about X_0 therefore requires an explicit counter-argument or additional conditioning in the mechanism definition.
  2. [§5] §5 (optimality and equivalence): the proof that both mechanisms achieve the same optimal distortion with E[redactions] constant in N must specify the precise mixing or aperiodicity conditions on the chain under which the expectation remains bounded independently of N and of the starting distribution; without this, the independence-of-N claim is not fully load-bearing.
minor comments (2)
  1. Notation for the strong stationary time τ and the released suffix should be introduced with an explicit equation showing how the observed length is part of the released object.
  2. [Abstract] The abstract states 'under suitable conditions' for privacy preservation; these conditions should be stated explicitly already in the introduction rather than deferred to the technical sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and constructive comments on our manuscript. We address each major comment below and will incorporate revisions to clarify the privacy argument and specify the required conditions on the Markov chain.

read point-by-point responses
  1. Referee: [§3] §3 (privacy via strong stationary times): the argument that the released suffix after τ is independent of X_0 does not automatically extend to the marginal distribution of the observed redaction length, since the law of τ depends on the initial distribution μ; the skeptic concern that P(observed length) can leak information about X_0 therefore requires an explicit counter-argument or additional conditioning in the mechanism definition.

    Authors: We thank the referee for identifying this subtlety in the privacy argument. While the manuscript establishes that the suffix following a strong stationary time τ is independent of the initial state X_0 under the Markov property, we acknowledge that the dependence of the distribution of τ on the initial distribution μ requires careful handling to ensure the entire observed redacted sequence (including the length) is independent of X_0. We will revise §3 to provide an explicit proof of this independence by incorporating an additional randomization in the stopping rule that renders the distribution of the observed length independent of X_0, while maintaining the strong stationarity of the released suffix. This revision will include the necessary conditioning and a detailed verification that the mutual information I(X_0; observed data) = 0. revision: yes

  2. Referee: [§5] §5 (optimality and equivalence): the proof that both mechanisms achieve the same optimal distortion with E[redactions] constant in N must specify the precise mixing or aperiodicity conditions on the chain under which the expectation remains bounded independently of N and of the starting distribution; without this, the independence-of-N claim is not fully load-bearing.

    Authors: We appreciate the referee's request for precision on the conditions. The manuscript assumes a finite time-homogeneous Markov chain, but to make the constant-in-N claim rigorous, we will revise §5 to explicitly state that the chain is irreducible and aperiodic. Under these standard conditions, there exists a strong stationary time τ with finite expectation E[τ] < ∞ that is independent of the sequence length N and bounded uniformly over starting distributions (due to finiteness of the state space). Consequently, the expected number of redactions remains bounded by E[τ] independently of N. We will add this clarification and a brief reference to the relevant mixing time bounds to support the optimality result. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on standard Markov properties

full rationale

The paper connects perfect privacy to redaction up to a strong stationary time using the definition that the chain reaches stationarity at τ, making the suffix independent of the initial state conditional on τ. The optimal distortion claim and constant average redactions follow from bounding E[τ] for finite-state time-homogeneous Markov chains, which is independent of horizon N by standard renewal theory. No equation reduces a claimed prediction to a fitted parameter by construction, no load-bearing self-citation is invoked for a uniqueness theorem, and no ansatz is smuggled via prior work. The derivation is self-contained against external Markov chain benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the standard assumption that the source is a finite time-homogeneous Markov chain together with the existence and properties of strong stationary times; no free parameters or new invented entities are introduced.

axioms (1)
  • domain assumption Data are generated by a finite time-homogeneous Markov chain
    Stated explicitly as the data generation model in the problem setup.

pith-pipeline@v0.9.0 · 5444 in / 1151 out tokens · 33021 ms · 2026-05-16T11:47:24.000280+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

  1. [1]

    D. A. Levin and Y . Peres,Markov chains and mixing times. American Mathematical Soc., 2017, vol. 107

  2. [2]

    Shuffling cards and stopping times,

    D. Aldous and P. Diaconis, “Shuffling cards and stopping times,”The American Mathematical Monthly, vol. 93, no. 5, pp. 333–348, 1986

  3. [3]

    Pufferfish privacy mechanisms for correlated data,

    S. Song, Y . Wang, and K. Chaudhuri, “Pufferfish privacy mechanisms for correlated data,” inProceedings of the 2017 ACM International Conference on Management of Data, 2017, pp. 1291–1306

  4. [4]

    No free lunch in data privacy,

    D. Kifer and A. Machanavajjhala, “No free lunch in data privacy,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011, pp. 193–204

  5. [5]

    Quantifying differential privacy under temporal correlations,

    Y . Cao, M. Yoshikawa, Y . Xiao, and L. Xiong, “Quantifying differential privacy under temporal correlations,” in2017 IEEE 33rd International Conference on Data Engineering (ICDE). IEEE, 2017, pp. 821–832

  6. [6]

    Blowfish privacy: Tuning privacy-utility trade-offs using policies,

    X. He, A. Machanavajjhala, and B. Ding, “Blowfish privacy: Tuning privacy-utility trade-offs using policies,” inProceedings of the 2014 ACM SIGMOD international conference on Management of data, 2014, pp. 1447–1458

  7. [7]

    Correlated differential privacy: Hiding information in non-iid data set,

    T. Zhu, P. Xiong, G. Li, and W. Zhou, “Correlated differential privacy: Hiding information in non-iid data set,”IEEE Transactions on Informa- tion Forensics and Security, vol. 10, no. 2, pp. 229–242, 2014

  8. [8]

    Correlated network data publication via differential privacy,

    R. Chen, B. C. Fung, P. S. Yu, and B. C. Desai, “Correlated network data publication via differential privacy,”The VLDB Journal, vol. 23, pp. 653–676, 2014

  9. [9]

    Bayesian differential privacy on correlated data,

    B. Yang, I. Sato, and H. Nakagawa, “Bayesian differential privacy on correlated data,” inProceedings of the 2015 ACM SIGMOD interna- tional conference on Management of Data, 2015, pp. 747–762

  10. [10]

    Dependence makes you vulnber- able: Differential privacy under dependent tuples

    C. Liu, S. Chakraborty, and P. Mittal, “Dependence makes you vulnber- able: Differential privacy under dependent tuples.” inNDSS, vol. 16, 2016, pp. 21–24

  11. [11]

    Protecting location privacy with personalized k- anonymity: Architecture and algorithms,

    B. Gedik and L. Liu, “Protecting location privacy with personalized k- anonymity: Architecture and algorithms,”IEEE Transactions on Mobile Computing, vol. 7, no. 1, pp. 1–18, 2007

  12. [12]

    Protecting locations with differential privacy under temporal correlations,

    Y . Xiao and L. Xiong, “Protecting locations with differential privacy under temporal correlations,” inProceedings of the 22nd ACM SIGSAC conference on computer and communications security, 2015, pp. 1298– 1309

  13. [13]

    Between close enough to reveal and far enough to protect: a new privacy region for correlated data,

    L. Maßny, R. Bitar, F. Ye, and S. E. Rouayheb, “Between close enough to reveal and far enough to protect: a new privacy region for correlated data,”arXiv preprint arXiv:2501.14313, 2025

  14. [14]

    From the information bottleneck to the privacy funnel,

    A. Makhdoumi, S. Salamatian, N. Fawaz, and M. Médard, “From the information bottleneck to the privacy funnel,” in2014 IEEE Information Theory Workshop (ITW 2014). IEEE, 2014, pp. 501–505

  15. [15]

    A compressive privacy approach to generalized informa- tion bottleneck and privacy funnel problems,

    S.-Y . Kung, “A compressive privacy approach to generalized informa- tion bottleneck and privacy funnel problems,”Journal of the Franklin Institute, vol. 355, no. 4, pp. 1846–1872, 2018

  16. [16]

    The information bottleneck method

    N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,”arXiv preprint physics/0004057, 2000

  17. [17]

    Fundamental limits of perfect privacy,

    F. P. Calmon, A. Makhdoumi, and M. Médard, “Fundamental limits of perfect privacy,” in2015 IEEE International Symposium on Information Theory (ISIT). IEEE, 2015, pp. 1796–1800

  18. [18]

    New privacy mech- anism design with direct access to the private data,

    A. Zamani, T. J. Oechtering, and M. Skoglund, “New privacy mech- anism design with direct access to the private data,”arXiv preprint arXiv:2309.09033, 2023

  19. [19]

    Statistical privacy mechanism design using separation technique,

    ——, “Statistical privacy mechanism design using separation technique,” in2024 32nd European Signal Processing Conference (EUSIPCO). IEEE, 2024, pp. 735–739

  20. [20]

    Secrecy by design with appli- cations to privacy and compression,

    Y . Y . Shkel, R. S. Blum, and H. V . Poor, “Secrecy by design with appli- cations to privacy and compression,”IEEE Transactions on Information Theory, vol. 67, no. 2, pp. 824–843, 2020

  21. [21]

    Information-theoretic privacy-preserving schemes based on perfect privacy,

    B. Rassouli and D. Gündüz, “Information-theoretic privacy-preserving schemes based on perfect privacy,”arXiv preprint arXiv:2301.11754, 2023

  22. [22]

    Mechanisms for hiding sensitive genotypes with information-theoretic privacy,

    F. Ye, H. Cho, and S. El Rouayheb, “Mechanisms for hiding sensitive genotypes with information-theoretic privacy,”IEEE transactions on information theory, vol. 68, no. 6, pp. 4090–4105, 2022

  23. [23]

    On-off privacy with correlated requests,

    C. Naim, F. Ye, and S. El Rouayheb, “On-off privacy with correlated requests,” in2019 IEEE International Symposium on Information Theory (ISIT). IEEE, 2019, pp. 817–821

  24. [24]

    Preserving on-off privacy for past and future requests,

    F. Ye, C. Naim, and S. El Rouayheb, “Preserving on-off privacy for past and future requests,” in2019 IEEE Information Theory Workshop (ITW). IEEE, 2019, pp. 1–5

  25. [25]

    On-off privacy against correlation over time,

    ——, “On-off privacy against correlation over time,”IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2104–2117, 2021

  26. [26]

    On-off privacy in the presence of correlation,

    ——, “On-off privacy in the presence of correlation,”IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7438–7457, 2021

  27. [27]

    Strong uniform times and finite random walks,

    D. Aldous and P. Diaconis, “Strong uniform times and finite random walks,”Advances in applied mathematics, vol. 8, no. 1, pp. 69–97, 1987

  28. [28]

    Strong stationary times via a new form of duality,

    P. Diaconis and J. A. Fill, “Strong stationary times via a new form of duality,”The Annals of Probability, pp. 1483–1522, 1990

  29. [29]

    Reversible markov chains and random walks on graphs,

    D. Aldous and J. A. Fill, “Reversible markov chains and random walks on graphs,” 2002, unfinished monograph, recompiled 2014, available at http://www.stat.berkeley.edu/$\sim$aldous/RWG/book.html

  30. [30]

    Efficient stopping rules for markov chains,

    L. Lovász and P. Winkler, “Efficient stopping rules for markov chains,” inProceedings of the twenty-seventh annual ACM symposium on Theory of computing, 1995, pp. 76–82

  31. [31]

    An interruptible algorithm for perfect sampling via markov chains,

    J. A. Fill, “An interruptible algorithm for perfect sampling via markov chains,” inProceedings of the twenty-ninth annual ACM symposium on Theory of computing, 1997, pp. 688–695

  32. [32]

    Analysis of top to random shuffles,

    P. Diaconis, J. A. Fill, and J. Pitman, “Analysis of top to random shuffles,”Combinatorics, probability and computing, vol. 1, no. 2, pp. 135–155, 1992

  33. [33]

    Group representations in probability and statistics,

    P. Diaconis, “Group representations in probability and statistics,”Lecture notes-monograph series, vol. 11, pp. i–192, 1988

  34. [34]

    Extended version of perfect privacy and strong stationary times for markovian sources,

    F. Ye, Z. Liu, P. Parag, and S. E. Rouayheb, “Extended version of perfect privacy and strong stationary times for markovian sources,” 2026, extended version available at https ://www.dropbox.com/scl/fo/l2jqya2oa2tolum8u5fyj/AJTs3asXrcWJqmx- 6z2-6TY

  35. [35]

    Eigenvalue bounds on convergence to stationarity for non- reversible markov chains, with an application to the exclusion process,

    J. A. Fill, “Eigenvalue bounds on convergence to stationarity for non- reversible markov chains, with an application to the exclusion process,” The annals of applied probability, pp. 62–87, 1991. APPENDIXA PROOF OFRESULTS INSECTIONV-A Proof of Lemma 1.First, by the chain rule, we have I(X0;Y [1:N] ) =I(X 0;Y 1, ..., Yτ , Yτ+1 , ..., YN )(20) (a) =I(X 0;τ,...

  36. [36]

    For everyt≥1, the quantity min y∈X P t(x, y) is independent of the initial statex∈ X

  37. [37]

    Proof of Proposition 1.We first prove thatP t holdsP t(gx, gy) =P t(x, y)by induction ont

    The transition matrixPis doubly stochastic, and hence the uniform distributionπ(y) = 1/|X |is a stationary distribution of the chain. Proof of Proposition 1.We first prove thatP t holdsP t(gx, gy) =P t(x, y)by induction ont. The caset= 1is the assumption. Assume it holds for somet≥1. Then for anyx, y, P t+1(gx, gy) = X z∈Ω P t(gx, z)P(z, gy). Use the bije...