Identification and estimation of dynamic random coefficient models

Wooyong Lee

arXiv: 2505.01600 · v3 · submitted 2025-05-02 · econ.EM

Pith reviewed 2026-05-22 17:14 UTC · model grok-4.3

classification econ.EM

keywords panel data · random coefficients · partial identification · dynamic models · earnings dynamics · unobserved heterogeneity · short panels

The pith

In short panels, linear models with individual-specific coefficients on predetermined regressors are partially identified, with sets for the mean, variance, and CDF of the coefficient distribution fully characterized.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that dynamic linear panel data models with individual-specific coefficients are not point-identified when the time dimension is short. Instead, it derives general characterizations of the identified sets for key features of the coefficient distribution, including its mean, variance, and cumulative distribution function. These characterizations apply to discrete, continuous, or unbounded data without additional restrictions. The resulting methods support practical estimation and inference, as demonstrated in an application to U.S. household earnings data from the PSID that uncovers substantial heterogeneity in earnings persistence.

Core claim

The central discovery is that the model with individual-specific coefficients on predetermined regressors is partially identified in short panels. The identified sets for the mean, variance, and CDF of the random coefficient distribution are characterized in a general manner that handles various data types and leads to tractable estimation procedures. An empirical application to lifecycle earnings dynamics indicates the presence of unobserved heterogeneity in earnings persistence.

What carries the argument

The general characterization of identified sets for the mean, variance, and CDF of the individual-specific coefficient distribution under predetermined regressors in short panels.

If this is right

The average effect of regressors can be bounded from the data.
The degree of heterogeneity, measured by variance, can be set-identified.
The possible distributions of coefficients can be described via their CDF bounds.
Computationally feasible estimators and inference methods are available for applied researchers.
Heterogeneity in earnings persistence contributes to differences in consumption and savings across households.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These partial identification results could be used to assess the robustness of standard fixed effects estimators in dynamic panels.
Extending the framework to include time-varying covariates might help explain sources of coefficient heterogeneity.
The findings on earnings risk variation suggest incorporating random coefficients into life-cycle models to better predict savings behavior.
Researchers in other fields with short panel data could apply similar bounding approaches to quantify heterogeneity.
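As a reference point for the first extension above, here is a minimal sketch of the pooled within (fixed effects) estimate of a common AR(1) coefficient from a balanced panel. The function name `within_ar1` and the array layout (one row per individual, one column per period) are assumptions made for this illustration and do not come from the paper.

```python
import numpy as np

def within_ar1(y):
    """Pooled within (fixed effects) estimate of a common AR(1) coefficient.

    y: array of shape (n, T + 1), one row per individual, columns ordered in time.
    """
    y = np.asarray(y, dtype=float)
    lag, cur = y[:, :-1], y[:, 1:]
    # Demean within each individual to remove the individual-specific intercept.
    lag_dm = lag - lag.mean(axis=1, keepdims=True)
    cur_dm = cur - cur.mean(axis=1, keepdims=True)
    # Pooled least-squares slope on the demeaned data.
    return float((lag_dm * cur_dm).sum() / (lag_dm ** 2).sum())

# Noise-free check (hypothetical data): y_t = a_i + 0.5 * y_{t-1} for every individual.
a = np.array([0.0, 1.0, -2.0])
y = np.empty((3, 5))
y[:, 0] = np.array([1.0, 3.0, -1.0])
for t in range(1, 5):
    y[:, t] = a + 0.5 * y[:, t - 1]
rho_hat = within_ar1(y)
```

In this noise-free case with a common coefficient the estimate recovers 0.5. With noise, a short time dimension, and heterogeneous persistence, the same formula returns a single number that need not equal the mean of the individual coefficients, so comparing it against estimated bounds on that mean is one way to carry out the robustness check suggested above.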

Load-bearing premise

The panel data are generated by a linear model with predetermined regressors and individual-specific coefficients, meeting the conditions for the partial identification analysis to hold without further assumptions on data support or distributions.
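To make this premise concrete, the sketch below simulates a short panel from one special case: an AR(1) model with an individual-specific intercept and persistence. The function name `simulate_panel`, the uniform and normal distributions, and all parameter values are assumptions chosen for illustration; the paper's results are not tied to them.

```python
import numpy as np

def simulate_panel(n=1000, T=5, seed=0):
    """Simulate y[i, t] = alpha[i] + rho[i] * y[i, t-1] + eps[i, t] for t = 1..T.

    Returns the (n, T + 1) outcome array (column 0 is the initial condition)
    together with the latent coefficients alpha and rho.
    """
    rng = np.random.default_rng(seed)
    # Assumed heterogeneity: independent draws for intercept and persistence.
    alpha = rng.normal(0.0, 1.0, size=n)
    rho = rng.uniform(0.2, 0.8, size=n)
    y = np.empty((n, T + 1))
    y[:, 0] = rng.normal(0.0, 1.0, size=n)
    for t in range(1, T + 1):
        # Shocks are drawn independently of the coefficients and of the past,
        # so the lagged outcome is predetermined: it depends on past shocks only.
        eps = rng.normal(0.0, 0.5, size=n)
        y[:, t] = alpha + rho * y[:, t - 1] + eps
    return y, alpha, rho

y, alpha, rho = simulate_panel(n=1000, T=5, seed=0)
```

An analyst would observe only `y`; `alpha` and `rho` are returned here so that bounds computed from `y` can be compared with the true mean, variance, and CDF of the coefficients in a simulation exercise.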

What would settle it

Estimating the identified set for the variance of the coefficient distribution from actual short-panel data and checking whether it collapses to a single point; if the characterization is correct, the set should generally remain a nondegenerate interval when the population is heterogeneous.

Original abstract

I study linear panel data models with predetermined regressors (such as lagged dependent variables) where coefficients are individual-specific, allowing for heterogeneity in the effects of the regressors on the dependent variable. I show that the model is not point-identified in a short panel context but rather partially identified, and I characterize the identified sets for the mean, variance, and CDF of the coefficient distribution. This characterization is general, accommodating discrete, continuous, and unbounded data, and it leads to computationally tractable estimation and inference procedures. I apply the method to study lifecycle earnings dynamics among U.S. households using the Panel Study of Income Dynamics (PSID) dataset. The results suggest the presence of unobserved heterogeneity in earnings persistence, implying that households face varying levels of earnings risk which, in turn, contribute to heterogeneity in their consumption and savings behaviors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Lee shows partial identification for the distribution of random coefficients in short dynamic panels with predetermined regressors and gives tractable bounds plus a PSID application.

The main thing to know is that this paper moves beyond point identification for random coefficient panel models by deriving sharp identified sets for the mean, variance, and CDF of the coefficient distribution when regressors are predetermined rather than strictly exogenous. The characterization covers discrete, continuous, and unbounded data without extra parametric restrictions on heterogeneity, and it produces moment inequalities that support estimation and inference procedures that look computationally workable. The PSID application then uses the framework to document heterogeneity in earnings persistence, which the author links to differences in earnings risk and, in turn, in consumption and savings behavior across households. That empirical step is a straightforward way to show relevance for labor and consumption work.

The derivations appear to start from the linear structure and predeterminedness condition without circularity or invented entities, and the stress-test note finds no internal contradictions. Soft spots are modest. The bounds' sharpness and finite-sample behavior of the inference methods would need checking in the full proofs, and the application could be stronger with more explicit robustness to support conditions or alternative lag structures. These are normal issues for a working paper rather than load-bearing problems.

The paper is aimed at applied econometricians who work with short panels and want to allow for coefficient heterogeneity without forcing point identification. Readers who already use partial ID tools or study earnings dynamics will get the most direct value. It deserves a serious referee because the core identification argument is grounded in standard primitives and the empirical illustration is concrete enough to evaluate.

Referee Report

2 major / 2 minor

Summary. The paper examines linear panel data models with individual-specific random coefficients and predetermined regressors (including lagged dependent variables). It establishes that the model is not point-identified in short panels but is partially identified, deriving sharp identified sets for the mean, variance, and CDF of the coefficient distribution under general conditions that accommodate discrete, continuous, and unbounded support. The results yield computationally tractable estimation and inference procedures, which are applied to PSID data on lifecycle earnings dynamics to document heterogeneity in earnings persistence.

Significance. If the partial identification arguments hold, the paper makes a useful contribution by extending partial identification methods to dynamic random coefficient panels without parametric restrictions on heterogeneity. The claimed generality across data types, the derivation of sharp sets for moments and the full CDF, and the empirical illustration of implications for consumption and savings heterogeneity are strengths that could inform applied work on earnings risk.

major comments (2)

[§3.2, Theorem 2] The sharp identified set for the variance of the random coefficient is characterized via a collection of moment inequalities. However, when the regressor includes a lagged dependent variable, the predeterminedness condition alone may not suffice to rule out all feasible joint distributions of (y_{it-1}, α_i) that violate the variance bounds. The paper should provide an explicit counter-example or an additional support restriction to confirm sharpness.
[§4.1, Eq. (18)] The linear programming formulation for estimating the identified set for the CDF assumes a finite-support discretization. The consistency proof does not explicitly address the rate at which the discretization grid must refine when the underlying support is unbounded, which is load-bearing for the claim of generality to continuous and unbounded data.
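The role of the discretization can be seen in a toy linear program. The sketch below is not the paper's Eq. (18): it bounds the mean and one CDF value of a scalar latent coefficient b supported on [0, 1], given only the made-up moment E[b^2] = 0.3, by optimizing over probability weights on a finite grid with `scipy.optimize.linprog`. The support, the moment value, the grid size, and the helper name `lp_bounds` are all assumptions chosen so that the answer can be checked by hand.

```python
import numpy as np
from scipy.optimize import linprog

def lp_bounds(objective, A_eq, b_eq):
    """Lower and upper bounds of objective @ p over grid weights p >= 0 with A_eq @ p = b_eq."""
    lo = linprog(objective, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    hi = linprog(-objective, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not (lo.success and hi.success):
        raise ValueError("LP failed: the moments may be incompatible with the grid")
    return lo.fun, -hi.fun

# Assumed toy setup: grid on [0, 1] and a single known moment E[b^2] = 0.3.
grid = np.linspace(0.0, 1.0, 101)
A_eq = np.vstack([np.ones_like(grid), grid ** 2])  # weights sum to 1; second moment
b_eq = np.array([1.0, 0.3])

# Bounds on the mean E[b] (linear in the weights).
mean_lo, mean_hi = lp_bounds(grid, A_eq, b_eq)

# Bounds on the CDF value P(b <= 0.5) (also linear in the weights).
indicator = (grid <= 0.5 + 1e-12).astype(float)
cdf_lo, cdf_hi = lp_bounds(indicator, A_eq, b_eq)
```

In this toy problem the exact bounds on E[b] are [0.3, sqrt(0.3)]: the lower bound uses b >= b^2 on [0, 1] and is attained by mass on 0 and 1, and the upper bound follows from Jensen's inequality and is attained by a point mass at sqrt(0.3). The grid version reproduces the lower bound and falls slightly short of the upper bound because sqrt(0.3) is not a grid point, which is the kind of discretization gap a refinement-rate condition has to control.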

minor comments (2)

[Introduction] The notation for the individual-specific coefficient vector β_i is introduced only in Section 2; moving a compact definition to the introduction would improve readability for readers focused on the identification results.
[Table 1] Table 1 reports identified sets for the PSID application but omits the number of grid points used in the discretization; adding this detail would aid replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our paper. The points raised help clarify the identification arguments and the technical conditions for estimation. We address each major comment below, indicating planned revisions where appropriate.

Referee: [§3.2, Theorem 2] The sharp identified set for the variance of the random coefficient is characterized via a collection of moment inequalities; however, when the regressor includes a lagged dependent variable, the predeterminedness condition alone may not suffice to rule out all feasible joint distributions of (y_{it-1}, α_i) that violate the variance bounds, and the paper should provide an explicit counter-example or additional support restriction to confirm sharpness.

Authors: We appreciate the referee highlighting this subtlety in the dynamic setting. The predeterminedness assumption restricts the feasible joint distributions of (y_{it-1}, α_i) through the conditional moment restrictions used to derive the variance bounds in Theorem 2. To make this explicit, we will add a short remark and a simple numerical counter-example in the revision showing that distributions violating the bounds are ruled out under the maintained conditions. This confirms sharpness without introducing further support restrictions. We view this as a clarification rather than a substantive change to the result.
revision: yes

Referee: [§4.1, Eq. (18)] The linear programming formulation for estimating the identified set for the CDF assumes finite support discretization; the consistency proof does not explicitly address the rate at which the discretization grid must refine when the underlying support is unbounded, which is load-bearing for the claim of generality to continuous and unbounded data.

Authors: We agree that the consistency argument for the linear programming estimator in Section 4.1 requires an explicit rate condition when the support is unbounded. In the revision we will augment the proof of consistency to specify that the discretization mesh must shrink at a rate slower than the convergence rate of the sample moments (e.g., o_p(n^{-1/4})). This addition preserves the paper’s generality claim for continuous and unbounded data while completing the technical argument.
revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper's core contribution is a partial identification analysis of linear panel models with individual-specific coefficients and predetermined regressors. The identified sets for the mean, variance, and CDF of the random coefficient distribution are derived directly from the model's moment inequalities under the maintained linear structure and predeterminedness assumptions, without any reduction to fitted parameters, self-definitional loops, or load-bearing self-citations. The characterization is presented as following from standard partial identification techniques applied to the primitives, and the empirical application to PSID data is separate from the theoretical derivation. The results accommodate general data types without smuggling in ansatzes or renaming known results as new derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the standard linear panel model with random coefficients and predetermined regressors; no free parameters, new axioms, or invented entities are described in the abstract.

axioms (1)

domain assumption: Linear panel data model with predetermined regressors and individual-specific random coefficients.
This is the core setup stated in the abstract as the object of study.
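
For reference, this setup can be written out as follows, using the notation of the paper's own model statement (Y_it, R_it, B_i, Z_i, X_i^t). Reading X_i^t as the history of X_i up to period t is an assumption made here, and the AR(1) special case is added purely for illustration.

```latex
% Dynamic random coefficient model, t = 1, ..., T
Y_{it} = R_{it}' B_i + \varepsilon_{it}, \qquad
E(\varepsilon_{it} \mid B_i, Z_i, X_i^t) = 0, \qquad
R_{it} = (Z_{it}', X_{it}')'.

% Illustrative special case (assumption): AR(1) with individual-specific
% intercept \alpha_i and persistence \rho_i
Y_{it} = \alpha_i + \rho_i Y_{i,t-1} + \varepsilon_{it}, \qquad
B_i = (\alpha_i, \rho_i)', \quad Z_{it} = 1, \quad X_{it} = Y_{i,t-1}.
```

Under this reading, conditioning on X_i^t rather than on the full path of X_i is what makes the regressor predetermined rather than strictly exogenous: the shock in period t is mean-independent of current and past regressors but is allowed to feed into future ones, as a lagged dependent variable requires.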

pith-pipeline@v0.9.0 · 5657 in / 1200 out tokens · 63911 ms · 2026-05-22T17:14:56.126448+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

I show that the model is not point-identified in a short panel context but rather partially identified, and I characterize the identified sets for the mean, variance, and CDF of the coefficient distribution... recasting the identification problem as a linear programming problem... dual representation of infinite-dimensional linear programming
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

E(ε_it | B_i, Z_i, X_i^t) = 0 ... unconditional moment restrictions E(∑ (R_it' B_i) ε_it) = 0 and E(S_it ε_it) = 0

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1]

Consumption inequality and partial insurance

“Consumption inequality and partial insurance.” American Economic Review 98 (5):1887–1921. Blundell, Richard, Luigi Pistaferri, and Itay Saporta-Eksten

work page 1921
[2]

Moment restrictions for nonlinear panel data models with feedback

“Moment restrictions for nonlinear panel data models with feedback.” Working paper, arXiv preprint arXiv:2506.12569. Browning, Martin, Mette Ejrnaes, and Javier Alvarez

work page arXiv
[3]

Inference on causal and structural parameters using many moment inequalities

“Inference on causal and structural parameters using many moment inequalities.” Review of Economic Studies 86 (5):1867–1900. Galichon, Alfred and Marc Henry

work page 1900
[4]

Identification and estimation of production function with unobserved heterogeneity

“Identification and estimation of production function with unobserved heterogeneity.” Working paper, arXiv preprint arXiv:2305.12067. Kiefer, Jack

work page arXiv
[5]

Optimum experimental designs

“Optimum experimental designs.” Journal of the Royal Statistical Society: Series B 21 (2):272–304. Lasserre, Jean-Bernard. 2010. Moments, positive polynomials and their applications. World Scientific. Lasserre, Jean-Bernard. 2015. An introduction to polynomial and semi-algebraic optimization. Cambridge University Press. Levinsohn, James and Amil Petrin

work page 2010
[6]

Components of variation in panel earnings data: American scientists 1960-70

“Components of variation in panel earnings data: American scientists 1960-70.” Econometrica 47 (2):437–454. MaCurdy, Thomas E

work page 1960
[7]

A practical two-step method for testing moment inequalities

“A practical two-step method for testing moment inequalities.” Econometrica 82 (5):1979–2002. Schennach, Susanne M

work page 1979
[8]

A simple, short, but never-empty confidence interval for partially identified parameters

“A simple, short, but never-empty confidence interval for partially identified parameters.” Working paper, arXiv preprint arXiv:2010.10484. Topel, Robert H and Michael P Ward

work page arXiv 2010
[9]

Nonparametric inference on state dependence in unemployment

“Nonparametric inference on state dependence in unemployment.” Econometrica 87 (5):1475–1505. Van der Vaart, Aad W. 2000. Asymptotic statistics. Cambridge University Press. Wooldridge, Jeffrey M

work page 2000
[10]

l ∑ k=1 akAk #′

The numerator and denominator in (32) are both weakly positive, and each is equal to zero if and only if (R_i' R_i)^{-1} e and (R_i' R_i)^{-1} R_i' Y_i are degenerate across individuals, respectively. To show this, one can apply the following proposition to the functions E(R_i' R_i) = e' (R_i' R_i)^{-1} e and D(R_i' Y_i, R_i' R_i) = (R_i' Y_i)' (R_i' R_i)^{-1} R_i' Y_i. □ Proposition 8 (Kiefer...

work page 1959
[11]

m(w,b)− K ∑ k=1 λkϕk(w,b) # for allw∈ W. Since (37) maximizes the expectation off h, the optimal solutionf ∗ h for a fixed(λ 1, . . . ,λK) is given by: f ∗ h (w) =min b∈B

In what follows, I show that (14) is the dual representation of (13). The proof is a direct application of the duality theorem for linear programming over topological vector spaces (Anderson, 1983). The same argument applies to (15). To apply the theorem, I first rewrite (13) into a standard form of linear programming, for which I introduce additional n...

work page 1983
[12]

min b∈B ( m(Wi,b) + KU ∑ k=1 λkϕk(Wi,b) + KC ∑ k=1 µk(A k(Wi,b))ψ k(Wi,b) )# (51) and U=min {λk} KU k=1,{µk} KC k=1 E

Assumption 13 (iii) and (iv) are restrictive, but they are useful enough for Lemma 1 and an alternative proof of Proposition 1 in Online Appendix B.2. Under these assumptions, I obtain the following theorem and lemma, which are counterparts of Theorem 2 and Lemma 2, respectively, by characterizing the identified set I of θ and providing a necessary and s...

work page 1983
[13]

there is x_0 in the interior of P with A x_0 = b

Under the conditions of (B), I use Theorem 9 of Anderson (1983). The sufficient conditions for this Theorem 9 are satisfied as follows. First, Assumption 13 (iv) verifies the condition that “there is x_0 in the interior of P with A x_0 = b” in Theorem 9 of Anderson (1983). Second, Assumption 13 (i)-(iii) ensures that the primal problem in (53) possesses a finite...

work page 1983
[14]

dominate

B.4 Identified set for a general variance parameter. In this subsection, I consider identification of a general variance parameter. Recall the dynamic random coefficient model defined in (1) and (2): Y_it = R_it' B_i + ε_it, E(ε_it | B_i, Z_i, X_i^t) = 0, t = 1, ..., T, where R_it = (Z_it', X_it')'. I consider the second moments of the random coefficients: V_e = E(e' ...

work page 2010
[15]

to obtain (62) as the simplified dual representation of (61). Proposition 11 implies that the L1-penalized finite sample optimizers λ̃^L_N(ζ) and λ̃^U_N(ζ) defined in (57) are precisely the maximizer in (62) for the lower bound and the corresponding minimizer for the upper bound. I then implement the procedure of Andrews and Shi (2017) with a modificati...

work page 2017
[16]

I obtain this grid by adding Gaussian perturbations to λ̃^L_N(ζ) and λ̃^U_N(ζ), while including these optimizers themselves

Under this setup, I implement the heuristic modification of Andrews and Shi (2017) where I calculate the supremum over a finite grid of L points in the neighborhoods of λ̃^L_N(ζ) and λ̃^U_N(ζ). I obtain this grid by adding Gaussian perturbations to λ̃^L_N(ζ) and λ̃^U_N(ζ), while including these optimizers themselves. I then obtain the critical values via 100 b...

work page 2017
[17]

In particular, the upper confidence limits of E(ρ_i) are significantly less than 1, and the confidence intervals for the RIP and the HIP processes show substantial overlap

The estimation results are qualitatively similar to those in Table 2 in the main text. In particular, the upper confidence limits of E(ρ_i) are significantly less than 1, and the confidence intervals for the RIP and the HIP processes show substantial overlap. In what follows, I describe in detail the procedure that I used to numerically recover Ỹ_it from th...

work page 2021
