Interpretable Deep Learning for Stock Returns: A Consensus-Bottleneck Asset Pricing Model

Bong-Gyu Jang; Changeun Kim; Younwoo Jeong

arXiv: 2512.16251 · v5 · submitted 2025-12-18 · q-fin.PR · cs.AI · cs.LG

Pith reviewed 2026-05-16 21:23 UTC · model grok-4.3

keywords: deep learning · asset pricing · analyst consensus · bottleneck model · stock returns · interpretability · risk factors · belief-driven risk

The pith

Embedding aggregate analyst consensus as a bottleneck in a neural network reveals priced stock-return variation missed by standard factor models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Consensus-Bottleneck Asset Pricing Model, a neural architecture that forces its internal representations through aggregate analyst forecasts before predicting returns. This structural constraint functions as an endogenous regularizer that raises out-of-sample accuracy while keeping the drivers economically legible. Portfolios formed on the resulting forecasts display a clear monotonic return pattern that holds across macroeconomic regimes. Diagnostics show that the extracted consensus component contains priced risk factors not spanned by conventional linear models.

Core claim

The CB-APM embeds aggregate analyst consensus as a structural bottleneck inside a deep network, treating professional beliefs as a sufficient statistic for the market's high-dimensional information set. The bottleneck simultaneously regularizes the model for better predictive performance and anchors its outputs to interpretable belief-driven drivers. Sorted portfolios on CB-APM forecasts exhibit a strong monotonic return gradient that remains robust across regimes, and the learned consensus captures priced variation that canonical factor models systematically miss.

What carries the argument

The consensus bottleneck, a layer that compresses the network's hidden state onto aggregate analyst forecasts and thereby serves as both regularizer and interpretable channel.

If this is right

Portfolios sorted on CB-APM forecasts exhibit a strong monotonic return gradient.
The learned consensus encodes priced variation not spanned by canonical factor models.
The bottleneck improves out-of-sample predictive accuracy while preserving economic interpretability.
The identified risk heterogeneity remains robust across macroeconomic regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the bottleneck truly isolates belief-driven risk, replacing analyst consensus with alternative belief proxies should produce comparable monotonic portfolios.
The approach suggests that belief heterogeneity may explain cross-sectional return patterns that linear models attribute to noise or omitted variables.
Testing the model on non-equity assets or during periods of analyst forecast dispersion spikes would reveal whether the mechanism generalizes beyond stocks.

Load-bearing premise

Professional beliefs serve as a sufficient statistic for the market's high-dimensional information set.

What would settle it

Out-of-sample portfolios sorted on CB-APM forecasts fail to display a monotonic return gradient, or the extracted consensus component is fully spanned by existing factor models such as Fama-French.

Figures

Figures reproduced from arXiv: 2512.16251 by Bong-Gyu Jang, Changeun Kim, Younwoo Jeong.

**Figure 1.** Architecture of the CB-APM. The model is composed of two modules, the consensus module f(ϕ) (left) and the prediction module g(θ) (right). The consensus module compresses firm-specific predictors $I^{f}_{i,t}$ and macroeconomic variables $I^{m}_{t}$ into a lower-dimensional consensus vector $\hat{C}_{i,t}$ through a feedforward neural network. This bottleneck enforces interpretability by design, as each coordinate of $\hat{C}_{i,t}$ i…
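The two-module layout in Figure 1 can be sketched as a small forward pass. This is a minimal illustration, not the authors' implementation: the layer sizes, the single hidden layer, the random weights, and the linear form of the prediction module are all assumptions (the linear readout is suggested only by the "coefficient estimates" wording in Figure 6). The names `consensus_module`, `prediction_module`, and `init_params` are hypothetical.

```python
import math
import random

def gelu(x: float) -> float:
    # Exact GELU: x times the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def dense(inputs, weights, biases):
    # One fully connected layer: weights has one row per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def consensus_module(firm_x, macro_x, params):
    # f(phi): compress concatenated firm and macro inputs into the
    # low-dimensional consensus vector C_hat (the bottleneck).
    hidden = [gelu(v) for v in dense(firm_x + macro_x, params["W1"], params["b1"])]
    return dense(hidden, params["W2"], params["b2"])

def prediction_module(c_hat, params):
    # g(theta): ASSUMED linear readout; it sees only C_hat, never the raw inputs.
    return params["theta0"] + sum(w * c for w, c in zip(params["theta"], c_hat))

def init_params(n_in, n_hidden, n_consensus, seed=0):
    # Hypothetical random initialisation, for illustration only.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
        "W2": mat(n_consensus, n_hidden), "b2": [0.0] * n_consensus,
        "theta": [rng.gauss(0.0, 0.1) for _ in range(n_consensus)],
        "theta0": 0.0,
    }

params = init_params(n_in=5, n_hidden=4, n_consensus=2)
c_hat = consensus_module([0.1, -0.2, 0.3], [1.0, 0.5], params)  # 3 firm + 2 macro inputs
r_hat = prediction_module(c_hat, params)
print("consensus vector:", c_hat)
print("return forecast:", r_hat)
```

Because `prediction_module` receives only `c_hat`, every return forecast in this sketch is a function of the consensus coordinates alone, which is the structural constraint the caption describes.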

**Figure 2.** Gaussian Error Linear Unit (GELU) activation function. GELU is a smooth nonlinear activation that combines properties of the ReLU and sigmoid functions.
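For reference, the exact GELU is $x\,\Phi(x)$, where $\Phi$ is the standard normal CDF. The short sketch below evaluates it next to ReLU so the smooth behaviour near zero is visible; whether the paper uses the exact form or a tanh approximation is not stated in this summary.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi written via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    return max(0.0, x)

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, GELU passes small negative values through with a small negative output, so its gradient is not exactly zero for moderately negative inputs.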

**Figure 3.** Autoencoder-based macroeconomic embedding. The encoder narrows horizontally to compress high-dimensional macroeconomic inputs into a latent state $z_t$, concatenated with firm-level features for return prediction. The decoder is used only during training for reconstruction loss.
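The data flow in Figure 3 can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the encoder and decoder are single linear maps with random weights, and the dimensions (6 macro inputs, 2 latent units, 3 firm features) are hypothetical. The paper's actual layer structure is not given in this summary.

```python
import random

def linear(x, w):
    # Matrix-vector product: w has one row per output unit.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def reconstruction_loss(original, reconstructed):
    # Mean squared reconstruction error; used only while training the autoencoder.
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

rng = random.Random(0)
n_macro, n_latent = 6, 2  # hypothetical sizes
w_enc = [[rng.gauss(0.0, 0.3) for _ in range(n_macro)] for _ in range(n_latent)]
w_dec = [[rng.gauss(0.0, 0.3) for _ in range(n_latent)] for _ in range(n_macro)]

macro_t = [0.5, -1.2, 0.3, 0.0, 2.1, -0.7]      # macro inputs at time t
z_t = linear(macro_t, w_enc)                     # encoder: compress to latent state
loss = reconstruction_loss(macro_t, linear(z_t, w_dec))  # decoder: training only

firm_features = [0.2, -0.1, 0.4]
predictor_inputs = z_t + firm_features           # concatenate for return prediction
print(len(z_t), len(predictor_inputs), round(loss, 4))
```

At prediction time only the encoder path (`z_t`) is needed; the decoder contributes nothing beyond the training-time reconstruction term.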

**Figure 4.** Expanding window evaluation. This figure illustrates the expanding-window procedure used for model evaluation. At each iteration, the available data are divided into three subsets: I (training set), II (validation set), and III (test set). The training set expands over time, while the validation and test sets are fixed in length at two years and one year, respectively.
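The expanding-window procedure in Figure 4 can be sketched as a generator of year ranges. The two-year validation and one-year test lengths come from the caption, and the 1994 to 2023 span matches the sample period quoted later on this page. The ten-year initial training window and the one-year step between iterations are assumptions made for illustration; `expanding_window_splits` is a hypothetical helper name.

```python
from typing import Iterator, NamedTuple

class Split(NamedTuple):
    train: range  # set I, grows each iteration
    val: range    # set II, fixed at two years
    test: range   # set III, fixed at one year

def expanding_window_splits(first_year: int, last_year: int,
                            initial_train_years: int,
                            val_years: int = 2,
                            test_years: int = 1) -> Iterator[Split]:
    train_end = first_year + initial_train_years  # exclusive upper bound
    # Continue while the test block still fits inside the sample.
    while train_end + val_years + test_years - 1 <= last_year:
        val_end = train_end + val_years
        yield Split(range(first_year, train_end),
                    range(train_end, val_end),
                    range(val_end, val_end + test_years))
        train_end += test_years  # ASSUMED step: advance by one test block

for s in expanding_window_splits(1994, 2023, initial_train_years=10):
    print(f"train {s.train[0]}-{s.train[-1]}  "
          f"val {s.val[0]}-{s.val[-1]}  test {s.test[0]}")
```

Each test year is used exactly once and always lies strictly after the data used for fitting and tuning, which is what keeps the reported statistics out of sample.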

**Figure 5.** Out-of-sample $R^2$ of return predictions and consensus approximations. This figure presents monthly $R^2$ of annual stock return estimation (left) and average $R^2$ of analysts' consensus variable approximation (right) across the entire evaluation sets for different λ settings. Return predictability improves sharply when consensus learning is introduced, peaking around λ = 0.3-0.4, and remains above the benchmark …
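The role of λ in Figure 5 can be illustrated with a joint training objective. The paper's exact loss is not stated in this summary, so the additive form below (return loss plus λ times consensus loss) is an assumption, chosen because it reduces to a plain return-prediction network at λ = 0. The function names are hypothetical.

```python
def mse(pred, target):
    # Mean squared error between two equal-length sequences.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def joint_loss(ret_pred, ret_true, cons_pred, cons_true, lam):
    # ASSUMED objective: return-prediction error plus lam times the error
    # in approximating the observed analyst consensus variables.
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    return mse(ret_pred, ret_true) + lam * mse(cons_pred, cons_true)

# With lam = 0 the consensus targets have no influence on training.
print(joint_loss([0.10], [0.05], [1.0, 2.0], [1.5, 1.0], lam=0.0))
print(joint_loss([0.10], [0.05], [1.0, 2.0], [1.5, 1.0], lam=0.3))
```

Under this assumed form, raising λ pushes the bottleneck coordinates toward the observed consensus values, trading some flexibility in the return fit for a more interpretable intermediate layer.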

**Figure 6.** Estimated coefficients for consensus variables. Prediction module coefficient estimates at λ = 1, plotted across expanding training windows. Each point denotes a coefficient for one consensus variable in a given split, colored by its out-of-sample $R^2$. Note: The y-axis displays model-derived consensus variables, not the raw consensus values.

**Figure 7.** Out-of-sample cumulative returns of long-short decile portfolios. The figure plots cumulative log returns of value-weighted long-short decile portfolios formed from annual return forecasts, rebalanced monthly using out-of-sample predictions. Each line corresponds to a different hyperparameter λ, with the S&P 500 index buy-and-hold strategy (dashed) as a benchmark. The naïve neural network (λ = 0) outperfo…
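The portfolio construction behind Figure 7 can be sketched for a single rebalancing date. This is a minimal illustration, assuming stocks are ranked by forecast, split into ten equal-count bins, and weighted by market capitalisation inside the top and bottom bins. Breakpoint conventions, tie handling, and transaction costs are not described in this summary and are ignored here; the function names are hypothetical.

```python
import math

def long_short_decile_return(forecasts, market_caps, realized_returns, n_bins=10):
    # Rank stock indices from lowest to highest forecast.
    order = sorted(range(len(forecasts)), key=lambda i: forecasts[i])
    size = len(order) // n_bins
    if size == 0:
        raise ValueError("need at least n_bins stocks")
    bottom, top = order[:size], order[-size:]

    def value_weighted(indices):
        total_cap = sum(market_caps[i] for i in indices)
        return sum(market_caps[i] * realized_returns[i] for i in indices) / total_cap

    # Long the highest-forecast decile, short the lowest.
    return value_weighted(top) - value_weighted(bottom)

def cumulative_log_return(period_returns):
    # Sum of log(1 + r) over periods, as plotted in a cumulative log-return chart.
    return sum(math.log1p(r) for r in period_returns)

forecasts = [float(i) for i in range(20)]
caps = [1.0] * 20
realized = [0.01 * i for i in range(20)]
print(long_short_decile_return(forecasts, caps, realized))
```

Repeating the single-date computation each month and passing the resulting series to `cumulative_log_return` gives one line of the kind shown in the figure.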

Original abstract

We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), which embeds aggregate analyst consensus as a structural bottleneck, treating professional beliefs as a sufficient statistic for the market's high-dimensional information set. Unlike post-hoc explainability approaches, CB-APM achieves interpretability-by-design: the bottleneck constraint functions as an endogenous regularizer that simultaneously improves out-of-sample predictive accuracy and anchors inference to economically interpretable drivers. Portfolios sorted on CB-APM forecasts exhibit a strong monotonic return gradient, robust across macroeconomic regimes. Pricing diagnostics further reveal that the learned consensus encodes priced variation not spanned by canonical factor models, identifying belief-driven risk heterogeneity that standard linear frameworks systematically miss.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The consensus bottleneck is a straightforward structural tweak for interpretability in neural asset pricing, but the evidence that it isolates new priced variation beyond linear factors is still thin.

The paper's main move is to build a neural asset pricing model where aggregate analyst consensus acts as an explicit bottleneck layer. This forces the network to route its predictions through that consensus, which the authors treat as a sufficient statistic for the market's information set. The result is interpretability by construction instead of post-hoc explanations, and they report monotonic return gradients on sorted portfolios that hold across macro regimes. That architectural choice is the genuinely new element here; most prior deep-learning work in finance either skips structure or adds explanations after training. The setup also claims the learned consensus picks up priced risk that standard factor models miss, which would matter for both academic factor research and practical portfolio work if it checks out. The execution looks clean on the surface for what is described.

The soft spot is the identification of that unspanned variation. Because the bottleneck is defined directly on the consensus variable, any apparent alpha after Fama-French-Carhart controls could simply reflect the network learning a nonlinear mapping of the same consensus data that linear models already use. The abstract gives no clear diagnostic that the residual after the bottleneck is orthogonal to the factor space once training is complete. If analysts miss tail risks or slow-moving variables, the compression step could discard exactly the heterogeneity the paper wants to highlight. Without seeing the full methods, data splits, and post-training orthogonality checks, it is hard to tell whether the priced-variation claim is independently identified or an artifact of the architecture.

This is the kind of paper that belongs in a reading group for people working on structured machine learning in finance. Readers who care about blending economic priors with neural nets will get value from the design, even if the empirical claims need more scrutiny. It deserves a serious referee because the idea is concrete and the architecture is reproducible in principle; the review process would mainly press on the orthogonality tests and out-of-sample robustness rather than reject outright.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Consensus-Bottleneck Asset Pricing Model (CB-APM), a deep neural network architecture that imposes aggregate analyst consensus as a structural bottleneck, treating professional beliefs as a sufficient statistic for the market's high-dimensional information set. The model is claimed to achieve interpretability-by-design via the bottleneck regularizer, deliver superior out-of-sample predictive accuracy, generate portfolios with strong monotonic return gradients across macroeconomic regimes, and identify priced variation in the learned consensus that is not spanned by canonical linear factor models such as Fama-French-Carhart.

Significance. If the central claims hold after rigorous verification, the work offers a novel bridge between deep learning and asset pricing by embedding economic structure directly into the network rather than relying on post-hoc explanations. It could provide evidence for belief-driven risk heterogeneity that linear frameworks miss and supply a practical, interpretable tool for return forecasting and portfolio construction.

major comments (2)

[Abstract and §3] Model construction: the bottleneck is defined directly in terms of the same consensus variable used for both training and evaluation; without explicit residual orthogonality diagnostics after training (e.g., regression of CB-APM residuals on Fama-French-Carhart factors), it remains unclear whether the reported alphas reflect independently identified priced variation or a nonlinear transformation of the consensus already partially captured by linear models.
[§4] Pricing diagnostics: the claim that the learned consensus encodes priced variation not spanned by canonical factors requires a formal test that the consensus residual is orthogonal to the factor space post-training; the abstract provides no such statistic or cross-validation procedure, leaving the central pricing result vulnerable to the circularity concern that the bottleneck compresses away precisely the variation it claims to isolate.
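The diagnostic requested in both major comments is a time-series spanning regression: regress a return series (for example the long-short portfolio return or a model residual) on factor returns and inspect the intercept and fit. The sketch below is a dependency-free illustration with made-up numbers, not the paper's procedure. It reports only point estimates; a real test would add standard errors for the intercept, which are omitted here. The function names and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: factors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def spanning_regression(y, factors):
    # OLS of y on a constant and the factor returns via the normal equations.
    # Returns (alpha, betas, r_squared).
    X = [[1.0] + list(row) for row in factors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
    mean_y = sum(y) / len(y)
    sse = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum((yt - mean_y) ** 2 for yt in y)
    return coef[0], coef[1:], 1.0 - sse / sst

# Toy data: two hypothetical factor series and a return that they span
# exactly apart from a constant intercept.
toy_factors = [[0.01, 0.02], [0.03, -0.01], [-0.02, 0.00], [0.00, 0.04], [0.05, 0.01]]
toy_returns = [0.002 + 1.5 * f1 - 0.5 * f2 for f1, f2 in toy_factors]
alpha, betas, r2 = spanning_regression(toy_returns, toy_factors)
print(round(alpha, 6), [round(b, 6) for b in betas], round(r2, 6))
```

An intercept that stays economically and statistically large after adding the factors would support the unspanned-variation claim; a high R-squared with an intercept near zero would point the other way.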

minor comments (2)

[Abstract] The abstract refers to 'robust across macroeconomic regimes' without specifying the regime classification method or the exact number of regimes tested.
[Model section] Notation for the bottleneck layer and the consensus variable should be introduced with explicit equations in the model section to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive comments. We address each major point below and will revise the manuscript to incorporate additional diagnostics that directly respond to the concerns.

Referee: [Abstract and §3] Model construction: the bottleneck is defined directly in terms of the same consensus variable used for both training and evaluation; without explicit residual orthogonality diagnostics after training (e.g., regression of CB-APM residuals on Fama-French-Carhart factors), it remains unclear whether the reported alphas reflect independently identified priced variation or a nonlinear transformation of the consensus already partially captured by linear models.

Authors: We agree that explicit post-training orthogonality checks would strengthen the interpretation. In the CB-APM, the consensus acts as a structural bottleneck on the high-dimensional inputs, and the network is trained end-to-end to map this compressed representation to returns. The reported alphas arise from out-of-sample portfolio sorts on the resulting forecasts. To address the circularity concern directly, we will add regressions of the CB-APM residuals on the Fama-French-Carhart factors (and report R-squared and significance) in a revised §4, along with a brief discussion in the abstract.

Revision: yes

Referee: [§4] Pricing diagnostics: the claim that the learned consensus encodes priced variation not spanned by canonical factors requires a formal test that the consensus residual is orthogonal to the factor space post-training; the abstract provides no such statistic or cross-validation procedure, leaving the central pricing result vulnerable to the circularity concern that the bottleneck compresses away precisely the variation it claims to isolate.

Authors: The referee is correct that a formal orthogonality test is needed to substantiate the claim of priced variation outside the linear factor space. While the architecture and out-of-sample results provide supporting evidence, we will add the requested formal test (regression of post-training residuals on the canonical factors) together with cross-validation statistics in the revised §4. We will also reference these diagnostics in the abstract to clarify that the bottleneck isolates incremental priced risk.

Revision: yes

Circularity Check

1 step flagged

Consensus bottleneck renders 'unspanned priced variation' claim circular by construction

specific steps

self-definitional [Abstract]
"We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), which embeds aggregate analyst consensus as a structural bottleneck, treating professional beliefs as a sufficient statistic for the market's high-dimensional information set. ... Pricing diagnostics further reveal that the learned consensus encodes priced variation not spanned by canonical factor models, identifying belief-driven risk heterogeneity that standard linear frameworks systematically miss."

The model explicitly defines the bottleneck as the analyst consensus variable and then claims the output 'learned consensus' encodes additional priced variation. By construction the network is forced to compress all predictive signal through this input, so the unspanned-variation claim is equivalent to the bottleneck assumption rather than derived from it.

full rationale

The paper's core architecture defines the bottleneck directly as aggregate analyst consensus and treats it as a sufficient statistic for the full information set. All subsequent claims about the learned consensus encoding priced variation not spanned by canonical factors therefore reduce to transformations of this same input variable. The pricing diagnostics and portfolio sorts cannot isolate independent belief-driven heterogeneity because the network is structurally constrained to route signal through the consensus; any apparent alpha after factor controls is an artifact of the bottleneck definition rather than an emergent result. This is a self-definitional reduction with no independent identification step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one domain assumption; no explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: Aggregate analyst consensus constitutes a sufficient statistic for the market's high-dimensional information set.
Explicitly stated in the abstract as the modeling premise that justifies inserting consensus as the structural bottleneck.

pith-pipeline@v0.9.0 · 5414 in / 1154 out tokens · 73529 ms · 2026-05-16T21:23:58.602335+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), which embeds aggregate analyst consensus as a structural bottleneck, treating professional beliefs as a sufficient statistic for the market's high-dimensional information set."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Paper passage: "the structural constraint acts as an endogenous regularizer that simultaneously improves out-of-sample predictive accuracy"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1] variables with missing-value rates exceeding 20% across the firm panel are removed

[2] variables with insufficient historical coverage (sample starting year after January 1994) are excluded. The resulting set of firm-level characteristics provides a balanced trade-off between data completeness and information diversity, ensuring that each firm contributes a meaningful set of observations to both the consensus and return-prediction modules...

[3] 4,683 firms with nonmissing analyst consensus data, 66

[4] 114 firm-level predictors and 123 macroeconomic indicators (including 115 from FRED-MD and 8 from Welch and Goyal, 2008),

[5] dying ReLU
a total of 605,722 firm-month observations spanning January 1994 to December 2023. This refined panel forms the empirical foundation for all model estimation and evaluation procedures described in Section 2. B.3 Data imputation Although the majority of studies neglect the importance of data imputation methods and simply handle missing values by substituti...
