Interpretable Systematic Risk around the Clock
Pith reviewed 2026-05-10 12:26 UTC · model grok-4.3
The pith
Decomposing market jumps via LLM-classified news shows macroeconomic announcements carry the largest and most persistent risk premium.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Combining high-frequency market data with news narratives identified as jump causes and classified by a state-of-the-art open-source reasoning LLM decomposes systematic jump risk into interpretable categories. These categories display clear heterogeneity in risk premia, with macroeconomic news delivering the largest and most persistent premium. This insight supports the construction of an annually rebalanced, real-time Fama-MacBeth factor-mimicking portfolio that isolates the most strongly priced jump risk and achieves high out-of-sample Sharpe ratios and significant alphas relative to standard factor models.
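The alpha claim can be made concrete: regress the portfolio's excess returns on factor returns and test whether the intercept is significantly positive. A minimal sketch on simulated data (the factor series, return parameters, and the exaggerated 10 bp/day alpha are placeholders, not the paper's estimates):

```python
import numpy as np

def alpha_tstat(port_ret, factor_rets):
    """OLS of portfolio excess returns on factor returns with an intercept.

    Returns (alpha, t-stat) using classical (non-robust) standard errors.
    """
    X = np.column_stack([np.ones(len(port_ret)), factor_rets])
    coef = np.linalg.lstsq(X, port_ret, rcond=None)[0]
    resid = port_ret - X @ coef
    sigma2 = resid @ resid / (len(port_ret) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[0], coef[0] / np.sqrt(cov[0, 0])

rng = np.random.default_rng(0)
T = 1000
mkt = rng.normal(0.0004, 0.01, T)                   # placeholder market factor
port = 0.001 + 1.2 * mkt + rng.normal(0, 0.005, T)  # exaggerated 10 bp/day alpha
alpha, t_alpha = alpha_tstat(port, mkt[:, None])
```

A production version would use HAC (Newey-West) standard errors and the full set of benchmark factors rather than a single simulated one.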
What carries the argument
LLM-based classification of the contemporaneous news narratives that cause market jumps: the classification decomposes total jump risk into interpretable categories and isolates those with priced premia for portfolio construction.
If this is right
- Macroeconomic news jumps exhibit the largest and most persistent risk premia among all categories.
- The real-time Fama-MacBeth portfolio isolating priced jump risk achieves high out-of-sample Sharpe ratios.
- This portfolio generates significant alphas after controlling for standard factor models.
- Around-the-clock data uncovers priced risks invisible in daytime-only samples.
- LLM narrative classification enables practical, interpretable identification of systematic risks.
Where Pith is reading between the lines
- The same classification pipeline could be applied to bonds, currencies, or commodities to extract priced jump factors in those markets.
- As LLM accuracy on financial news improves, the separation between priced and unpriced jump categories may become sharper.
- Regulators might monitor real-time category exposures to detect emerging concentrations in macroeconomic jump risk.
- Traditional statistical factor models could be augmented with these narrative-derived factors for better explanatory power.
Load-bearing premise
The open-source reasoning LLM accurately and without systematic bias identifies the true underlying cause of each market jump from the available news narratives.
What would settle it
If the annually rebalanced factor-mimicking portfolio for the highest-premium jump category fails to deliver a high out-of-sample Sharpe ratio or significant alphas on new data periods.
Original abstract
In this paper, I present the first comprehensive, around-the-clock analysis of systematic jump risk by combining high-frequency market data with contemporaneous news narratives identified as the underlying causes of market jumps. These narratives are retrieved and classified using a state-of-the-art open-source reasoning LLM. Decomposing market risk into interpretable jump categories reveals significant heterogeneity in risk premia, with macroeconomic news commanding the largest and most persistent premium. Leveraging this insight, I construct an annually rebalanced real-time Fama-MacBeth factor-mimicking portfolio that isolates the most strongly priced jump risk, achieving a high out-of-sample Sharpe ratio and delivering significant alphas relative to standard factor models. The results highlight the value of around-the-clock analysis and LLM-based narrative understanding for identifying and managing priced risks in real time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver the first around-the-clock decomposition of systematic jump risk by pairing high-frequency market data with contemporaneous news narratives whose causes are classified by a state-of-the-art open-source reasoning LLM. It reports significant heterogeneity in jump-risk premia across categories (macroeconomic news commanding the largest and most persistent premium) and constructs an annually rebalanced real-time Fama-MacBeth factor-mimicking portfolio that isolates the most strongly priced category, achieving high out-of-sample Sharpe ratios and significant alphas relative to standard factor models.
Significance. If the LLM classifications prove accurate and unbiased, the work would provide a valuable contribution by linking interpretable news-driven jump categories to priced risk premia and by demonstrating a practical, real-time portfolio construction that generates alphas. The out-of-sample testing and annual rebalancing are strengths that support the portfolio claim; however, the absence of any validation for the core classification step limits the immediate impact.
major comments (3)
- [Methodology] The methodology section provides no details on jump detection thresholds, window sizes, or data exclusion rules (listed as free parameters in the analysis). These choices directly determine which jumps enter the LLM classification step and therefore affect all downstream heterogeneity results and portfolio performance metrics.
- [LLM classification procedure] No human validation, inter-rater checks, prompt-robustness tests, or alternative-LLM comparisons are reported for the LLM-based cause classification. Because the category-specific betas, risk premia, and the selection of the 'most strongly priced' category for the mimicking portfolio rest entirely on these labels, the lack of validation is load-bearing for the central claims of heterogeneity and out-of-sample Sharpe ratios.
- [Portfolio construction] The annually rebalanced Fama-MacBeth mimicking portfolio is built directly from the same in-sample jump-risk premia estimates used to document heterogeneity; while the out-of-sample test mitigates some circularity, the choice of which category to isolate is informed by the very heterogeneity the paper measures, requiring explicit discussion of potential selection bias.
minor comments (2)
- [Abstract] The abstract and introduction would benefit from an explicit statement of the sample period, data sources (e.g., specific high-frequency index or futures), and number of jumps analyzed.
- [Empirical results] Notation for the category-specific jump-risk premia and the mimicking-portfolio weights should be defined more clearly before the empirical results are presented.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We appreciate the emphasis on methodological transparency, validation of the LLM classifications, and potential biases in portfolio construction. Below, we provide point-by-point responses to the major comments and outline the revisions we will make to strengthen the paper.
Point-by-point responses
Referee: [Methodology] The methodology section provides no details on jump detection thresholds, window sizes, or data exclusion rules (listed as free parameters in the analysis). These choices directly determine which jumps enter the LLM classification step and therefore affect all downstream heterogeneity results and portfolio performance metrics.
Authors: We agree that the lack of specific details on these parameters is a shortcoming that needs to be addressed. In the revised manuscript, we will expand the methodology section to include a comprehensive description of the jump detection algorithm, specifying the exact thresholds used (such as the multiple of standard deviation for identifying jumps), the window sizes for volatility estimation, and all data exclusion rules (e.g., handling of market closures, low-volume periods, or overnight returns). Additionally, we will include robustness analyses showing how variations in these parameters affect the main results on risk premia heterogeneity and portfolio performance. revision: yes
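A minimal sketch of how such free parameters bite, using a simple rolling-volatility threshold rule (a generic stand-in; the paper's actual detector is not specified in the abstract). The `window` and `c` values below are illustrative choices, and shifting either changes which observations are flagged as jumps:

```python
import numpy as np

def detect_jumps(returns, window=78, c=4.0):
    """Flag returns whose magnitude exceeds c times a rolling local
    volatility estimate. `window` and `c` are exactly the kind of free
    parameters the referee asks to see reported and varied."""
    r = np.asarray(returns, float)
    flags = np.zeros(len(r), dtype=bool)
    for t in range(window, len(r)):
        local_vol = r[t - window:t].std(ddof=1)
        if local_vol > 0 and abs(r[t]) > c * local_vol:
            flags[t] = True
    return flags

rng = np.random.default_rng(1)
r = rng.normal(0, 0.001, 500)   # placeholder high-frequency returns
r[300] += 0.02                  # inject one large jump
jumps = detect_jumps(r)
```

Note that the injected jump itself inflates the rolling volatility for the next `window` observations, so nearby smaller jumps would be masked — one concrete way the detector's parameters feed into the downstream category counts.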
Referee: [LLM classification procedure] No human validation, inter-rater checks, prompt-robustness tests, or alternative-LLM comparisons are reported for the LLM-based cause classification. Because the category-specific betas, risk premia, and the selection of the 'most strongly priced' category for the mimicking portfolio rest entirely on these labels, the lack of validation is load-bearing for the central claims of heterogeneity and out-of-sample Sharpe ratios.
Authors: This is a valid concern, as the reliability of the LLM classifications is central to our findings. Although the original manuscript did not include explicit validation steps, we recognize their importance. In the revision, we will add a new subsection on classification validation. This will include: (1) human annotation of a stratified random sample of 300 jump events by two independent annotators, with inter-rater agreement metrics (e.g., Cohen's kappa); (2) comparison of LLM outputs against these human labels to report accuracy, precision, and recall per category; (3) sensitivity tests to prompt variations and temperature settings; and (4) a comparison with classifications from an alternative model such as Llama-3 or GPT-4o. We believe these additions will substantiate the use of the LLM and bolster confidence in the heterogeneity results. revision: yes
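The proposed inter-rater check is straightforward to sketch. Cohen's kappa compares observed annotator agreement to the agreement expected by chance; the category labels below are hypothetical placeholders:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    chance = sum(ca[k] * cb[k] for k in ca) / n ** 2
    return (observed - chance) / (1 - chance)

# Hypothetical labels from two annotators over six jump events
ann1 = ["macro", "macro", "earnings", "geopolitics", "macro", "earnings"]
ann2 = ["macro", "earnings", "earnings", "geopolitics", "macro", "macro"]
kappa = cohens_kappa(ann1, ann2)   # 4/6 observed agreement, kappa = 5/11
```

Per-category precision and recall against the adjudicated human labels would then follow from the same confusion counts.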
Referee: [Portfolio construction] The annually rebalanced Fama-MacBeth mimicking portfolio is built directly from the same in-sample jump-risk premia estimates used to document heterogeneity; while the out-of-sample test mitigates some circularity, the choice of which category to isolate is informed by the very heterogeneity the paper measures, requiring explicit discussion of potential selection bias.
Authors: We thank the referee for highlighting this potential issue of selection bias. It is true that the choice of the macroeconomic news category for the mimicking portfolio is guided by the full-sample heterogeneity analysis. However, the annual rebalancing and out-of-sample evaluation are performed in a forward-looking manner using only information available at the time of rebalancing. To address the concern explicitly, we will revise the portfolio construction section to discuss the selection process in detail, including why macro news was chosen based on economic rationale and persistence. Furthermore, we will add robustness checks: (i) results for mimicking portfolios based on all categories, (ii) a version where the category is selected using only the first half of the sample and held fixed thereafter, and (iii) a discussion of how this affects the interpretation of the out-of-sample alphas and Sharpe ratios. We maintain that the real-time nature and out-of-sample testing provide substantial protection against overfitting, but agree that explicit discussion is warranted. revision: partial
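Robustness check (ii) can be sketched as: choose the category on the first half of the sample only, then score it on the untouched second half. The category names and return parameters below are simulated placeholders, with drift exaggerated so the selection is unambiguous:

```python
import numpy as np

def split_sample_selection(category_rets):
    """Pick the category with the highest Sharpe ratio on the first half of
    the sample, then report its Sharpe ratio on the second half only.
    Freezing the choice after the first half removes selection look-ahead
    from the out-of-sample number."""
    def sharpe(r):
        return r.mean() / r.std(ddof=1) * np.sqrt(252)  # annualized, daily data

    T = min(len(r) for r in category_rets.values())
    half = T // 2
    best = max(category_rets, key=lambda k: sharpe(category_rets[k][:half]))
    return best, sharpe(category_rets[best][half:T])

rng = np.random.default_rng(2)
rets = {
    "macro": rng.normal(0.005, 0.01, 1000),     # placeholder: strongly priced
    "earnings": rng.normal(0.000, 0.01, 1000),  # placeholder: unpriced
}
best, oos_sharpe = split_sample_selection(rets)
```
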
Circularity Check
No significant circularity: derivation relies on external LLM labels and genuine out-of-sample portfolio tests
full rationale
The paper's core chain—LLM classification of jump narratives to reveal premia heterogeneity, followed by construction of an annually rebalanced Fama-MacBeth mimicking portfolio and evaluation of its out-of-sample Sharpe and alphas—does not reduce to self-definition or fitted inputs by construction. The LLM step is an external measurement tool whose accuracy is assumed rather than derived from the target results; the portfolio selection uses in-sample estimates but the performance claims are explicitly out-of-sample and therefore falsifiable on future data. No equations equate a 'prediction' to its own fitted parameters, no self-citations bear the central load, and no ansatz or renaming is smuggled in. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- annual rebalancing frequency
- jump detection threshold and window
axioms (2)
- domain assumption: LLM classifications of news narratives accurately reflect the true economic cause of each price jump
- standard math: Fama-MacBeth cross-sectional regression recovers priced risk factors
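The second axiom can be illustrated with a minimal two-pass Fama-MacBeth estimator on simulated data in which the true factor premium is known, so recovery can be checked directly (all parameters below are illustrative):

```python
import numpy as np

def fama_macbeth(asset_rets, factor_rets):
    """Two-pass Fama-MacBeth.

    Pass 1: time-series regressions estimate each asset's factor betas.
    Pass 2: a cross-sectional regression of returns on betas each period;
    the time-series average of the slopes estimates the factor premium.
    asset_rets: (T, N) array; factor_rets: (T, K) array.
    """
    T, N = asset_rets.shape
    X = np.column_stack([np.ones(T), factor_rets])
    betas = np.linalg.lstsq(X, asset_rets, rcond=None)[0][1:]   # (K, N)
    B = np.column_stack([np.ones(N), betas.T])                  # (N, 1+K)
    lambdas = np.array([np.linalg.lstsq(B, asset_rets[t], rcond=None)[0]
                        for t in range(T)])
    return lambdas[:, 1:].mean(axis=0)   # average premium per factor

rng = np.random.default_rng(3)
T, N = 2000, 25
f = rng.normal(0.004, 0.02, (T, 1))           # one factor, true premium 40 bp
beta = rng.uniform(0.5, 1.5, N)               # cross-section of exposures
r = beta * f + rng.normal(0, 0.01, (T, N))    # exact factor model, no alpha
lam = fama_macbeth(r, f)
```

The standard error of the period-by-period slopes gives the classic Fama-MacBeth t-statistic for whether the premium is priced.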