A Three-Variable Benchmark for Post-GFC Covered Interest Parity Deviations

Useong Shin

arxiv: 2605.20137 · v3 · pith:H6KQZZFCnew · submitted 2026-05-19 · 💱 q-fin.GN

Pith reviewed 2026-05-22 09:04 UTC · model grok-4.3

keywords: covered interest parity, CIP deviations, benchmark, post-GFC, NFCI, dollar index, yield curve slope, G10 currencies

The pith

Three lagged public variables form a daily benchmark for post-GFC covered interest parity deviations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to supply a public daily benchmark for government-bond covered interest parity deviations after the global financial crisis, filling a gap that has left researchers without a standard reference comparable to factor models in asset pricing. It shows that three readily available lagged series—the National Financial Conditions Index, the nominal broad U.S. dollar index, and the Treasury 10-year minus 2-year yield slope—account for most of the observed deviations across G10 currencies plus the Korean won at various tenors. The same three variables retain explanatory power in leave-one-year-out tests. Cointegration checks, quarter-end filters, and aggregation comparisons indicate that the fit reflects a lasting background component rather than fleeting spikes or spurious level correlations. The resulting benchmark therefore supports consistent daily regressions without reliance on proprietary data.

Core claim

The paper establishes that a linear combination of three lagged public state variables—the National Financial Conditions Index, the nominal broad U.S. dollar index, and the Treasury 10-year minus 2-year slope—delivers strong in-sample and leave-one-year-out explanatory power for post-GFC government-bond CIP deviations in G10 plus KRW currency-tenor panels, while cointegration, quarter-end, and aggregation-difference diagnostics confirm that the benchmark isolates a persistent background component rather than short-maturity spikes or spurious correlations.

What carries the argument

A three-variable linear benchmark that uses lagged values of NFCI, the nominal broad U.S. dollar index, and the Treasury 10-year minus 2-year slope to predict CIP deviations at daily frequency.
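
As a sketch of the specification described above, the benchmark can be written as one linear regression per currency-tenor pair. The notation is assumed here for illustration and is not taken from the paper: $x_{i,\tau,t}$ denotes the CIP deviation for currency $i$ and tenor $\tau$ on day $t$, $k \ge 1$ is the lag in days, and the coefficients are written as pair-specific. This page does not state the paper's lag length or whether coefficients are pooled.

```latex
x_{i,\tau,t} = \alpha_{i,\tau}
  + \beta^{N}_{i,\tau}\,\mathrm{NFCI}_{t-k}
  + \beta^{D}_{i,\tau}\,\mathrm{USD}_{t-k}
  + \beta^{S}_{i,\tau}\,\mathrm{Slope}_{t-k}
  + \varepsilon_{i,\tau,t},
\qquad
\mathrm{Slope}_{t} = y^{10\mathrm{Y}}_{t} - y^{2\mathrm{Y}}_{t}
```

Because every regressor is dated $t-k$, the fitted value for day $t$ uses only information available before that day.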

If this is right

Enables daily-frequency regressions on CIP deviations that are comparable to standard factor models in asset pricing.
Distinguishes persistent background deviations from transient quarter-end effects.
Supports leave-one-year-out validation as a check against overfitting.
Applies uniformly across multiple tenors and G10 plus KRW currency pairs.
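
The leave-one-year-out check in the list above can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's code: the function names (`solve`, `ols`, `loyo_r2`) are hypothetical, and the out-of-sample R² is assumed to be measured against the training-sample mean, which the page does not confirm.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """OLS coefficients via the normal equations (X already holds a constant column)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def loyo_r2(years, X, y):
    """Fit on all other years, predict the held-out year, pool the squared errors."""
    sse = sst = 0.0
    for held in sorted(set(years)):
        train = [i for i, yr in enumerate(years) if yr != held]
        test = [i for i, yr in enumerate(years) if yr == held]
        beta = ols([X[i] for i in train], [y[i] for i in train])
        mean_train = sum(y[i] for i in train) / len(train)  # assumed naive benchmark
        for i in test:
            pred = sum(b * v for b, v in zip(beta, X[i]))
            sse += (y[i] - pred) ** 2
            sst += (y[i] - mean_train) ** 2
    return 1.0 - sse / sst

# Synthetic demo: five "years" of 20 days, three stand-in lagged regressors,
# and a noiseless linear target, so the held-out fit is exact by construction.
rng = random.Random(0)
years = [2010 + i // 20 for i in range(100)]
X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in years]
y = [0.5 + 2.0 * r[1] - 1.0 * r[2] + 0.3 * r[3] for r in X]
r2_demo = loyo_r2(years, X, y)
```

With real data, each row of `X` would hold a constant and the three lagged series, and `y` the CIP deviation for one currency-tenor pair; holding out whole calendar years keeps the held-out days from being fitted by their own observations.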

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers could test whether additional candidate drivers of CIP deviations retain incremental power once this benchmark is included.
The same three variables might serve as controls when studying related phenomena such as cross-currency basis swaps or bank funding spreads.
Extensions to real-time releases of the input series could allow monitoring of CIP conditions during future stress episodes.

Load-bearing premise

The three variables capture a genuine persistent economic component rather than statistical artifacts, omitted short-term patterns, or data-specific features.

What would settle it

Substantial deterioration in out-of-sample explanatory power or outright failure of cointegration tests when the same three variables are applied to data after 2022 or to additional currency panels.

Figures

Figures reproduced from arXiv: 2605.20137 by Useong Shin.

**Figure 4.1.** Actual CIP deviations and baseline fitted values. The out-of-sample fitted values

**Figure 4.2.** Leave-one-year-out out-of-sample performance

**Figure 4.**

**Figure 4.3.** Expanding-window out-of-sample performance

**Figure 5.**

**Figure 5.1.** Non-overlapping aggregation-difference performance

Original abstract

This paper proposes a public daily-frequency benchmark for post-GFC government-bond CIP deviations. Although CIP deviations are observed daily, the literature lacks a canonical benchmark for daily regressions comparable to standard factor models in asset pricing. Using G10 plus KRW currency-tenor panels, I show that three lagged public state variables (NFCI, the nominal broad U.S. dollar index, and the Treasury 10-year minus 2-year slope) deliver strong in-sample and leave-one-year-out performance. Cointegration, quarter-end, and aggregation-difference diagnostics suggest that the benchmark captures a persistent background component rather than short-maturity quarter-end spikes or spurious level correlation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a simple three-variable lagged benchmark for daily post-GFC CIP deviations that performs adequately in basic checks but rests on cointegration evidence that may not be robust.

The main takeaway is that three public lagged variables (NFCI, the nominal broad USD index, and the 10y-2y Treasury slope) deliver usable in-sample and leave-one-year-out fit for CIP deviations across G10 plus KRW currency-tenor pairs after the GFC. The paper positions this combination as a standardized daily benchmark for regressions, which fills a gap the abstract notes in the literature. It does a reasonable job keeping everything lagged and public, running out-of-sample tests, and adding cointegration, quarter-end, and aggregation-difference checks to argue the fit reflects a persistent component rather than temporary spikes or spurious correlation. That setup is transparent and replicable, which is a plus for empirical work.

The soft spots center on the cointegration diagnostics. Standard residual-based tests can have low power with highly autocorrelated series like the USD index in moderate samples, and the paper would be stronger with explicit sensitivity checks on lag selection or deterministic terms. Without those, the claim that the benchmark isolates a true background factor is only partly convincing. The performance numbers also look plausible but not dramatically better than what simpler alternatives might achieve.

This is aimed at researchers running daily CIP regressions who need a consistent, easy-to-update control set rather than a deep explanation of why deviations persist. A reader focused on post-crisis funding or arbitrage markets could borrow the benchmark for robustness checks. I would send it to peer review because the benchmark idea is practical and the data choices are clean, though revisions on the persistence tests would be needed.

Referee Report

2 major / 2 minor

Summary. This paper proposes a public daily-frequency benchmark for post-GFC government-bond CIP deviations. Using G10 plus KRW currency-tenor panels, three lagged public state variables—NFCI, the nominal broad U.S. dollar index, and the Treasury 10-year minus 2-year slope—deliver strong in-sample and leave-one-year-out performance. Cointegration, quarter-end, and aggregation-difference diagnostics suggest that the benchmark captures a persistent background component rather than short-maturity quarter-end spikes or spurious level correlation.

Significance. If the central claims hold, this would supply a simple, replicable public benchmark for daily CIP regressions analogous to standard factor models in asset pricing. The focus on lagged, publicly available variables and explicit out-of-sample plus diagnostic checks is a constructive contribution that could reduce data-mining concerns and support further work on post-GFC financial conditions.

major comments (2)

[§4.2] Cointegration Diagnostics: residual-based tests (Engle-Granger or Phillips-Ouliaris) applied to highly autocorrelated series such as NFCI and the broad USD index can exhibit low power and size distortions in moderate samples. The manuscript should report results under alternative lag selections, deterministic terms, and perhaps Johansen trace tests; if the no-cointegration null is not rejected for most currency-tenor pairs under these variations, the claim that the benchmark isolates a true persistent background factor rather than correlated I(1) processes is materially weakened.
[Table 3] Leave-one-year-out panel of Table 3 (or the equivalent regression-results table): the reported R² and t-statistics for the three-variable specification must be shown alongside single-variable and random-walk benchmarks with the same lag structure; without these comparisons the incremental explanatory power of the three-variable benchmark cannot be assessed and the 'strong performance' claim remains unsubstantiated.
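
To make the residual-based procedure named in the first comment concrete, here is a minimal sketch of the two Engle-Granger steps on synthetic data. It is an illustration under stated simplifications, not the paper's implementation: a single regressor stands in for the three-variable benchmark, the second step omits augmentation lags and deterministic terms (exactly the choices the comment asks to vary), and the function names are hypothetical.

```python
import random

def ols_residuals(x, y):
    """Step 1: residuals from regressing y on a constant and x, in levels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def df_tstat(u):
    """Step 2: t-statistic on rho in  du_t = rho * u_{t-1} + e_t  (no constant, no lags)."""
    lag = u[:-1]
    du = [b - a for a, b in zip(u[:-1], u[1:])]
    sxx = sum(v * v for v in lag)
    rho = sum(v * d for v, d in zip(lag, du)) / sxx
    resid = [d - rho * v for v, d in zip(lag, du)]
    s2 = sum(e * e for e in resid) / (len(du) - 1)  # residual variance, one parameter
    return rho / (s2 / sxx) ** 0.5

def random_walk(rng, n):
    """Synthetic I(1) series: cumulative sum of standard normal shocks."""
    out, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out

rng = random.Random(1)
x = random_walk(rng, 500)
y_coint = [1.0 + 2.0 * v + rng.gauss(0, 1) for v in x]  # shares x's stochastic trend
y_spur = random_walk(rng, 500)                          # independent random walk

t_coint = df_tstat(ols_residuals(x, y_coint))
t_spur = df_tstat(ols_residuals(x, y_spur))
```

A strongly negative statistic points toward stationary residuals, while a value near zero is what unrelated I(1) series tend to produce. Because the residuals are estimated, the statistic should be compared with critical values tabulated for residual-based cointegration tests rather than standard Dickey-Fuller tables.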

minor comments (2)

[Abstract] Include one or two headline quantitative metrics (e.g., average in-sample R² or out-of-sample RMSE) so readers can gauge the magnitude of the claimed performance without immediately consulting the tables.
[§2] Data and variable construction: clarify the exact aggregation method used for the daily NFCI and slope series when aligning with currency-tenor CIP observations; any implicit smoothing or interpolation should be stated explicitly.
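
One way to make the alignment question in the §2 comment concrete is a lagged as-of lookup: for each daily CIP observation, take the most recent regressor value dated at least a fixed number of days earlier, with no interpolation. The sketch below assumes a lower-frequency input (the weekly dates and values are invented for illustration) and a hypothetical helper `lagged_asof`; it is not the paper's stated method, and it ignores any gap between a value's reference date and its publication date.

```python
from bisect import bisect_right
from datetime import date, timedelta

def lagged_asof(obs_dates, obs_values, target, lag_days=1):
    """Latest observation dated on or before target - lag_days, or None if none exists.

    obs_dates must be sorted in ascending order.
    """
    cutoff = target - timedelta(days=lag_days)
    i = bisect_right(obs_dates, cutoff)  # count of observations dated <= cutoff
    return obs_values[i - 1] if i > 0 else None

# Hypothetical weekly series (dates and values are made up).
weekly_dates = [date(2020, 1, 3), date(2020, 1, 10), date(2020, 1, 17)]
weekly_values = [0.1, 0.2, 0.3]

same_day = lagged_asof(weekly_dates, weekly_values, date(2020, 1, 10))  # uses Jan 3 value
next_week = lagged_asof(weekly_dates, weekly_values, date(2020, 1, 13))  # uses Jan 10 value
too_early = lagged_asof(weekly_dates, weekly_values, date(2020, 1, 3))   # nothing available
```

Carrying the last value forward keeps the regressor strictly lagged; smoothing or interpolating between observations would draw on later values and weaken the out-of-sample interpretation.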

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the robustness of our proposed benchmark. We respond to each major comment below and indicate the revisions we will implement.

Referee: [§4.2] Cointegration Diagnostics: residual-based tests (Engle-Granger or Phillips-Ouliaris) applied to highly autocorrelated series such as NFCI and the broad USD index can exhibit low power and size distortions in moderate samples. The manuscript should report results under alternative lag selections, deterministic terms, and perhaps Johansen trace tests; if the no-cointegration null is not rejected for most currency-tenor pairs under these variations, the claim that the benchmark isolates a true persistent background factor rather than correlated I(1) processes is materially weakened.

Authors: We agree that residual-based tests can have limited power against alternatives involving highly persistent series. In the revision we will add Johansen trace-test results for the same panels, using alternative lag lengths selected by AIC/BIC and both constant-only and constant-plus-trend specifications. These supplementary tables will be placed alongside the existing Engle-Granger results so readers can judge whether the evidence for cointegration is robust across methods.

Revision: yes

Referee: [Table 3] Leave-one-year-out panel of Table 3 (or the equivalent regression-results table): the reported R² and t-statistics for the three-variable specification must be shown alongside single-variable and random-walk benchmarks with the same lag structure; without these comparisons the incremental explanatory power of the three-variable benchmark cannot be assessed and the 'strong performance' claim remains unsubstantiated.

Authors: We accept that incremental explanatory power is best demonstrated by direct comparison. The revised leave-one-year-out table will report R² and t-statistics for each of the three individual lagged regressors, for the three-variable specification, and for a simple random-walk benchmark, all estimated with identical lag structure and sample. This will make the contribution of the multivariate benchmark transparent.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark uses external lagged public variables with out-of-sample validation

The paper proposes a benchmark for CIP deviations based on three explicitly public and lagged state variables (NFCI, broad USD index, Treasury slope) and evaluates their performance via in-sample regressions and leave-one-year-out cross-validation, along with separate cointegration, quarter-end, and aggregation diagnostics. These elements constitute standard econometric reporting on observed data rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation. The variables are external to the target series, the validation methods are independent of the fitted coefficients, and no ansatz or uniqueness theorem is invoked. The chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters, invented entities, or non-standard axioms are mentioned in the abstract; the work rests on standard econometric assumptions for regression, cointegration, and out-of-sample testing.

axioms (1)

domain assumption: The three public variables capture a persistent component of CIP deviations after appropriate diagnostics. Invoked via the cointegration and quarter-end diagnostics described in the abstract.

pith-pipeline@v0.9.0 · 5627 in / 1057 out tokens · 34684 ms · 2026-05-22T09:04:35.710703+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage:

three lagged public state variables—NFCI, the nominal broad U.S. dollar index, and the Treasury 10-year minus 2-year slope—deliver strong in-sample and leave-one-year-out performance. Cointegration, quarter-end, and aggregation-difference diagnostics

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage:

Engle–Granger residual-based tests as a diagnostic check against this possibility

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.