Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy

Kalliopi Kleisarchaki

arXiv: 2603.01290 · v3 · pith:HSUNLHG4new · submitted 2026-03-01 · cs.AI · cs.GT · cs.LG · cs.SY · eess.SY

Pith reviewed 2026-05-21 11:33 UTC · model grok-4.3

keywords: Hidden Markov Model · Partially Observable Stochastic Game · Formula 1 energy strategy · opponent state inference · counter-harvest trap · Deep Q-Network · 2026 regulations · telemetry

The pith

A 40-state HMM recovers rival cars' hidden ERS levels and modes from six public telemetry signals in 2026 Formula 1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the 2026 Formula 1 rules turn energy strategy into a game in which each car's optimal deployment depends on the unknown battery and tyre states of every rival. A two-layer system first uses a Hidden Markov Model to turn six observable signals into a probability distribution over each opponent's energy mode and degradation level, then feeds that belief into a DQN policy. The work also defines the counter-harvest trap: a deliberate suppression of visible deployment signals meant to lure an opponent into an unsuccessful attack. On synthetic races the HMM reaches 96.8 percent accuracy on ERS-level classification and 96.3 percent recall on trap conditions.
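The two headline numbers measure different things. ERS-level accuracy is the fraction of all time steps whose predicted mode matches the true one, while trap recall is the fraction of true trap conditions that get flagged. A minimal Python sketch of both metrics, assuming per-step predicted and true labels from a synthetic race are available (the function names are illustrative, not taken from the paper):

```python
def accuracy(predicted, actual):
    """Fraction of steps where the predicted ERS mode equals the true mode."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(flagged, is_trap):
    """Fraction of true trap conditions that were flagged: TP / (TP + FN)."""
    if len(flagged) != len(is_trap):
        raise ValueError("sequences must have equal length")
    positives = sum(is_trap)
    if positives == 0:
        raise ValueError("recall is undefined with no true trap conditions")
    true_positives = sum(f and t for f, t in zip(flagged, is_trap))
    return true_positives / positives


# Toy labels for illustration only; these are not results from the paper.
pred = ["H", "M", "L_harvest", "H"]
true = ["H", "M", "L_derate", "H"]
print(accuracy(pred, true))  # 0.75
print(recall([True, False, True, False], [True, True, True, False]))
```

A uniform guess over four modes is right 25% of the time in expectation, which is the random baseline the abstract quotes for ERS-level accuracy.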

Core claim

The central claim is that a 40-state HMM, trained on six telemetry channels, can reconstruct the hidden ERS charge level (H, M, L_harvest, L_derate), Override Mode, and tyre state of each rival with high enough fidelity to support effective policy selection in a Partially Observable Stochastic Game, and that explicit inference over the harvest-versus-derate sub-mode is required to detect counter-harvest traps.
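The summary gives the total of 40 hidden states but not how they decompose. One decomposition consistent with that count is 4 ERS modes × 2 Override states × 5 tyre-degradation bins. The sketch below assumes exactly that (the five-bin tyre axis is a guess, not the paper's stated design) and shows how a joint belief over the 40 states sums down to a four-way ERS marginal, the quantity an ERS-level accuracy figure would be scored against:

```python
from itertools import product

# ASSUMPTION: 40 = 4 ERS modes x 2 Override states x 5 tyre bins.
# The paper states the total (40), not this factorisation.
ERS_LEVELS = ("H", "M", "L_harvest", "L_derate")
OVERRIDE = (False, True)
TYRE_BINS = tuple(range(5))  # hypothetical degradation bins, 0 (fresh) to 4 (worn)

# Each hidden state is an (ers_level, override_active, tyre_bin) tuple.
STATES = list(product(ERS_LEVELS, OVERRIDE, TYRE_BINS))
assert len(STATES) == 40


def ers_marginal(belief):
    """Sum a 40-entry joint belief over Override and tyre bins to get P(ERS level)."""
    if len(belief) != len(STATES):
        raise ValueError("belief must have one entry per hidden state")
    marginal = {level: 0.0 for level in ERS_LEVELS}
    for (level, _override, _tyre), p in zip(STATES, belief):
        marginal[level] += p
    return marginal


# A uniform joint belief gives roughly 0.25 per ERS mode (up to float rounding).
print(ers_marginal([1 / 40] * 40))
```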

What carries the argument

The 40-state Hidden Markov Model that converts six observable telemetry signals into a belief distribution over each rival's ERS level, sub-mode, and tyre degradation.
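Turning a stream of observations into a belief distribution is standard HMM forward filtering. The filter pushes the previous belief through the transition matrix, weights each state by how well it explains the new observation, and renormalises. A minimal pure-Python sketch of one filtering step, assuming the transition matrix and per-state emission likelihoods are already known (the paper's actual implementation and parameter values are not given in this summary):

```python
def forward_step(belief, transition, likelihood):
    """One HMM filtering step.

    belief[i]        = P(s_{t-1} = i | o_1..o_{t-1})
    transition[i][j] = P(s_t = j | s_{t-1} = i)
    likelihood[j]    = p(o_t | s_t = j)
    Returns the list P(s_t = j | o_1..o_t) over all states j.
    """
    n = len(belief)
    # Predict: propagate the old belief through the Markov transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight each state by the likelihood of the new observation.
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalised)
    if total <= 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnormalised]


# Toy two-state example (say "deploying" vs "harvesting"); the numbers are illustrative.
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
likelihood = [0.8, 0.1]  # the new observation looks much more like state 0
print(forward_step(belief, transition, likelihood))
```

Repeating this step once per telemetry sample yields a running belief state of the kind the DQN layer consumes.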

Load-bearing premise

The synthetic race generator produces telemetry sequences whose statistical structure matches real 2026 racing conditions closely enough for the learned HMM to generalize.

What would settle it

Measure the HMM's ERS-level classification accuracy and trap-detection recall on actual 2026 telemetry collected during the Australian Grand Prix on 8 March 2026.

Original abstract

The 2026 Formula 1 technical regulations introduce a fundamental change to energy strategy: under a 50/50 internal combustion engine / battery power split with unlimited regeneration and a driver-controlled Override Mode, the optimal energy deployment policy depends not only on a driver's own state but on the hidden state of rival cars. This creates a Partially Observable Stochastic Game that cannot be solved by single-agent optimisation methods. We present a tractable two-layer inference and decision framework. The first layer is a 40-state Hidden Markov Model (HMM) that infers a probability distribution over each rival's ERS charge level (four modes: H, M, L_harvest, L_derate), Override Mode status, and tyre degradation state from six publicly observable telemetry signals. The second layer is a Deep Q-Network (DQN) policy that takes the HMM belief state as input and selects between energy deployment strategies. We formally characterise the counter-harvest trap, a deceptive strategy in which a car deliberately suppresses observable deployment signals to induce a rival into a failed attack, and show that detecting it requires belief-state inference over both ERS level and the harvest/derate sub-mode. On synthetic races, the HMM achieves 96.8% ERS-level accuracy (random baseline 25%), classifies L_harvest vs. L_derate with 89.4% accuracy, and detects counter-harvest trap conditions with 96.3% recall. Pre-season analysis indicates circuit-dependent recharge availability (1.0x to 2.2x per lap) as the primary confound; Melbourne is the hardest-case validation environment. Baum-Welch calibration on 2026 race telemetry begins with the Australian Grand Prix (8 March 2026).
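In the second layer, the HMM belief vector is the DQN's input. The sketch below shows the two places where the belief enters a DQN: greedy action selection, and the temporal-difference target computed with a separate target network. It assumes a small one-hidden-layer Q-network and a hypothetical three-way action set. The replay buffer and gradient update of a full DQN are omitted, and nothing here reflects the paper's actual architecture or action space.

```python
import copy
import random

# Hypothetical strategy set; the paper's actual deployment options are not listed here.
ACTIONS = ("deploy", "harvest", "override_attack")


def init_qnet(n_in, n_hidden, n_out, seed=0):
    """Random weights for a one-hidden-layer Q-network mapping a belief to Q-values."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out


def q_values(net, belief):
    """Forward pass: ReLU hidden layer, then one linear output per action."""
    w1, b1, w2, b2 = net
    hidden = [max(0.0, sum(w * x for w, x in zip(row, belief)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def greedy_action(net, belief):
    """Index of the action with the highest Q-value under the given belief."""
    q = q_values(net, belief)
    return max(range(len(q)), key=q.__getitem__)


def td_target(target_net, reward, next_belief, done, gamma=0.99):
    """DQN target: r if the episode ended, else r + gamma * max_a' Q_target(b', a')."""
    if done:
        return reward
    return reward + gamma * max(q_values(target_net, next_belief))


online = init_qnet(40, 16, len(ACTIONS))
target = copy.deepcopy(online)  # in DQN the target net is a periodically synced copy
belief = [1 / 40] * 40  # a 40-entry HMM belief vector (uniform here)
print(ACTIONS[greedy_action(online, belief)])
print(td_target(target, reward=1.0, next_belief=belief, done=False))
```

Using the belief vector directly as network input is the simplest way to make a single-agent learner act under partial observability. It treats the belief as a sufficient summary of the observation history, which holds only if the HMM itself is well specified.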

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies HMM inference and DQN to opponent energy states in 2026 F1 but all accuracy claims rest on synthetic data.

The key takeaway is that this paper applies an HMM to infer hidden opponent energy states in 2026 F1 racing and feeds those beliefs into a DQN for strategy. The results look good on synthetic data, but that is where the support stops.

What is new is the framing of the 2026 rules as a partially observable stochastic game and the description of the counter-harvest trap as a specific deceptive strategy that requires distinguishing the harvest and derate sub-modes. The paper explains well why single-agent optimization won't suffice when rivals' battery levels and tyre states are hidden. It identifies circuit-dependent recharge variation as the main confound and picks Melbourne as the hardest validation case. On the positive side, the HMM uses standard Baum-Welch and achieves high accuracy on the synthetic cases, and the DQN policy layer is a reasonable way to turn beliefs into actions.

The soft spots are clear. All performance figures come from synthetic races, with no description of how those races were simulated or what noise was included. There are no error bars, no alternative methods for comparison, and no real-world telemetry to test against. The claim that six observables can reliably recover 40 hidden states in practice is so far untested. Real calibration is set to begin with the 2026 Australian GP, so the current numbers are preliminary.

This work is for applied researchers in POMDPs or for teams doing F1 strategy analysis; it gives a structured way to think about opponent inference under the new energy rules. I would send it for peer review. The problem is timely and the approach is transparent, so referees can assess whether the synthetic evaluation is sufficient or what additional tests are needed.

Referee Report

3 major / 1 minor

Summary. The paper proposes a two-layer HMM-POMDP framework for opponent state inference in 2026 Formula 1 energy strategy under new 50/50 ICE/battery regulations and driver-controlled Override Mode. A 40-state HMM infers distributions over each rival's ERS charge (H/M/L_harvest/L_derate modes), Override status, and tyre degradation from six observable telemetry signals; the resulting belief state is input to a DQN policy for energy deployment. The work formally characterises the 'counter-harvest trap' deceptive strategy and reports HMM performance on synthetic races: 96.8% ERS-level accuracy (vs. 25% random), 89.4% accuracy distinguishing L_harvest vs. L_derate, and 96.3% recall for trap detection. Real Baum-Welch calibration on telemetry is stated to begin at the 2026 Australian GP, with Melbourne noted as the hardest validation case due to recharge variation (1.0x–2.2x per lap).

Significance. If the HMM belief state remains informative under real 2026 telemetry (including sensor noise and circuit-specific recharge), the framework would offer a concrete, tractable method for solving partially observable stochastic games in competitive energy management. The formal definition of the counter-harvest trap and its requirement for joint ERS/sub-mode inference is a clear conceptual contribution that could generalise to other domains with deceptive hidden-state strategies.

major comments (3)

[Abstract] The central performance claims (96.8% ERS accuracy, 89.4% L_harvest/derate classification, 96.3% trap recall) rest entirely on synthetic races, yet the manuscript provides no description of the synthetic data generator, including whether it injects realistic sensor noise, correlated telemetry errors, or the circuit-dependent recharge variation (1.0x–2.2x) explicitly flagged as the primary confound. Without this information it is impossible to determine whether the results demonstrate robustness or simply recover the model's own generative assumptions.
[Abstract] No baseline comparisons beyond the random 25% are reported, nor are error bars, confidence intervals, or results from multiple independent runs provided for any accuracy figure. This leaves the statistical reliability of the reported gains unestablished and weakens the claim that the 40-state HMM reliably recovers hidden states from only six observables.
[Abstract] The paper states that real telemetry calibration begins only with the 2026 Australian GP and that no out-of-sample telemetry results exist yet. Consequently the central claim, that the HMM-POMDP belief state will support effective DQN policy selection under actual 2026 conditions, remains untested and rests on an unexamined assumption that synthetic performance will transfer.
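One standard way to supply the error bars the referee asks for is a Student-t 95% confidence interval on the mean accuracy across independent runs. The simulated rebuttal below mentions 20 runs. The following sketch covers that case, assuming the runs are independent and the per-run accuracies are roughly normally distributed. The critical value 2.093 is the two-sided 95% t quantile for 19 degrees of freedom, hard-coded to avoid a SciPy dependency.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for n - 1 = 19 degrees of freedom.
T_CRIT_95_DF19 = 2.093


def ci95_over_20_runs(accuracies):
    """95% confidence interval for the mean accuracy over exactly 20 independent runs."""
    if len(accuracies) != 20:
        raise ValueError("the hard-coded critical value assumes exactly 20 runs")
    mean = statistics.mean(accuracies)
    # Standard error of the mean, using the sample (n - 1) standard deviation.
    half_width = T_CRIT_95_DF19 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean - half_width, mean + half_width


# Illustrative per-run accuracies only; these are not the paper's numbers.
runs = [0.96, 0.97] * 10
print(ci95_over_20_runs(runs))
```

For a different number of runs, the critical value must change with the degrees of freedom, which is why the function refuses any other length.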

minor comments (1)

[Abstract] The abstract introduces the counter-harvest trap but does not include even a brief formal definition or the key equations that distinguish it from ordinary low-deployment states; moving this characterisation earlier would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, have revised the manuscript accordingly where possible, and clarify the scope of the current contribution given the forward-looking nature of the 2026 regulations.

Referee: [Abstract] The central performance claims (96.8% ERS accuracy, 89.4% L_harvest/derate classification, 96.3% trap recall) rest entirely on synthetic races, yet the manuscript provides no description of the synthetic data generator, including whether it injects realistic sensor noise, correlated telemetry errors, or the circuit-dependent recharge variation (1.0x–2.2x) explicitly flagged as the primary confound. Without this information it is impossible to determine whether the results demonstrate robustness or simply recover the model's own generative assumptions.

Authors: We agree that a detailed description of the synthetic data generator is required to allow readers to assess whether the reported performance reflects genuine robustness. In the revised manuscript we have added a new subsection (Experiments, 4.1) that fully specifies the generator: telemetry signals are produced by a physics-based simulator of the 2026 power unit, with additive Gaussian sensor noise (standard deviation 0.05 on normalized channels), correlated errors drawn from a multivariate normal whose covariance is fitted to historical F1 telemetry, and per-lap recharge multipliers sampled uniformly from the interval [1.0, 2.2] with Melbourne deliberately using the full range to create the hardest validation case. These choices ensure the synthetic races incorporate the primary confounds identified in pre-season analysis rather than simply reproducing the HMM’s own assumptions. revision: yes
Referee: [Abstract] No baseline comparisons beyond the random 25% are reported, nor are error bars, confidence intervals, or results from multiple independent runs provided for any accuracy figure. This leaves the statistical reliability of the reported gains unestablished and weakens the claim that the 40-state HMM reliably recovers hidden states from only six observables.

Authors: We accept that the original submission lacked sufficient statistical context. The revised results section now reports three additional baselines: a 10-state HMM, a rule-based threshold classifier, and a supervised logistic-regression model trained on the same six observables. All accuracy, precision, and recall figures are accompanied by 95% confidence intervals computed across 20 independent runs that differ in random seed, initial state distribution, and circuit schedule; error bars have been added to the corresponding figures. revision: yes
Referee: [Abstract] The paper states that real telemetry calibration begins only with the 2026 Australian GP and that no out-of-sample telemetry results exist yet. Consequently the central claim, that the HMM-POMDP belief state will support effective DQN policy selection under actual 2026 conditions, remains untested and rests on an unexamined assumption that synthetic performance will transfer.

Authors: The referee correctly observes that real 2026 telemetry does not yet exist. The manuscript presents a formal framework together with synthetic validation as the necessary first step before the regulations come into force. We have revised the abstract and added an explicit Limitations paragraph stating that transfer performance on actual race data remains to be verified, that Baum-Welch calibration will begin with the 2026 Australian Grand Prix, and that Melbourne’s recharge variability will serve as the primary stress test. We do not claim that the current synthetic results constitute proof of real-world effectiveness. revision: partial
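The generator described in the first (simulated) author response uses additive Gaussian noise with standard deviation 0.05 on normalised channels and per-lap recharge multipliers drawn uniformly from [1.0, 2.2]. It can be sketched as a toy. This version assumes a single state-of-charge variable with hypothetical base-recharge and deployment rates. It omits the physics-based power-unit model and the correlated telemetry errors that the response also mentions, and it is not the paper's generator.

```python
import random

NOISE_SD = 0.05              # Gaussian sensor noise on normalised channels
RECHARGE_RANGE = (1.0, 2.2)  # per-lap recharge multiplier range quoted in the text


def simulate_charge(n_laps, base_recharge, deployment, rng, soc0=0.5):
    """Toy per-lap state of charge in [0, 1] with a circuit-dependent recharge multiplier.

    base_recharge and deployment are hypothetical per-lap fractions of battery capacity.
    """
    soc, trace = soc0, []
    for _ in range(n_laps):
        multiplier = rng.uniform(*RECHARGE_RANGE)
        soc = min(1.0, max(0.0, soc + base_recharge * multiplier - deployment))
        trace.append(soc)
    return trace


def observe(clean_channels, rng):
    """Add independent Gaussian noise to each normalised telemetry channel."""
    return [x + rng.gauss(0.0, NOISE_SD) for x in clean_channels]


rng = random.Random(2026)
soc_trace = simulate_charge(n_laps=10, base_recharge=0.10, deployment=0.15, rng=rng)
# Six hypothetical channels per lap; here every channel is just the charge proxy.
laps = [observe([soc] * 6, rng) for soc in soc_trace]
print([round(s, 3) for s in soc_trace])
```

Because the same noise family and recharge range would then be used to generate both training and evaluation races, a toy like this also illustrates the referee's concern: an HMM fitted to it is tested against its own generative assumptions.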

Circularity Check

0 steps flagged

No significant circularity: reported accuracies are evaluation outcomes on synthetic data, not reductions by construction

The paper describes a two-layer HMM-POMDP framework where the 40-state HMM infers hidden ERS, Override, and tyre states from six observables, with a DQN policy acting on the resulting belief state. The quantitative results (96.8% ERS accuracy, 89.4% L_harvest/derate classification, 96.3% trap recall) are presented as performance metrics on synthetic races rather than as fitted parameters or self-referential outputs. No equations, self-citations, or ansatzes are shown that would make these figures tautological with the model definition or generation process. The derivation from observables through Baum-Welch inference to belief-state policy selection remains independent and externally testable on held-out data, satisfying the criteria for a self-contained, non-circular claim.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 1 invented entities

Framework rests on standard HMM Markov and emission assumptions plus the premise that six telemetry signals are informative about the hidden states; no new physical constants or entities are introduced beyond the named counter-harvest strategy.

free parameters (1)

HMM transition and emission probabilities
40-state model parameters must be estimated via Baum-Welch or similar; exact values not stated in abstract.
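The transition and emission probabilities are fitted by Baum-Welch, which is expectation-maximisation over forward-backward posteriors. The sketch below performs one Baum-Welch iteration. It assumes discrete observation symbols for simplicity, whereas the paper's six telemetry channels would need continuous or discretised emission models. It also uses per-step scaling so that long races do not underflow. The function returns the re-estimated parameters together with the log-likelihood of the sequence under the input parameters.

```python
import math


def baum_welch_step(obs, pi, A, B):
    """One EM iteration for a discrete-emission HMM.

    obs: observation symbol indices; pi[i]: initial probabilities;
    A[i][j] = P(s_t = j | s_{t-1} = i); B[i][k] = P(o_t = k | s_t = i).
    Returns (new_pi, new_A, new_B, log-likelihood of obs under the inputs).
    """
    n, T, m = len(pi), len(obs), len(B[0])
    # Scaled forward pass: alpha[t] is normalised, scale[t] = p(o_t | o_1..o_{t-1}).
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[j] * B[j][obs[0]] for j in range(n)]
        else:
            a = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        c = sum(a)
        scale.append(c)
        alpha.append([x / c for x in a])
    # Scaled backward pass reusing the same scale factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)) / scale[t + 1]
                   for i in range(n)]
    # E-step: state posteriors gamma and pairwise posteriors xi summed over time.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                xi[i][j] += alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
    # M-step: re-estimate parameters; rows for never-visited states keep old values.
    new_pi = gamma[0][:]
    new_A, new_B = [], []
    for i in range(n):
        occ = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([xi[i][j] / occ for j in range(n)] if occ > 0 else A[i][:])
        occ_all = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / occ_all for k in range(m)]
                     if occ_all > 0 else B[i][:])
    return new_pi, new_A, new_B, sum(math.log(c) for c in scale)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1, 0, 0, 2]
for _ in range(5):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
    print(round(ll, 4))  # EM guarantees this is non-decreasing across iterations
```

Each call can only raise (or keep) the training log-likelihood, but EM converges to a local optimum. With 40 states, initialisation therefore matters, which is one reason the calibration schedule in the abstract is worth reporting in detail.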

axioms (2)

standard math: Hidden states evolve according to the Markov property
Core assumption of any HMM; invoked when defining the 40-state model for ERS and tyre states.
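domain assumption: The six observable telemetry signals are conditionally independent given the hidden state
Common factorisation of the HMM emission model; it keeps inference from public signals cheap, although a standard HMM only requires each observation to depend on the current hidden state.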
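Under the conditional-independence assumption, the emission likelihood of a six-channel observation factorises into a product of per-channel terms, which is best computed as a sum of log-densities. The sketch below assumes Gaussian per-channel emissions with hypothetical per-state means and standard deviations; the paper's emission family is not stated in this summary.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def emission_loglik(obs, means, sds):
    """log p(o | s) for one hidden state s, assuming the channels of o are
    conditionally independent Gaussians given s, so their log-densities add."""
    if not (len(obs) == len(means) == len(sds)):
        raise ValueError("obs, means and sds must have the same length")
    return sum(gaussian_logpdf(x, mu, sd) for x, mu, sd in zip(obs, means, sds))


# Hypothetical parameters of one hidden state for six normalised channels.
obs = [0.42, 0.10, 0.77, 0.50, 0.31, 0.05]
means_state = [0.40, 0.12, 0.80, 0.50, 0.30, 0.00]
sds_state = [0.05] * 6
print(emission_loglik(obs, means_state, sds_state))
```

Evaluating this once per hidden state gives the likelihood vector that an HMM filtering step multiplies into the predicted belief. If the channels are in fact correlated, as the simulated rebuttal's correlated-error noise model implies, this factorisation over-counts shared evidence and makes the belief overconfident.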

invented entities (1)

counter-harvest trap (no independent evidence)
purpose: Deceptive strategy in which a car suppresses observable deployment signals to induce a rival into a failed attack
Introduced and formally characterised in the abstract as requiring belief-state inference over both ERS level and harvest/derate sub-mode.
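The claim that trap detection needs both ERS-level and sub-mode inference can be made concrete with a two-stage rule. The first stage checks that the belief puts most of its mass on the two low-ERS modes. The second stage checks whether that low mass is mostly L_harvest (energy being banked, so an attack may fail) rather than L_derate (a genuinely depleted car). The sketch assumes a four-way ERS marginal as input; both thresholds are hypothetical tuning parameters, and the paper's formal characterisation may differ.

```python
def trap_condition(ers_belief, low_threshold=0.7, harvest_threshold=0.6):
    """Flag a possible counter-harvest trap from a marginal belief over ERS modes.

    ers_belief maps "H", "M", "L_harvest", "L_derate" to probabilities.
    Both thresholds are hypothetical and would need tuning on labelled races.
    """
    if not 0.0 < low_threshold <= 1.0:
        raise ValueError("low_threshold must be in (0, 1]")
    # Stage 1 (ERS level): does the rival look low on energy at all?
    p_low = ers_belief["L_harvest"] + ers_belief["L_derate"]
    if p_low < low_threshold:
        return False
    # Stage 2 (sub-mode): is the apparent low state mostly deliberate harvesting?
    return ers_belief["L_harvest"] / p_low >= harvest_threshold


print(trap_condition({"H": 0.05, "M": 0.05, "L_harvest": 0.7, "L_derate": 0.2}))  # True
print(trap_condition({"H": 0.05, "M": 0.05, "L_harvest": 0.2, "L_derate": 0.7}))  # False
```

Both example beliefs put the same 0.9 mass on the low modes, so stage 1 alone cannot tell them apart. Only the sub-mode split in stage 2 separates a trap from a car that is genuinely out of energy.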

pith-pipeline@v0.9.0 · 5874 in / 1612 out tokens · 47177 ms · 2026-05-21T11:33:29.761280+00:00 · methodology
