A Penalty-Free Pipeline for Direct Quantum-Annealer Portfolio Optimization

Luis Lozano

arxiv: 2605.17628 · v2 · pith:MUD6LUJX · submitted 2026-05-17 · 🪐 quant-ph · math.OC · q-fin.PM

Pith reviewed 2026-05-20 12:16 UTC · model grok-4.3

classification 🪐 quant-ph · math.OC · q-fin.PM

keywords quantum annealing, portfolio optimization, QUBO, cardinality constraint, chain breaks, D-Wave, penalty encoding, post-processing

The pith

Removing the cardinality penalty allows direct quantum-annealer portfolio optimization by sampling an objective-only QUBO and enforcing feasibility classically afterward.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard penalty-encoded QUBOs for portfolio optimization add a dense all-ones term from the cardinality constraint that completes the logical interaction graph and produces chain-break fractions of 71 to 92 percent on Pegasus and Zephyr hardware, with no feasible samples. The paper identifies this penalty structure, rather than hardware sparsity, as the binding limit at current scales. Sampling an objective-only QUBO built from expected returns and risk-scaled covariance, then applying a classical projection to enforce cardinality, reduces mean chain-break fractions per sample to at most 0.04 percent. The recovered feasible portfolios have post-processed regret of at most 0.03 percent relative to a greedy heuristic on equities up to N=49, and reach lower energy than greedy on betting instances at N=39 and N=48.

Core claim

The cardinality penalty contributes a dense rank-one term proportional to the all-ones matrix that makes the logical interaction graph complete regardless of the covariance structure. On Pegasus and Zephyr this produces chain-break fractions reaching 83 percent at N=24 and 92 percent at N=49, with no feasible samples. Dropping the penalty entirely, building an objective-only QUBO, sampling it on D-Wave Advantage and Advantage2, and enforcing the cardinality constraint classically as post-processing reduces mean chain-break fractions to at most 0.04 percent, produces lower-energy feasible portfolios than the greedy heuristic on betting at N=39 and 48, and keeps equity post-processed regret at or below 0.03 percent at all tested scales.
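To make the "complete regardless of covariance" point concrete, the Python sketch below counts logical edges (nonzero off-diagonal couplings) of an objective-only QUBO and of the same QUBO with the expanded cardinality penalty added. It assumes the objective form Q_obj = −diag(μ) + λΣ quoted in the Lean-linked passage on this page and a penalty weight A; the block-diagonal covariance, N = 24, K = 5, λ = 0.5 and A = 10 are illustrative choices, not the paper's instances, and the helper names are hypothetical.

```python
import numpy as np

def logical_edges(Q, tol=1e-12):
    """Number of nonzero couplings Q[i, j] with i < j, i.e. edges of the logical graph."""
    iu = np.triu_indices(Q.shape[0], k=1)
    return int(np.count_nonzero(np.abs(Q[iu]) > tol))

def objective_qubo(mu, sigma, lam):
    """Objective-only QUBO Q_obj = -diag(mu) + lam * Sigma (assumed form)."""
    return -np.diag(mu) + lam * sigma

def penalty_qubo(n, A, K):
    """Matrix part of A * (1^T x - K)^2 for binary x; the constant A*K^2 is dropped.

    Off-diagonal: A on every (i, j), i.e. 2A per unordered pair in x^T Q x.
    Diagonal: A * (1 - 2K), because x_i^2 = x_i folds the linear term onto the diagonal.
    """
    return A * (np.ones((n, n)) - np.eye(n)) + A * (1 - 2 * K) * np.eye(n)

def block_covariance(n_blocks, block_size, seed=0):
    """Synthetic block-diagonal covariance: assets correlate only within their block."""
    rng = np.random.default_rng(seed)
    n = n_blocks * block_size
    sigma = np.zeros((n, n))
    for b in range(n_blocks):
        B = rng.normal(size=(block_size, block_size))
        s = slice(b * block_size, (b + 1) * block_size)
        sigma[s, s] = B @ B.T / block_size + 0.01 * np.eye(block_size)
    return sigma

if __name__ == "__main__":
    n_blocks, block_size, K = 4, 6, 5          # N = 24, illustrative values only
    n = n_blocks * block_size
    sigma = block_covariance(n_blocks, block_size)
    mu = np.random.default_rng(1).normal(0.01, 0.02, size=n)
    Q_obj = objective_qubo(mu, sigma, lam=0.5)
    Q_pen = Q_obj + penalty_qubo(n, A=10.0, K=K)
    print("objective-only edges:", logical_edges(Q_obj))  # within-block pairs only
    print("penalized edges:     ", logical_edges(Q_pen))  # n*(n-1)/2 = 276, complete graph
```

With four uncorrelated blocks of six assets, the objective-only graph keeps only the 60 within-block couplings, while adding the penalty couples all 276 pairs whatever Σ looks like.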

What carries the argument

Objective-only QUBO sampled directly on the annealer, followed by classical cardinality projection that replaces the dense penalty term.
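The pipeline can be sketched in a few lines of Python. The projector here is a hypothetical stand-in, since its exact rule is not described on this page: it greedily drops (or adds) the single asset that yields the lowest QUBO energy until exactly K assets are selected, and `np.argmin` tie-breaking keeps it deterministic. Random 0/1 vectors stand in for annealer reads; on hardware they would come from a sampler such as dwave-system's `EmbeddingComposite(DWaveSampler()).sample_qubo(...)`. Sizes, λ and the synthetic returns are illustrative assumptions.

```python
import numpy as np

def objective_qubo(mu, sigma, lam):
    """Objective-only QUBO Q_obj = -diag(mu) + lam * Sigma; no cardinality penalty."""
    return -np.diag(mu) + lam * sigma

def energy(Q, x):
    """QUBO energy x^T Q x of a 0/1 vector x."""
    return float(x @ Q @ x)

def project_to_cardinality(Q, x, K):
    """Hypothetical deterministic projector (not the paper's): drop an asset if too many
    are selected, or add one if too few, always choosing the flip with the lowest
    resulting energy. np.argmin breaks ties by lowest index."""
    x = np.asarray(x, dtype=int).copy()
    assert 0 <= K <= x.size, "target cardinality must fit the universe"
    while x.sum() != K:
        new_value = 0 if x.sum() > K else 1
        candidates = np.flatnonzero(x != new_value)   # assets that can take new_value
        trial = []
        for i in candidates:
            y = x.copy()
            y[i] = new_value
            trial.append(energy(Q, y))
        x[candidates[int(np.argmin(trial))]] = new_value
    return x

def penalty_free_pipeline(Q, samples, K):
    """Project every raw sample to cardinality K and keep the lowest-energy feasible one."""
    projected = [project_to_cardinality(Q, s, K) for s in samples]
    energies = [energy(Q, p) for p in projected]
    best = int(np.argmin(energies))
    return projected[best], energies[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, K = 12, 4                                   # illustrative sizes only
    R = rng.normal(0.0, 0.05, size=(250, n))       # synthetic return history
    mu, sigma = R.mean(axis=0), np.cov(R, rowvar=False)
    Q = objective_qubo(mu, sigma, lam=2.0)
    raw = rng.integers(0, 2, size=(50, n))         # stand-in for annealer reads
    x_best, e_best = penalty_free_pipeline(Q, raw, K)
    print("selected assets:", np.flatnonzero(x_best), "energy:", e_best)
```

Because feasibility is restored after sampling, the QUBO sent to the annealer keeps only the couplings present in Σ; the trade-off is that the projector, not the sampler, makes the final support decision.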

If this is right

Chain-break fractions per sample fall from the 71-92 percent range to at most 0.04 percent on D-Wave Advantage and Advantage2 for equities up to N=49 and betting up to N=48.
The QPU returns lower-energy feasible portfolios than the greedy heuristic on betting instances at N=39 and N=48.
Equity post-processed regret stays at most 0.03 percent at all tested scales.
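For reference, a per-sample chain-break fraction is commonly computed as the share of logical variables whose chain of physical qubits does not agree on a single value, averaged over reads to get the mean. The sketch below assumes plain dicts for the embedding (logical variable to list of physical qubits) and for each physical read; dwave-system's embedding composites report a comparable per-sample `chain_break_fraction`, so this only pins down the definition and is not the paper's measurement code.

```python
def chain_break_fraction(sample, embedding):
    """Fraction of chains in one physical read whose qubits disagree.

    sample:    dict physical_qubit -> value (spin or bit)
    embedding: dict logical_variable -> list of physical qubits forming its chain
    """
    broken = sum(1 for chain in embedding.values() if len({sample[q] for q in chain}) > 1)
    return broken / len(embedding)

def mean_chain_break_fraction(samples, embedding):
    """Average the per-read fraction over all reads."""
    return sum(chain_break_fraction(s, embedding) for s in samples) / len(samples)

if __name__ == "__main__":
    embedding = {"a": [0, 1], "b": [2, 3, 4], "c": [5]}
    reads = [
        {0: 1, 1: 1, 2: -1, 3: -1, 4: 1, 5: 1},   # chain "b" broken -> 1/3
        {0: -1, 1: -1, 2: 1, 3: 1, 4: 1, 5: -1},  # all chains intact -> 0
    ]
    print(mean_chain_break_fraction(reads, embedding))  # 1/6, about 0.1667
```

On this definition, a mean of 0.04 percent corresponds to roughly one broken chain per 2,500 chains sampled.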

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

For other cardinality-constrained combinatorial problems the same penalty-free sampling plus classical projection may outperform topology-aware sparsification.
Hybrid quantum-classical pipelines that treat post-processing as first-class rather than auxiliary could become the practical route on near-term annealers even as connectivity improves.
The result implies that penalty design choices can dominate embedding and topology considerations in current direct QPU optimization.

Load-bearing premise

Samples drawn from the unconstrained objective-only QUBO still contain high-quality feasible portfolios that a classical projector can recover efficiently.

What would settle it

If, on some instance, none of the low-energy samples from the objective-only QUBO project to feasible portfolios with objective values competitive with known classical solutions, the post-processing recovery step fails and the pipeline yields no usable result.
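One way to run this check on small instances is to compare a candidate portfolio with an exact constrained optimum found by enumeration and report relative regret. The sketch assumes regret is (E − E_ref)/|E_ref| on QUBO energy and uses a simple add-one-at-a-time greedy baseline; both are illustrative assumptions, since the paper's greedy heuristic and regret definition are not given on this page, and the Q_obj form and synthetic returns are assumed too. Enumeration over all C(N, K) supports is only practical for small N.

```python
import itertools
import numpy as np

def energy(Q, x):
    """QUBO energy x^T Q x of a 0/1 vector x."""
    return float(x @ Q @ x)

def greedy_portfolio(Q, K):
    """Hypothetical greedy baseline: start empty, add the asset giving the lowest energy, K times."""
    n = Q.shape[0]
    x = np.zeros(n, dtype=int)
    for _ in range(K):
        best_i, best_e = None, np.inf
        for i in np.flatnonzero(x == 0):
            x[i] = 1
            e = energy(Q, x)
            x[i] = 0
            if e < best_e:
                best_i, best_e = i, e
        x[best_i] = 1
    return x

def exact_portfolio(Q, K):
    """Brute-force optimum over all supports of size K; small n only."""
    n = Q.shape[0]
    best_x, best_e = None, np.inf
    for support in itertools.combinations(range(n), K):
        x = np.zeros(n, dtype=int)
        x[list(support)] = 1
        e = energy(Q, x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

def relative_regret(e, e_ref):
    """Assumed regret definition (e - e_ref) / |e_ref|; requires e_ref != 0."""
    return (e - e_ref) / abs(e_ref)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, K = 12, 4                                   # illustrative sizes only
    R = rng.normal(0.0, 0.05, size=(250, n))       # synthetic return history
    Q = -np.diag(R.mean(axis=0)) + 2.0 * np.cov(R, rowvar=False)   # assumed Q_obj form
    x_g = greedy_portfolio(Q, K)
    x_opt, e_opt = exact_portfolio(Q, K)
    print("greedy regret vs exact:", relative_regret(energy(Q, x_g), e_opt))
```

Swapping the greedy output for projected annealer samples gives the comparison described above; a consistently large regret across samples would be the failure signal.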

Figures

Figures reproduced from arXiv: 2605.17628 by Luis Lozano.

**Figure 2.** Settlement graph versus penalty-encoded QUBO for a 3-match betting slate (…

**Figure 3.** Three-stage direct-QPU pipeline for penalty-encoded portfolio QUBOs: sparsification …

**Figure 4.** Raw QPU sample cardinality collapse under penalty encoding. Left: target cardinality …

**Figure 5.** Mean chain length versus problem size for dense and best-sparse (top-…

**Figure 6.** QPU vs. projector ablation. Random projection (mean) degrades with …

**Figure 7.** Logical edge counts after sparsification for equity and betting instances. The dense …

**Figure 8.** Objective regret vs. qubit overhead ratio. Lower-left is better. Domain-prior methods …

**Figure 9.** Physical qubit count and mean chain length on Pegasus vs. Zephyr for the same logical …

**Figure 10.** Qubit overhead ratio and mean chain length as a function of logical graph density. All …

**Figure 11.** Average pairwise support Jaccard overlap between sparsifiers across runs and in …

**Figure 12.** Out-of-sample validation summary. Left: equity realized Sharpe ratio by method. …

**Figure 13.** Standard penalty-encoded pipeline (top) versus penalty-free pipeline (bottom). The …

**Figure 14.** Head-to-head comparison of chain-break fractions: penalized pipeline (red) versus …

Original abstract

Cardinality-constrained portfolio selection is routinely cast as a quadratic unconstrained binary optimization (QUBO) and submitted to a quantum processing unit (QPU) for direct annealing. We show that this standard penalty encoding is the binding constraint for direct-QPU execution on current D-Wave Pegasus and Zephyr hardware. Expanding the exact cardinality penalty contributes a dense rank-one term that makes the logical interaction graph complete regardless of the covariance, producing chain-break fractions from 83% at small universes up to 92% at the full forty-nine-industry Fama–French universe, and zero feasible raw samples at every tested scale. Topology-aware sparsification reduces chain breaks to near zero, but any sparsifier that removes off-diagonal entries also dilutes the cardinality constraint; an ablation reveals that this sparsify-and-project pipeline is dominated by the classical projector, not the QPU. We propose removing the penalty entirely: sample an objective-only QUBO built from expected returns and the risk-scaled covariance on hardware, and enforce cardinality classically through a deterministic feasibility projector. Across 4,468 saved embedding records on live Pegasus and Zephyr hardware, spanning equities up to forty-nine assets and football-betting instances up to forty-eight, this penalty-free pipeline reduces mean chain-break fractions from 71%–92% down to at most 0.04%, and post-processed regret is at most 0.03% relative to greedy classical references at every tested scale. We do not claim quantum advantage; the penalty encoding, not the sparse hardware topology, is the limiting factor for direct-QPU portfolio optimization at currently accessible scales.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Dropping the cardinality penalty lets D-Wave sample the raw objective QUBO with almost no chain breaks, and classical projection recovers low-regret feasible portfolios on the tested equity and betting instances.

The main point is that the standard cardinality penalty makes the logical graph dense and drives chain breaks above 70 percent on Pegasus and Zephyr, while removing the penalty and enforcing the constraint after sampling brings breaks down to 0.04 percent or less and still produces competitive solutions up to N=49. The paper supports this with the rank-one all-ones explanation and with direct hardware measurements on Advantage and Advantage2. The energy win over greedy on the betting cases at N=39 and 48, plus equity regret of at most 0.03 percent, is concrete evidence that the approach can work on real devices without heavy sparsification. That part is useful for anyone trying to run mean-variance problems directly on current annealers.

The softer part is that the results appear tied to the structure of the chosen instances. The paper's own ablation indicates that the classical projector largely explains the outcome, at least for betting with settlement-graph priors, which leaves open whether unconstrained samples reliably overlap with good feasible regions on other covariance or return data. The absence of error bars, the missing projector description, and the limited ablations make it harder to judge how much the QPU actually contributes relative to the post-processing step.

This is aimed at researchers in quantum optimization for finance who need workable pipelines on existing hardware rather than theoretical bounds. A reader who cares about practical chain-break mitigation and simple post-processing would find the empirical contrast worth their time. I would send it for peer review: the hardware data and the diagnosis of the penalty bottleneck are substantive enough to discuss, even if revisions should tighten the controls and test more varied instances.

Referee Report

3 major / 3 minor

Summary. The paper claims that penalty-encoded QUBO formulations for cardinality-constrained portfolio optimization introduce a dense rank-one all-ones interaction that causes high chain-break fractions (71-92%) on D-Wave Pegasus/Zephyr hardware, yielding no feasible samples. It proposes a penalty-free pipeline that samples an objective-only QUBO (expected returns plus risk-scaled covariance) directly on the QPU and enforces the cardinality constraint K via classical post-processing projection. On equities (N≤49) and betting (N≤48), this reduces mean chain-break fractions to ≤0.04%, produces ≤0.03% equity regret, and yields lower-energy feasible solutions than a greedy heuristic on betting instances at N=39 and 48. The central conclusion is that the penalty term, rather than sparse hardware topology, is the binding constraint for direct QPU portfolio optimization at current scales.

Significance. If the empirical results hold, the work provides concrete hardware evidence that removing the cardinality penalty enables low-chain-break sampling on current annealers and that hybrid quantum-classical post-processing can recover competitive portfolios. The reported drop in chain breaks from 71–92% to at most 0.04% and the energy comparisons on real D-Wave Advantage/Advantage2 devices constitute useful empirical data for the field. The identification of the structural origin of the dense logical graph is a clear contribution, though the broader claim that this pipeline is generally effective rests on the unproven assumption that objective-only samples overlap sufficiently with high-quality feasible regions.

major comments (3)

[Abstract and §3] The central claim that the cardinality penalty produces a dense rank-one term proportional to the all-ones matrix (making the logical graph complete) is load-bearing. The manuscript should explicitly display the QUBO matrix decomposition or derive the rank-one update to confirm that this term dominates irrespective of the covariance structure.
[§5 (ablation)] The ablation shows that for betting instances the classical projector alone explains performance. This directly undermines the claim that the QPU sampling step contributes meaningfully for equities; without a parallel ablation or isolation experiment (e.g., comparing projector output on random vs. QPU samples) the evidence that the penalty-free QUBO is responsible for the ≤0.03% regret is incomplete.
[Results section] The weakest assumption, that unconstrained objective-only samples contain high-quality feasible portfolios recoverable by the projector, is tested only on the reported equity and betting instances. The manuscript should include at least one counter-example instance where covariance eigenvalues or return vectors strongly bias toward extreme sparsity/density, to test whether the projector still recovers competitive solutions when the feasible manifold lies far from the objective minima.

minor comments (3)

[Methods] The post-processing projector is referenced but never given pseudocode or a precise algorithmic description, hindering reproducibility.
[Results] Chain-break fractions and regret values are reported without error bars or standard deviations across reads or random seeds.
[Figures] Figure captions and axis labels for energy-comparison plots should explicitly distinguish QPU+projector from pure classical baselines.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below and indicate the revisions made to the manuscript.

Referee: [Abstract and §3] The central claim that the cardinality penalty produces a dense rank-one term proportional to the all-ones matrix (making the logical graph complete) is load-bearing. The manuscript should explicitly display the QUBO matrix decomposition or derive the rank-one update to confirm that this term dominates irrespective of the covariance structure.

Authors: We agree this clarification will improve the manuscript. The penalty term has the form A(1ᵀx − K)², with penalty weight A > 0 (λ is reserved for the risk scaling in Q_obj = −diag(μ) + λΣ). Expanding for binary x, where x_i² = x_i, yields a constant AK², a linear term A(1 − 2K)Σ_i x_i, and a quadratic term A Σ_{i≠j} x_i x_j, which is the off-diagonal part of the rank-one all-ones matrix A11ᵀ. This rank-one update is added to the objective QUBO independently of the covariance matrix and therefore renders the logical graph dense for any covariance structure. We will insert the explicit matrix decomposition and derivation in the revised §3. revision: yes
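Written out as a worked equation, the expansion reads as follows, writing A for the penalty weight (λ stays the risk scaling in Q_obj = −diag(μ) + λΣ, the form assumed here) and using x_i² = x_i, so that (1ᵀx)² = Σ_i x_i + Σ_{i≠j} x_i x_j:

```latex
A\,(\mathbf{1}^\top x - K)^2
  = A \sum_{i \neq j} x_i x_j \;+\; A\,(1 - 2K)\sum_{i} x_i \;+\; A K^2,
\qquad
Q_{\mathrm{pen}}
  = \underbrace{-\operatorname{diag}(\mu) + \lambda\Sigma}_{Q_{\mathrm{obj}}}
  \;+\; A\,(\mathbf{1}\mathbf{1}^\top - I) \;+\; A\,(1 - 2K)\, I ,
```

so that x^⊤Q_pen x equals x^⊤Q_obj x + A(1ᵀx − K)² up to the constant AK². Every off-diagonal entry of Q_pen is λΣ_ij + A, so a coupling vanishes only in the non-generic case λΣ_ij = −A; for any other Σ the logical graph is complete, which is the precise sense of "regardless of the covariance".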
Referee: [§5 (ablation)] The ablation shows that for betting instances the classical projector alone explains performance. This directly undermines the claim that the QPU sampling step contributes meaningfully for equities; without a parallel ablation or isolation experiment (e.g., comparing projector output on random vs. QPU samples) the evidence that the penalty-free QUBO is responsible for the ≤0.03% regret is incomplete.

Authors: The betting instances possess strong settlement-graph priors that make even random samples project to competitive feasible solutions. Equity instances lack such priors; the objective-only QUBO samples concentrate near low-risk, high-return regions that the projector then maps to feasible portfolios with ≤0.03% regret. To isolate the QPU contribution we will add, in the revised manuscript, a direct comparison of post-processed regret obtained from QPU samples versus uniformly random binary vectors on the same equity instances, to test whether the QPU samples yield measurably better results. revision: yes
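The isolation experiment described in this response can be prototyped as below: project samples from two sources with the same deterministic projector and compare relative regret against the exact constrained optimum. In this sketch the "QPU" source is an idealized stand-in (the lowest-energy unconstrained states of the objective-only QUBO, found by enumeration), the projector is a hypothetical greedy drop/add rule rather than the paper's, and the Q_obj form, sizes, λ and synthetic returns are assumptions.

```python
import itertools
import numpy as np

def energy(Q, x):
    """QUBO energy x^T Q x of a 0/1 vector x."""
    return float(x @ Q @ x)

def project_to_cardinality(Q, x, K):
    """Hypothetical projector: flip one asset at a time (drop if too many, add if too few),
    choosing the flip with the lowest resulting energy, until exactly K are selected."""
    x = np.asarray(x, dtype=int).copy()
    assert 0 <= K <= x.size
    while x.sum() != K:
        new_value = 0 if x.sum() > K else 1
        candidates = np.flatnonzero(x != new_value)
        trial = []
        for i in candidates:
            y = x.copy()
            y[i] = new_value
            trial.append(energy(Q, y))
        x[candidates[int(np.argmin(trial))]] = new_value
    return x

def exact_optimum(Q, K):
    """Lowest energy over all supports of size K (small n only)."""
    n = Q.shape[0]
    best = np.inf
    for support in itertools.combinations(range(n), K):
        x = np.zeros(n, dtype=int)
        x[list(support)] = 1
        best = min(best, energy(Q, x))
    return best

def lowest_energy_states(Q, m):
    """Idealized stand-in for annealer reads: the m lowest-energy unconstrained states."""
    n = Q.shape[0]
    states = np.array(list(itertools.product([0, 1], repeat=n)))
    energies = np.einsum("si,ij,sj->s", states, Q, states)
    return states[np.argsort(energies)[:m]]

def projected_regrets(Q, samples, K, e_opt):
    """Relative regret (E - E_opt) / |E_opt| of every projected sample."""
    return np.array([(energy(Q, project_to_cardinality(Q, s, K)) - e_opt) / abs(e_opt)
                     for s in samples])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, K, m = 10, 3, 20                            # illustrative sizes only
    R = rng.normal(0.0, 0.05, size=(250, n))       # synthetic return history
    Q = -np.diag(R.mean(axis=0)) + 2.0 * np.cov(R, rowvar=False)   # assumed Q_obj form
    e_opt = exact_optimum(Q, K)
    annealer_like = lowest_energy_states(Q, m)
    random_bits = rng.integers(0, 2, size=(m, n))
    print("mean regret, low-energy samples:", projected_regrets(Q, annealer_like, K, e_opt).mean())
    print("mean regret, random samples:    ", projected_regrets(Q, random_bits, K, e_opt).mean())
```

Only a consistent gap between the two means on the same instances would attribute part of the result to the sampler rather than the projector; this toy setup says nothing about which way real hardware falls.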
Referee: [Results section] The weakest assumption, that unconstrained objective-only samples contain high-quality feasible portfolios recoverable by the projector, is tested only on the reported equity and betting instances. The manuscript should include at least one counter-example instance where covariance eigenvalues or return vectors strongly bias toward extreme sparsity/density, to test whether the projector still recovers competitive solutions when the feasible manifold lies far from the objective minima.

Authors: We acknowledge that robustness under deliberately biased covariance structures would be informative. However, the paper’s scope is to demonstrate the structural failure of penalty encodings and the practical viability of the penalty-free pipeline on standard, realistic financial instances up to N=49. Constructing artificial counter-examples with extreme eigenvalue biases would move outside the domain of practical portfolio optimization, where objectives are calibrated to produce solutions near the target cardinality. We will add a limitations paragraph discussing the scope of the overlap assumption while preserving the central empirical claim that the penalty term, not hardware sparsity, is the dominant obstacle on current devices. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical hardware results and heuristic comparisons stand independently.

full rationale

The paper's central claim rests on direct measurements of chain-break fractions, energy values, and post-processed regret on D-Wave hardware for objective-only QUBOs, plus comparisons to a greedy heuristic. These are external benchmarks rather than reductions to fitted parameters or self-citations. The abstract and described pipeline contain no self-definitional equations, uniqueness theorems imported from prior work, or ansatzes smuggled via citation. The post-processing step is presented as a classical recovery method whose effectiveness is tested empirically on the reported instances, not assumed by construction. This is a standard self-contained experimental result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach assumes the standard mean-variance objective and relies on empirical hardware behavior rather than new parameters or postulated entities.

axioms (1)

Domain assumption: the mean-variance formulation captures the essential trade-off for the portfolio instances considered.
Invoked when the objective-only QUBO is built from expected returns and risk-scaled covariance.
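Assuming the objective-only QUBO has the form Q_obj = −diag(μ) + λΣ quoted in the Lean-linked passage on this page, the link to this axiom is a one-line identity, because x_i² = x_i for binary selections:

```latex
x^\top Q_{\mathrm{obj}}\, x
  = -\sum_{i} \mu_i x_i^2 + \lambda\, x^\top \Sigma x
  = -\bigl(\mu^\top x - \lambda\, x^\top \Sigma x\bigr),
\qquad x \in \{0,1\}^N ,
```

so minimizing the QUBO energy is the same as maximizing the unnormalized, equal-weight mean-variance score μᵀx − λxᵀΣx over the selected assets.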

pith-pipeline@v0.9.0 · 5860 in / 1275 out tokens · 73552 ms · 2026-05-20T12:16:23.958297+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: the cardinality penalty A(1ᵀx − K)² contributes a dense rank-one matrix A11ᵀ that makes the logical interaction graph complete regardless of Σ

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Paper passage: build the objective-only QUBO Q_obj = −diag(μ) + λΣ, sample on hardware, and enforce cardinality classically

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.