Learning Safely Without Knowing the World: COMPASS-Hedge

Emmanouil-Vasileios Vlatakis-Gkaragkounis; Luanda Cai; Ting Hu

A parameter-free online algorithm achieves optimal adversarial, gap-dependent stochastic, and baseline-safe regret at once.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.

T0 review · grok-4.5

2026-07-13 21:04 UTC pith:OGYU7U44

load-bearing objection Abstract claims a clean first best-of-three-worlds guarantee (adversarial + gap-dependent stochastic + baseline safety) for a parameter-free full-info anytime algorithm, but with no proofs or algorithm details the claim is uncheckable.

arxiv 2603.22348 v4 pith:OGYU7U44 submitted 2026-03-22 cs.LG cs.GT

Ting Hu, Luanda Cai, Emmanouil-Vasileios Vlatakis-Gkaragkounis

classification cs.LG cs.GT

keywords online learning · regret minimization · best-of-three-worlds · full-information · parameter-free · baseline safety · Hedge · anytime algorithms

verification ladder T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Online learning algorithms usually face a trilemma: they can be robust to worst-case (adversarial) losses, efficient in well-behaved (stochastic) environments, or safe relative to a fixed baseline policy, but not all three at once without giving up rates or needing secret knowledge of the problem. This paper introduces COMPASS-Hedge, an anytime full-information algorithm that claims to deliver all three guarantees simultaneously, up to log factors. It matches the minimax adversarial rate, attains instance-optimal gap-dependent stochastic regret, and keeps only Õ(1) regret versus a designated baseline, while remaining completely parameter-free and ignorant of whether the world is adversarial or stochastic and of how large any gaps are. The practical promise is that a single deployable method can be both robust and efficient without hand-tuning or environment classification. If the claim holds, baseline safety no longer has to be purchased by sacrificing either worst-case robustness or stochastic efficiency.

Core claim

COMPASS-Hedge is the first full-information anytime method that simultaneously achieves, up to logarithmic factors, (i) minimax-optimal adversarial regret, (ii) instance-optimal gap-dependent stochastic regret, and (iii) Õ(1) regret relative to a designated baseline policy, while remaining parameter-free and requiring no knowledge of the environment type or gap magnitudes.
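
For orientation, the three targets can be written against the usual experts-setting notation. This is a sketch in the standard textbook form, not the paper's own theorem statements, which are not available here. The symbols K (number of experts), T (horizon), p_t (the learner's distribution), \ell_t (the loss vector, assumed in [0,1]^K), \Delta (the smallest stochastic gap), and b (the baseline index) are all assumptions introduced for illustration.

```latex
% Regret against a fixed expert i after T rounds (assumed notation):
R_T(i) \;=\; \sum_{t=1}^{T} \langle p_t, \ell_t \rangle \;-\; \sum_{t=1}^{T} \ell_{t,i}

% (i) adversarial target: the minimax rate, up to logarithmic factors
\max_{i \in [K]} R_T(i) \;=\; \tilde{\mathcal{O}}\!\left(\sqrt{T \log K}\right)

% (ii) stochastic target: gap-dependent rate, with i^* the expert of smallest mean loss
\mathbb{E}\!\left[ R_T(i^*) \right] \;=\; \tilde{\mathcal{O}}\!\left(\frac{\log K}{\Delta}\right),
\qquad \Delta \;=\; \min_{i \neq i^*} \left( \mu_i - \mu_{i^*} \right)

% (iii) baseline-safety target: near-constant regret against the designated baseline b
R_T(b) \;=\; \tilde{\mathcal{O}}(1)
```

The rates in (i) and (ii) are the benchmark forms commonly quoted for full-information experts problems; whether the paper's bounds match them in constants and log factors is exactly what the review says cannot yet be checked.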

What carries the argument

A novel integration of adaptive pseudo-regret scaling, phase-based aggression, and a comparator-aware mixing strategy that together produce the three rates without any problem-dependent parameters.

Load-bearing premise

That the combination of adaptive pseudo-regret scaling, phase-based aggression, and comparator-aware mixing can be tuned without any problem-dependent parameters and still preserve all three rates at once.

What would settle it

Exhibit a full-information sequence (adversarial, stochastic, or mixed) on which COMPASS-Hedge either exceeds the minimax adversarial rate by more than log factors, fails to achieve the instance-optimal gap-dependent rate, or incurs super-constant regret against the designated baseline.
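
A falsification attempt of this kind reduces to bookkeeping: record what the learner played, then compare its cumulative loss with the best expert and with the baseline. The following is a minimal sketch of that bookkeeping, assuming losses in [0, 1]. The function name `regret_report` and its fields are hypothetical, and the uniform learner in the toy check is a placeholder, since the COMPASS-Hedge update rule is not available in this review.

```python
import math
from typing import Dict, Sequence


def regret_report(plays: Sequence[Sequence[float]],
                  losses: Sequence[Sequence[float]],
                  baseline: int) -> Dict[str, float]:
    """Compare a learner's expected cumulative loss with two comparators.

    plays[t][i]  : probability the learner put on expert i in round t
    losses[t][i] : loss of expert i in round t, assumed to lie in [0, 1]
    baseline     : index of the designated baseline expert (hypothetical setup)
    """
    T, K = len(losses), len(losses[0])
    # Expected loss of the learner, summed over all rounds.
    learner = sum(sum(p * l for p, l in zip(plays[t], losses[t]))
                  for t in range(T))
    # Cumulative loss of each individual expert.
    totals = [sum(losses[t][i] for t in range(T)) for i in range(K)]
    return {
        "regret_vs_best": learner - min(totals),
        "regret_vs_baseline": learner - totals[baseline],
        # Reference scale for the adversarial rate, constants omitted.
        "sqrt_T_log_K": math.sqrt(T * math.log(K)),
    }


# Toy check: a uniform learner (placeholder, NOT COMPASS-Hedge) on a
# sequence where expert 0 never incurs loss and all others always do.
T, K = 100, 4
losses = [[0.0] + [1.0] * (K - 1) for _ in range(T)]
plays = [[1.0 / K] * K for _ in range(T)]
report = regret_report(plays, losses, baseline=0)
print(report)
```

On this toy sequence the uniform learner's regret grows linearly in T, so it exceeds the square-root reference scale. A sequence that produced the same pattern for COMPASS-Hedge, for any comparator named in the falsifier, would count against the claim.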

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

Abstract claims a clean first best-of-three-worlds guarantee (adversarial + gap-dependent stochastic + baseline safety) for a parameter-free full-info anytime algorithm, but with no proofs or algorithm details the claim is uncheckable.

The one thing to know is that the abstract asserts COMPASS-Hedge is the first full-information anytime method that simultaneously hits (up to logs) minimax adversarial regret, instance-optimal gap-dependent stochastic regret, and Õ(1) baseline regret, all without parameters or knowledge of the environment type. That combination would be a real advance if the analysis holds; best-of-both-worlds results already exist, so the novelty is specifically the third axis plus parameter-freeness in the anytime full-info setting.

What the abstract does well is state the trilemma cleanly and name the three rates without obvious circularity. The high-level ingredients—adaptive pseudo-regret scaling, phase-based aggression, and comparator-aware mixing—sound like a plausible way to stitch the pieces together, and the claim is framed against standard external-regret notions rather than fitted quantities. Credit where due: if the full paper delivers tight proofs of those three rates under a single schedule, it is a solid contribution to the online-learning literature.

The soft spot is exactly what the stress-test flags and it is load-bearing: we have only the abstract. No algorithm pseudocode, no theorem statements, no intermediate lemmas, no literature comparison beyond the claim of primacy. You cannot verify that the three rates survive simultaneous parameter-free tuning, nor that the constants and log factors are as advertised. That is not a manufactured flaw; it is simply the information we have. Circularity risk looks low from the wording, but soundness is currently unverifiable.

This is for people who work on best-of-both-worlds, safe online learning, and parameter-free Hedge variants. A reader who cares about the precise regret landscape in full information will get value once the proofs appear; casual readers will not. It deserves a serious referee rather than a desk reject—the claim is sharp enough and the setting standard enough that the community should check the math. I would send it out for review and expect the referees to demand the full analysis and a careful comparison to existing best-of-two results. Without the body we cannot go further, but the abstract alone is enough to justify that step.

Referee Report

3 major / 0 minor

Summary. The manuscript claims that COMPASS-Hedge is the first full-information anytime online learning algorithm that simultaneously achieves, up to logarithmic factors, (i) minimax-optimal adversarial regret, (ii) instance-optimal gap-dependent stochastic regret, and (iii) Õ(1) regret relative to a designated baseline policy, while remaining completely parameter-free and requiring no knowledge of the environment type or gap magnitudes. The abstract attributes this 'best-of-three-worlds' guarantee to a novel integration of adaptive pseudo-regret scaling, phase-based aggression, and comparator-aware mixing, and asserts that baseline safety need not sacrifice worst-case robustness or stochastic efficiency.

Significance. If the three simultaneous rates are correctly proved under a single parameter-free schedule, the result would close a genuine gap in the full-information best-of-both-worlds literature by adding baseline safety without rate degradation. That would be a meaningful contribution for safe online decision-making. However, only the abstract is available: no algorithm definition, theorems, intermediate lemmas, or proof sketches are supplied. Consequently the claimed significance cannot yet be verified and remains conditional on the missing technical development.

major comments (3)

1. The central multi-objective claim is uncheckable from the abstract alone. No algorithm pseudocode, precise regret statements, intermediate lemmas, or proof sketches are provided, so it is impossible to verify that adaptive pseudo-regret scaling, phase-based aggression and comparator-aware mixing can be combined under one parameter-free schedule while preserving all three rates simultaneously. This is load-bearing for every claim in the paper.
2. Novelty relative to prior best-of-both-worlds and safety literature cannot be assessed without the full technical development. The abstract asserts 'first' status, yet supplies neither a comparison table nor the precise rates achieved by the closest existing methods; without those objects the priority claim remains unsupported.
3. The abstract asserts parameter-freeness and anytime validity, but does not exhibit the schedule or the potential-function argument that would confirm the absence of hidden problem-dependent constants or knowledge of the horizon. Until those objects appear, the parameter-free claim is an assertion rather than a demonstrated property.

Circularity Check

0 steps flagged

Abstract-only review: no derivation chain or equations available to inspect for circularity.

Only the abstract is provided; the full text, algorithm definition, lemmas, and proofs are unavailable. Circularity analysis requires quoting specific equations or self-citations that reduce a claimed prediction or first-principles result to its inputs by construction. The abstract asserts that COMPASS-Hedge is parameter-free and simultaneously achieves three standard external regret notions (minimax adversarial, gap-dependent stochastic, and baseline-relative) via a novel combination of techniques, without defining those techniques in terms of the target bounds or fitting parameters to the same data being predicted. No self-definitional loops, fitted-input-as-prediction steps, load-bearing self-citations, uniqueness theorems imported from the authors, ansatz smuggling, or renaming of known results can be exhibited from the given text. Per the hard rules, absence of inspectable derivation material yields score 0 with empty steps; residual risk that the missing analysis is circular is not evidence of circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 1 invented entity

Abstract-only; free parameters are claimed to be absent (parameter-free algorithm). Axioms are the standard full-information online learning model and the usual adversarial/stochastic environment classes. No new physical entities; the invented object is the algorithm and its mixing rule. All entries are inferred from the abstract wording.

axioms (3)

domain assumption: Full-information feedback model (learner observes the entire loss vector each round).
  The claimed rates are stated for the full-information setting; partial-feedback would change the minimax rates.
domain assumption: Standard adversarial and stochastic online learning environments with a fixed designated baseline policy.
  The three-world guarantees presuppose these classical environment classes and a known baseline comparator.
standard math: Regret analysis may hide only logarithmic factors and may use phase-based arguments.
  Common in anytime adaptive online learning; the abstract explicitly allows log factors.
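
The full-information protocol named in the first axiom can be made concrete with a short sketch. It uses classic Hedge with a decreasing learning rate as a stand-in: this is the textbook algorithm, not COMPASS-Hedge, whose update rule this review has not seen. The schedule eta_t = sqrt(ln K / t), the function name `anytime_hedge`, and the Bernoulli loss means are assumptions chosen for illustration.

```python
import math
import random
from typing import List


def anytime_hedge(losses: List[List[float]]) -> List[List[float]]:
    """Textbook Hedge with learning rate eta_t = sqrt(ln K / t).

    Stand-in only: this is NOT COMPASS-Hedge. Losses are assumed in [0, 1].
    Returns the distribution played in each round.
    """
    K = len(losses[0])
    cum = [0.0] * K              # cumulative loss of each expert so far
    plays = []
    for t, loss_vec in enumerate(losses, start=1):
        eta = math.sqrt(math.log(K) / t)   # needs no knowledge of the horizon
        m = min(cum)                       # shift for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        plays.append([x / z for x in w])
        # Full information: the entire loss vector is revealed after playing.
        cum = [c + l for c, l in zip(cum, loss_vec)]
    return plays


# Stochastic toy environment (assumed): expert 0 has the smallest mean loss.
rng = random.Random(0)
means = [0.2, 0.5, 0.5]
T = 2000
losses = [[1.0 if rng.random() < m else 0.0 for m in means] for _ in range(T)]
plays = anytime_hedge(losses)

learner = sum(sum(p * l for p, l in zip(plays[t], losses[t])) for t in range(T))
best = min(sum(losses[t][i] for t in range(T)) for i in range(len(means)))
print("regret vs best expert:", learner - best)
print("final weight on expert 0:", plays[-1][0])
```

Under bandit (partial) feedback the line that updates `cum` would only see one coordinate of `loss_vec`, which is why the axiom notes that the minimax rates would change outside this model.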

invented entities (1)

COMPASS-Hedge algorithm (adaptive pseudo-regret scaling + phase-based aggression + comparator-aware mixing) · no independent evidence
purpose: Simultaneously achieve the three regret targets without environment-type or gap knowledge.
The algorithm and its three named mechanisms are the paper's central construction; independent evidence would be the (unseen) regret theorems and any later empirical checks.

pith-pipeline@v1.1.0-grok45 · 6142 in / 2420 out tokens · 29233 ms · 2026-07-13T21:04:10.293847+00:00 · methodology

Original abstract

Online learning algorithms often face a fundamental trilemma: balancing regret guarantees between adversarial and stochastic settings and providing baseline safety against a fixed comparator. While existing methods excel in one or two of these regimes, they typically fail to unify all three without sacrificing optimal rates or requiring oracle access to problem-dependent parameters. In this work, we bridge this gap by introducing COMPASS-Hedge. To the best of our knowledge, our algorithm is the first full-information anytime method to simultaneously achieve, up to logarithmic factors: i) minimax-optimal regret in adversarial environments; ii) instance-optimal, gap-dependent regret in stochastic environments; and iii) $\tilde{\mathcal{O}}(1)$ regret relative to a designated baseline policy. Crucially, COMPASS-Hedge is parameter-free and requires no prior knowledge of the environment's nature or the magnitude of the stochastic suboptimality gaps. Our approach hinges on a novel integration of adaptive pseudo-regret scaling and phase-based aggression, coupled with a comparator-aware mixing strategy. To the best of our knowledge, this provides the first "best-of-three-world" guarantee in the full-information setting, establishing that baseline safety does not have to come at the cost of worst-case robustness or stochastic efficiency.