Cosine similarity-based adversarial process

Ha-Jin Yu; Hee-Soo Heo; Hye-Jin Shim; IL-Ho Yang; Jee-weon Jung

arxiv: 1907.00542 · v1 · pith:XYMSM3U2new · submitted 2019-07-01 · 💻 cs.LG · eess.AS · eess.IV · stat.ML

Pith reviewed 2026-05-25 11:48 UTC · model grok-4.3

classification 💻 cs.LG · eess.AS · eess.IV · stat.ML

keywords adversarial training · cosine similarity · speaker identification · image recognition · subsidiary task · orthogonal features · robust models

The pith

Cosine similarity in an adversarial process degrades subsidiary model performance more efficiently than maximizing categorical cross entropy, by searching for features orthogonal to the subsidiary model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an adversarial training framework between a primary discriminative model and a subsidiary one, where the goal is to remove unwanted subsidiary information such as channel or domain effects from the input features. Conventional methods degrade the subsidiary model by maximizing its categorical cross entropy (CCE), implemented as an inverted CCE loss, but the authors report that this does not reliably reduce subsidiary performance. The proposed alternative uses cosine similarity to drive the primary model toward features orthogonal to the subsidiary model. On speaker identification and image recognition tasks, this produces subsidiary outputs independent of the input and raises primary model accuracy.

Core claim

The proposed adversarial process using cosine similarity degrades the performance of the subsidiary model more efficiently than maximizing categorical cross entropy by searching feature space orthogonal to the subsidiary model, making subsidiary outputs independent of the input and improving primary model performance.

What carries the argument

Cosine similarity objective that replaces inverted categorical cross entropy to enforce orthogonality between primary features and subsidiary task directions.
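
One way to picture the mechanism, as a minimal NumPy sketch rather than the authors' implementation: assume the subsidiary model ends in a linear layer with weight rows W and bias b, and that the cosine objective is the mean squared cosine similarity between each feature vector and each weight row. The squared form, the function names, and the projection step standing in for gradient training are all assumptions made for illustration. When that loss reaches zero, the subsidiary logits collapse to the bias, so the output is the same for every input.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inverted_cce(features, W, b, labels):
    """Negative CCE of the subsidiary head; minimising it maximises the CCE."""
    probs = softmax(features @ W.T + b)
    cce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return float(-cce)

def cosine_adversarial_loss(features, W):
    """Assumed form: mean squared cosine between each feature and each weight row."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    w = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    return float(((f @ w.T) ** 2).mean())

def project_out(features, W):
    """Stand-in for training: remove each feature's component in the row space of W."""
    Q, _ = np.linalg.qr(W.T)          # columns of Q span the row space of W
    return features - (features @ Q) @ Q.T

# Toy data (hypothetical sizes): 8 features of dimension 16, 3 subsidiary classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
W = rng.normal(size=(3, 16))
b = rng.normal(size=3)
labels = rng.integers(0, 3, size=8)

orth = project_out(features, W)
loss_before = cosine_adversarial_loss(features, W)
loss_after = cosine_adversarial_loss(orth, W)
inv_cce_before = inverted_cce(features, W, b, labels)
probs_after = softmax(orth @ W.T + b)   # every row is softmax(b)
```

In this toy setting probs_after has identical rows, which is the sense in which the subsidiary output no longer depends on the input. The inverted CCE, by contrast, is unbounded below and does not by itself force the features into any particular direction.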

If this is right

Subsidiary model outputs become independent of the input features.
Primary model accuracy increases on both speaker identification and image recognition.
The cosine approach succeeds in cases where maximizing cross entropy leaves subsidiary performance intact.
The same process applies across audio and visual identification domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The orthogonality mechanism could be tested on other multi-task setups where one task acts as unwanted interference.
Measuring the angle between feature gradients of the two models would give a direct diagnostic of whether orthogonality was achieved.
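
The angle diagnostic could be sketched as follows. The sketch assumes both models end in linear softmax heads over a shared feature vector, for which the cross-entropy gradient with respect to the feature is W^T (p - y). The head shapes, names, and toy weights are hypothetical, chosen so that the two heads read disjoint feature coordinates and the expected angle is known in advance.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad_wrt_feature(f, W, b, label):
    """Gradient of softmax cross entropy w.r.t. the feature f for a linear head."""
    p = softmax(W @ f + b)
    p[label] -= 1.0                   # p - one_hot(label)
    return W.T @ p

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(0)
f = rng.normal(size=8)

# Hypothetical heads: primary uses coordinates 0..3, subsidiary uses 4..7.
W_primary = np.zeros((5, 8))
W_primary[:, :4] = rng.normal(size=(5, 4))
W_subsidiary = np.zeros((3, 8))
W_subsidiary[:, 4:] = rng.normal(size=(3, 4))

g_primary = ce_grad_wrt_feature(f, W_primary, np.zeros(5), label=2)
g_subsidiary = ce_grad_wrt_feature(f, W_subsidiary, np.zeros(3), label=1)
angle = angle_deg(g_primary, g_subsidiary)
```

On trained networks the gradients would come from automatic differentiation rather than this closed form; an angle near 90 degrees averaged over a held-out set would be the signal of interest.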

Load-bearing premise

Removing subsidiary information such as channel or domain effects from the input will improve accuracy on the primary identification task.

What would settle it

An experiment on speaker identification in which subsidiary model accuracy on channel identification falls to chance after cosine adversarial training, yet primary accuracy still fails to rise.
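
Whether subsidiary accuracy counts as high or as chance-level needs a reference point. A rough sketch, assuming balanced subsidiary classes so that chance is 1/C, and using a normal approximation to the binomial; the function name and the example counts are hypothetical, not figures from the paper.

```python
import math

def chance_z_score(correct, total, num_classes):
    """z-score of observed accuracy against uniform-guess accuracy 1/num_classes."""
    p0 = 1.0 / num_classes
    accuracy = correct / total
    standard_error = math.sqrt(p0 * (1.0 - p0) / total)
    return (accuracy - p0) / standard_error

# Hypothetical counts for a 4-class channel identifier on 1000 test utterances.
z_at_chance = chance_z_score(250, 1000, 4)     # accuracy equals chance
z_still_high = chance_z_score(900, 1000, 4)    # accuracy far above chance
```

With imbalanced channel labels the majority-class rate would be a fairer baseline than 1/C.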

Original abstract

An adversarial process between two deep neural networks is a promising approach to train a robust model. In this paper, we propose an adversarial process using cosine similarity, whereas conventional adversarial processes are based on inverted categorical cross entropy (CCE). When used for training an identification model, the adversarial process induces the competition of two discriminative models; one for a primary task such as speaker identification or image recognition, the other one for a subsidiary task such as channel identification or domain identification. In particular, the adversarial process degrades the performance of the subsidiary model by eliminating the subsidiary information in the input which, in assumption, may degrade the performance of the primary model. The conventional adversarial processes maximize the CCE of the subsidiary model to degrade the performance. We have studied a framework for training robust discriminative models by eliminating channel or domain information (subsidiary information) by applying such an adversarial process. However, we found through experiments that using the process of maximizing the CCE does not guarantee the performance degradation of the subsidiary model. In the proposed adversarial process using cosine similarity, on the contrary, the performance of the subsidiary model can be degraded more efficiently by searching feature space orthogonal to the subsidiary model. The experiments on speaker identification and image recognition show that we found features that make the outputs of the subsidiary models independent of the input, and the performances of the primary models are improved.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper swaps CCE for cosine similarity in adversarial training to enforce orthogonality and remove subsidiary domain info, but the key assumption that this removal helps the primary task stays unexamined.

The main takeaway is that this work replaces the usual inverted categorical cross-entropy in an adversarial loop with cosine similarity. The goal is to push primary-task features into a space orthogonal to a subsidiary classifier (channel or domain ID), so the subsidiary outputs become independent of the input and the primary model improves on speaker ID and image recognition tasks. That specific loss choice is the concrete change they introduce over prior adversarial setups.

It does address a recurring practical headache where recording conditions or image domains leak into the features and hurt identification accuracy. The authors note from their runs that standard CCE maximization often fails to degrade the subsidiary model reliably, while the cosine version appears to do so more consistently by directly targeting orthogonality. That observation is useful if it holds.

The soft spot is the load-bearing assumption that subsidiary information is always harmful or irrelevant to the primary task. The abstract flags this only as an assumption and does not test the cases where those cues might actually carry useful signal for the main labels. If the subsidiary factors correlate with the target, forcing orthogonality could discard signal rather than noise. The provided summary also gives no numbers, error bars, or exact protocol for measuring independence, so the strength of the reported gains is hard to judge from the abstract alone. The underlying math is simple and does not appear circular.

This is aimed at people building robust identification systems in speech or vision who already use adversarial domain adaptation. A reader who needs a new loss variant for stripping channel effects might extract a practical idea, provided the full experiments include proper controls and ablation. It is coherent enough on its own terms to warrant referee time, even if the assumption needs more scrutiny in revision.

Referee Report

2 major / 0 minor

Summary. The paper proposes an adversarial training framework between a primary discriminative model (speaker ID or image recognition) and a subsidiary model (channel or domain ID). It claims that maximizing categorical cross-entropy fails to reliably degrade subsidiary performance, whereas an adversarial process based on cosine similarity enforces orthogonality in feature space, making subsidiary outputs independent of the input and thereby improving primary-model accuracy by removing subsidiary information.

Significance. If the orthogonality mechanism can be shown to remove only harmful subsidiary cues without discarding useful signal for the primary task, the approach could provide a more stable alternative to CCE-based adversarial training for domain-invariant or channel-robust models. The manuscript identifies a plausible failure mode of standard methods but supplies no quantitative evidence, error bars, or ablation studies, so the practical significance cannot yet be assessed.

major comments (2)

[Abstract] The claim that CCE maximization 'does not guarantee the performance degradation of the subsidiary model' is asserted without any tables, figures, quantitative metrics, or experimental protocol showing this failure; the soundness assessment notes the complete absence of such supporting data.
[Abstract] The central premise that subsidiary information 'may degrade the performance of the primary model' is introduced only 'in assumption' with no derivation, correlation analysis, or controlled experiment testing when removal helps versus harms the primary task (e.g., when subsidiary cues are correlated with primary labels).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, agreeing where the abstract requires strengthening and proposing targeted revisions.

Referee: [Abstract] The claim that CCE maximization 'does not guarantee the performance degradation of the subsidiary model' is asserted without any tables, figures, quantitative metrics, or experimental protocol showing this failure; the soundness assessment notes the complete absence of such supporting data.

Authors: The abstract states that the observation was made 'through experiments,' and the manuscript body reports the corresponding results. We agree, however, that the abstract itself provides no direct quantitative support or pointer to the evidence. We will revise the abstract to include a concise reference to the key experimental observation (e.g., subsidiary accuracy remaining well above chance under CCE maximization) and a citation to the relevant figure or table, thereby making the claim traceable within the abstract. revision: yes

Referee: [Abstract] The central premise that subsidiary information 'may degrade the performance of the primary model' is introduced only 'in assumption' with no derivation, correlation analysis, or controlled experiment testing when removal helps versus harms the primary task (e.g., when subsidiary cues are correlated with primary labels).

Authors: The wording 'in assumption' was chosen precisely to flag this as a motivating hypothesis rather than an established result. The paper's empirical contribution is the demonstration that the cosine-similarity process improves primary-task accuracy on the evaluated speaker-ID and image-recognition tasks. We nevertheless accept that a dedicated analysis of when subsidiary cues are harmful versus neutral or beneficial is absent. We will add a short discussion paragraph (with any available label-subsidiary correlation statistics from the datasets) in the introduction or experimental section of the revision. revision: partial
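
A label-subsidiary correlation statistic of the kind discussed in this exchange could be as simple as the empirical mutual information between primary labels and subsidiary labels. The sketch below is an illustration only: the function name and the toy label lists are hypothetical, and mutual information is one possible choice among several association measures.

```python
import numpy as np

def mutual_information(primary, subsidiary):
    """Empirical mutual information (in nats) between two label sequences."""
    _, p_idx = np.unique(np.asarray(primary), return_inverse=True)
    _, s_idx = np.unique(np.asarray(subsidiary), return_inverse=True)

    # Joint distribution from the contingency table of label pairs.
    joint = np.zeros((p_idx.max() + 1, s_idx.max() + 1))
    np.add.at(joint, (p_idx, s_idx), 1.0)
    joint /= joint.sum()

    # Product of marginals, same shape as the joint.
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / expected[mask])).sum())

# Toy cases: speaker labels vs. channel labels (hypothetical data).
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
mi_confounded = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
```

A value near zero suggests the subsidiary factor carries little information about the primary label, so removing it is unlikely to discard useful signal; a value near the primary label entropy indicates the confounded case the referee raises.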

Circularity Check

0 steps flagged

No circularity; empirical claims rest on independent experimental validation

full rationale

The paper defines a cosine-similarity adversarial process to enforce orthogonality between primary and subsidiary feature spaces, then reports experimental outcomes on speaker ID and image tasks showing degraded subsidiary performance and improved primary accuracy. No equations, fitted parameters, or derivations are shown that reduce the claimed improvement to a quantity defined by the method itself. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The central premise is presented as an assumption tested via observation rather than derived by construction from its inputs, satisfying the criteria for a self-contained, non-circular derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, new entities, or non-standard axioms are stated. The central claim rests on the domain assumption that subsidiary information can be isolated and removed to benefit the primary task.

axioms (2)

standard math: Standard assumptions of gradient-based optimization in deep neural networks.
Implicit in any description of adversarial training between two networks.
domain assumption: Subsidiary information (channel/domain) can be separated from primary task information in the learned features.
Stated explicitly in the abstract as the operating assumption of the adversarial process.

pith-pipeline@v0.9.0 · 5792 in / 1384 out tokens · 53532 ms · 2026-05-25T11:48:17.403980+00:00 · methodology
