Pith Number

pith:T353RQIG

pith:2026:T353RQIGVTH4WTECFG3AYW33N6
not attested · not anchored · not stored · refs resolved

Robust Audio Tagging under Class-wise Supervision Unreliability

Jian Guan, Qiaoqiao Ren, Stephen Roberts, Tong Ye, Wenwu Wang, Yuanbo Hou, Zhaoyi Liu

Learning one unreliability scalar per sound class down-weights noisy labels and improves audio tagging on weak data.

arxiv:2605.17512 v1 · 2026-05-17 · eess.AS · cs.SD

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{T353RQIGVTH4WTECFG3AYW33N6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim (open)
4. Citations (open)
5. Replications (open)
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
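
The page does not spell out the merge algorithm, so the following is only a sketch of how an order-independent merge can be built in general. The event fields (`at`, `field`, `value`) and the last-writer-wins rule are hypothetical, chosen for illustration, and signature checking is left out entirely.

```python
import hashlib
import json


def event_id(event):
    """Content hash of an event, used for deduplication and tie-breaking."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge_state(events):
    """Fold events into a state dict, independent of arrival order.

    Assumption: each event is a dict with hypothetical keys
    'at' (ISO 8601 UTC timestamp), 'field', and 'value'.
    """
    # Drop exact duplicates so that replayed events cannot change the result.
    unique = {event_id(e): e for e in events}
    # Impose a total order: timestamp first, content hash as tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]  # last writer wins
    return state
```

Because the events are deduplicated and then sorted on a key derived only from their content, two mirrors that hold the same set of events would compute the same state regardless of the order in which they received them.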

Claims

C1 strongest claim

Explicit class-wise modeling of supervision unreliability is an effective and practical strategy for robust audio tagging under large-scale weakly labeled training.

C2 weakest assumption

A single scalar unreliability parameter per class is sufficient to capture and correct the combined effects of spurious additions, misassignments between similar classes, and weakened label evidence, without introducing new biases or requiring architecture changes.

C3 one-line summary

CSU learns per-class unreliability parameters to reduce class-dependent supervision bias from spurious, misassigned, or weak labels in audio tagging, with gains shown on AudioSet and a new ESC-FreeGen50 benchmark mixing real and generated audio.
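
This record does not give the CSU loss, so the sketch below is not the paper's formulation. It only illustrates the general idea of one learnable scalar per class that down-weights that class's loss, using a familiar uncertainty-weighting form from multi-task learning: each class's binary cross-entropy is scaled by `exp(-s_c)` and a penalty `+ s_c` keeps the scalar from growing without bound. The function names and the parameter `log_unreliability` are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def class_weight(s):
    """Loss weight implied by a per-class scalar s (larger s, smaller weight)."""
    return math.exp(-s)


def classwise_weighted_bce(logits, targets, log_unreliability):
    """Mean over classes of exp(-s_c) * BCE_c + s_c for a single clip.

    Assumption: s_c is a free per-class scalar that would be learned jointly
    with the tagger; this is a generic weighting, not the CSU objective.
    """
    eps = 1e-7
    total = 0.0
    for z, y, s in zip(logits, targets, log_unreliability):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp for numerical safety
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += class_weight(s) * bce + s
    return total / len(logits)
```

With every `s_c` set to zero the function reduces to the plain mean binary cross-entropy. For a fixed per-class loss value `L`, the term `exp(-s) * L + s` is minimized at `s = log L`, so under this particular weighting, classes whose labels produce persistently high loss end up with smaller weights.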

References

46 extracted · 46 resolved · 0 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:04:43.134776Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

9efbb8c106accfcb4c8229b60c5b7b6f960a7ee9138da1d4de0e0f8a4c62d996

Aliases

arxiv: 2605.17512 · arxiv_version: 2605.17512v1 · doi: 10.48550/arxiv.2605.17512 · pith_short_12: T353RQIGVTH4 · pith_short_16: T353RQIGVTH4WTEC · pith_short_8: T353RQIG
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/T353RQIGVTH4WTECFG3AYW33N6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9efbb8c106accfcb4c8229b60c5b7b6f960a7ee9138da1d4de0e0f8a4c62d996
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bb12281067f9a383846537b76137ed16d3c014e3be31d18a22c035d6f5865b4c",
    "cross_cats_sorted": [
      "cs.SD"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2026-05-17T15:51:30Z",
    "title_canon_sha256": "861d534d1de2d7deb0fc46e7da41cff377c1e8e9e88ea960a7af0dceb997cca4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17512",
    "kind": "arxiv",
    "version": 1
  }
}
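
The same canonicalization used in the shell one-liner (sorted keys, compact separators, UTF-8 encoding, SHA-256) can be sketched offline in Python against the record shown above. The helper name `canonical_sha256` is an assumption for illustration. If the served `canonical_record` is identical to the JSON shown here, the printed digest should equal the canonical hash listed on this page; that match has not been verified here.

```python
import hashlib
import json


def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and no extra whitespace."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Canonical record copied from the page.
record = {
    "metadata": {
        "abstract_canon_sha256": "bb12281067f9a383846537b76137ed16d3c014e3be31d18a22c035d6f5865b4c",
        "cross_cats_sorted": ["cs.SD"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "eess.AS",
        "submitted_at": "2026-05-17T15:51:30Z",
        "title_canon_sha256": "861d534d1de2d7deb0fc46e7da41cff377c1e8e9e88ea960a7af0dceb997cca4",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17512", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys before hashing makes the digest independent of the key order in which a server or mirror happens to emit the JSON.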