Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 Challenge

Annamaria Mesaros; Florian Schmid; Gerhard Widmer; Irene Martín-Morató; Paul Primus; Toni Heittola

arxiv: 2505.01747 · v2 · submitted 2025-05-03 · 📡 eess.AS · cs.SD


Florian Schmid, Paul Primus, Toni Heittola, Annamaria Mesaros, Irene Martín-Morató, Gerhard Widmer

Pith reviewed 2026-05-22 16:38 UTC · model grok-4.3


keywords: acoustic scene classification, device information, low-complexity models, device mismatch, DCASE challenge, transfer learning, baseline system, inference-time adaptation


The pith

Providing device information at inference enables device-specific fine-tuning that lifts baseline accuracy from 50.72% to 51.89% in low-complexity acoustic scene classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets up a DCASE 2025 task for classifying acoustic scenes under low-complexity constraints while addressing device mismatch. Device identity is now supplied at inference time so that models can adapt to the specific recording hardware rather than remaining device-agnostic. The provided baseline reaches 50.72 percent accuracy without device information and improves to 51.89 percent once device-specific fine-tuning is applied. Training data is restricted to the same 25 percent subset used the previous year, making transfer learning from external sources a central strategy. Eleven of the twelve participating teams submitted systems that exceeded the baseline, which suggests, though does not by itself show, that the device information can be leveraged effectively.

Core claim

The paper establishes a baseline system for acoustic scene classification that achieves 50.72 percent accuracy when operating without knowledge of the recording device. When device identity is supplied at inference time and the model is fine-tuned in a device-specific manner, accuracy rises to 51.89 percent. The task re-uses the limited 25 percent training subset from 2024 and places no restrictions on external data. On the held-out evaluation set, eleven of the twelve participating teams surpass the baseline, and the strongest entry exceeds it by more than eight percentage points.
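For scale, the two reported baseline accuracies translate into the following absolute and relative changes. This is plain arithmetic on the published numbers and says nothing about their run-to-run variance.

```latex
\begin{aligned}
\Delta_{\mathrm{abs}} &= 51.89\% - 50.72\% = 1.17\ \text{pp} \\
\frac{\Delta_{\mathrm{abs}}}{50.72\%} &\approx 2.3\% \quad \text{(relative accuracy gain)} \\
\frac{\Delta_{\mathrm{abs}}}{100\% - 50.72\%} = \frac{1.17}{49.28} &\approx 2.4\% \quad \text{(relative error-rate reduction)}
\end{aligned}
```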

What carries the argument

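Device-specific fine-tuning: a general model is first trained on the full training set and then adapted to each device's characteristics, with the device identity supplied at inference time determining which adapted parameters are used.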
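The two-stage scheme (general model first, then per-device adaptation selected by device ID) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the baseline's code: a tiny multinomial logistic-regression classifier stands in for the CNN, inputs are toy feature vectors rather than mel spectrograms, and the names `LinearClassifier`, `train_device_aware`, and `predict`, the epoch counts, the learning rates, and the fallback to the general model for unseen device IDs are all hypothetical.

```python
import copy
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LinearClassifier:
    """Multinomial logistic regression: a stand-in for the baseline CNN."""

    def __init__(self, n_features, n_classes):
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def logits(self, x):
        return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
                for w, b in zip(self.W, self.b)]

    def predict(self, x):
        z = self.logits(x)
        return max(range(len(z)), key=z.__getitem__)

    def sgd(self, data, epochs, lr, seed=0):
        """Plain SGD on cross-entropy; updates parameters in place, returns self."""
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                p = softmax(self.logits(x))
                for k in range(len(self.W)):
                    g = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
                    self.b[k] -= lr * g
                    for j, x_j in enumerate(x):
                        self.W[k][j] -= lr * g * x_j
        return self


def train_device_aware(data, n_features, n_classes,
                       general_epochs=20, finetune_epochs=5,
                       lr=0.1, finetune_lr=0.05):
    """Stage 1: train one general model on all (x, y, device) triples.
    Stage 2: fine-tune an independent copy of it on each device's subset."""
    general = LinearClassifier(n_features, n_classes).sgd(
        [(x, y) for x, y, _ in data], general_epochs, lr)
    per_device = {}
    for dev in sorted({d for _, _, d in data}):
        subset = [(x, y) for x, y, d in data if d == dev]
        per_device[dev] = copy.deepcopy(general).sgd(subset, finetune_epochs, finetune_lr)
    return general, per_device


def predict(general, per_device, x, device_id):
    """Use the device ID given at inference to pick parameters.
    Hypothetical choice: unseen devices fall back to the general model."""
    model = per_device.get(device_id, general)
    return model.predict(x)
```

With toy data in which one device adds a constant offset to the first feature (a crude stand-in for a channel mismatch), `train_device_aware(data, 2, 2)` returns the general model plus one fine-tuned copy per device. Deep-copying before fine-tuning keeps the general model intact, so it stays available for device IDs that have no parameters of their own.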

If this is right

Real-world systems can be deployed with prior knowledge of the microphone hardware and still maintain low computational cost.
Transfer learning from external data becomes the primary route to performance when labeled training material is limited to 25 percent of the previous year's set.
Low-complexity architectures must incorporate lightweight adaptation mechanisms rather than relying solely on device-invariant features.
Future challenges can test whether similar metadata at inference improves other audio classification tasks under hardware variation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same device-aware approach could be tested in sound event detection or speaker verification to measure cross-task gains from metadata.
If device identity proves useful here, comparable sensor-type labels might benefit image or video classification under varying capture hardware.
Larger gains may appear once adaptation methods move beyond simple fine-tuning to more parameter-efficient techniques.

Load-bearing premise

Supplying device identity at inference time permits adaptation that meaningfully reflects real-world hardware mismatch.

What would settle it

An experiment that supplies device labels at test time yet records no accuracy gain over the device-agnostic baseline on the official evaluation set.

Figures

Figures reproduced from arXiv: 2505.01747 by Annamaria Mesaros, Florian Schmid, Gerhard Widmer, Irene Martín-Morató, Paul Primus, Toni Heittola.

**Figure 1.** Overview of Low-Complexity Acoustic Scene Classification with Device Information. At inference time, models must operate under low-complexity constraints and handle both known (seen during training) and unknown (unseen during training) recording devices, with the device ID provided. The baseline follows a two-stage training process: first, learning a general model, then adapting it to device-specific ch…

Original abstract

This paper presents the Low-Complexity Acoustic Scene Classification with Device Information Task of the DCASE 2025 Challenge, along with its baseline system. Continuing the focus on low-complexity models, data efficiency, and device mismatch from previous editions (2022-2024), this year's task introduces a key change: recording device information is now provided at inference time. This enables the development of device-specific models that leverage device characteristics, reflecting real-world deployment scenarios in which a model is designed with awareness of the underlying hardware. The training set matches the 25% subset used in the corresponding DCASE 2024 challenge, with no restrictions on external data use, highlighting transfer learning as a central topic. The baseline achieves 50.72% accuracy with a device-agnostic model, improving to 51.89% when incorporating device-specific fine-tuning. The task attracted 31 submissions from 12 teams, with 11 teams outperforming the baseline. The top-performing submission achieved an accuracy gain of more than 8 percentage points over the baseline on the evaluation set.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

DCASE 2025 task adds device info at inference but the 1.17 pp baseline lift lacks any variance or significance check.


This paper sets up the DCASE 2025 low-complexity acoustic scene classification task and supplies device identity at test time so teams can train device-specific models. The baseline moves from 50.72% accuracy without that information to 51.89% with device-specific fine-tuning on the same 25% training split used last year. External data is allowed, which keeps the focus on transfer learning and on the low-complexity constraints of the 2022-2024 editions. The task drew 31 submissions from 12 teams, and 11 of them beat the baseline, with the best entry more than 8 points higher on the evaluation set. That participation level and the straightforward agnostic-versus-specific comparison are the useful parts; they give the community a clear reference point for device mismatch in edge audio settings. The gain itself is small, and the paper reports no standard deviations, no multi-seed results, and no statistical test, so it is unclear whether device information produces a reliable signal or whether the difference sits within normal run-to-run noise. Details of the model architecture and the full evaluation protocol are also thin, which is typical for challenge task papers but still limits how far one can judge the strength of the baseline. Readers already working on DCASE or on practical device-aware audio models will find the task definition and numbers directly usable for their own experiments. The work is organizational rather than a new derivation, yet it still deserves peer review because these benchmarks shape what the field measures next year. I would send it to referees with a request for variance estimates on the baseline.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the Low-Complexity Acoustic Scene Classification with Device Information task for the DCASE 2025 Challenge. It describes the task setup continuing prior editions' focus on low-complexity models and device mismatch, with the key change that device identity is supplied at inference time to enable device-specific adaptation. The training data is the 25% subset from DCASE 2024 with no external data restrictions. The baseline system is reported to achieve 50.72% accuracy in the device-agnostic case and 51.89% after device-specific fine-tuning. Participation details note 31 submissions from 12 teams, with the top entry exceeding the baseline by more than 8 percentage points on the evaluation set.

Significance. If the reported 1.17 pp gain from device-specific fine-tuning holds under statistical scrutiny, the work would provide a useful reference point for how explicit device information can support adaptation to hardware mismatch in real-world ASC deployments. The emphasis on low-complexity models and transfer learning continues a practically relevant thread in the DCASE series, and the observed participation indicates community interest. The top submission's larger gain also highlights the headroom for further progress.

major comments (1)

[Abstract] Abstract (baseline accuracies): The central empirical claim is that device information at inference enables effective adaptation, evidenced by the rise from 50.72% (device-agnostic) to 51.89% (device-specific fine-tuning). This 1.17 pp difference is presented without standard deviations, results from multiple random seeds, or any statistical significance test, so it is impossible to determine whether the gain exceeds typical run-to-run variability in ASC models and therefore whether the task modification produces a reliable signal.
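One way to address this concern, assuming per-seed accuracies are available for both settings (the rebuttal proposes five seeds), is to report mean and sample standard deviation per setting and compare the means with Welch's t statistic. This is an illustrative sketch only: the function names are hypothetical, and Welch's t-test is a swapped-in choice, not a procedure the paper or the referee specifies.

```python
import math
import statistics


def summarize(accs):
    """Mean and sample standard deviation of per-seed accuracies."""
    return statistics.mean(accs), statistics.stdev(accs)


def welch_t(agnostic, specific):
    """Welch's t statistic for mean(specific) - mean(agnostic), plus the
    Welch-Satterthwaite degrees of freedom. Needs at least two seeds per list
    and non-zero spread in at least one of them."""
    m_a, s_a = summarize(agnostic)
    m_s, s_s = summarize(specific)
    v_a = s_a ** 2 / len(agnostic)  # squared standard error of each mean
    v_s = s_s ** 2 / len(specific)
    t = (m_s - m_a) / math.sqrt(v_a + v_s)
    df = (v_a + v_s) ** 2 / (
        v_a ** 2 / (len(agnostic) - 1) + v_s ** 2 / (len(specific) - 1))
    return t, df
```

Pass the lists of per-seed accuracies for the device-agnostic and the device-specific settings. With only a handful of seeds the degrees of freedom are small, so `t` should be read against a t distribution with `df` degrees of freedom (for example via a statistics package), not against a normal distribution; the standard library alone does not provide that CDF.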

minor comments (2)

[Abstract] Abstract: The description of the baseline provides only high-level accuracy figures; a brief statement of the model architecture (e.g., CNN variant, parameter count) and evaluation protocol (e.g., cross-validation folds, exact fine-tuning procedure) would improve immediate readability.
[Task description] Task description: The manuscript references the 25% training subset from DCASE 2024 but does not explicitly state whether the evaluation set composition or scene/device distribution matches prior years; a short comparison table would clarify continuity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing the DCASE 2025 Low-Complexity Acoustic Scene Classification with Device Information task. The single major comment concerns the statistical robustness of the reported baseline improvement; we address this point directly below and will update the manuscript accordingly.


Referee: [Abstract] Abstract (baseline accuracies): The central empirical claim is that device information at inference enables effective adaptation, evidenced by the rise from 50.72% (device-agnostic) to 51.89% (device-specific fine-tuning). This 1.17 pp difference is presented without standard deviations, results from multiple random seeds, or any statistical significance test, so it is impossible to determine whether the gain exceeds typical run-to-run variability in ASC models and therefore whether the task modification produces a reliable signal.

Authors: We agree that the 1.17 pp improvement should be accompanied by measures of variability and a statistical test to allow readers to judge whether it exceeds typical run-to-run fluctuation. In the revised manuscript we will report baseline accuracies averaged over five independent random seeds together with standard deviations for both the device-agnostic and device-specific fine-tuning settings. We will also add a paired statistical significance test (McNemar’s test on the per-sample predictions) and state the resulting p-value. These additions will be placed in the abstract and in the experimental section describing the baseline. revision: yes
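The proposed McNemar test needs only the per-clip predictions of the two systems on the same evaluation clips. A minimal sketch of the exact (binomial) variant follows; the function name and the choice of the exact form over the chi-square approximation are assumptions, not the authors' stated procedure.

```python
import math


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions of two classifiers.

    pred_a: e.g. the device-agnostic model; pred_b: the device-specific one.
    Returns (n01, n10, p): n01 = clips only B gets right, n10 = clips only A
    gets right. Clips both get right (or both get wrong) carry no information.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")
    n01 = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a != y and b == y)
    n10 = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a == y and b != y)
    n = n01 + n10
    if n == 0:
        return n01, n10, 1.0
    k = min(n01, n10)
    # Under H0 the discordant clips split 50/50: P(X <= k), X ~ Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return n01, n10, min(1.0, 2.0 * tail)
```

The exact form is safe when the number of discordant clips is small; for large counts the chi-square approximation gives nearly the same p-value. Note that McNemar's test compares one trained model per setting, so it complements rather than replaces the multi-seed standard deviations.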

Circularity Check

0 steps flagged

Empirical baseline report with no derivations or self-referential fitting


The paper is an empirical description of a DCASE 2025 challenge task and its baseline system. It reports measured accuracies (50.72% device-agnostic and 51.89% with device-specific fine-tuning) on held-out evaluation data with no equations, first-principles derivations, fitted parameters renamed as predictions, or mathematical claims that could reduce to their own inputs by construction. References to prior DCASE editions are contextual background rather than load-bearing self-citations justifying a uniqueness theorem or ansatz. The central results are externally falsifiable experimental measurements, not derived outputs, making the paper self-contained with no circularity in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No theoretical derivations or new postulated entities; the paper is a challenge task definition and empirical baseline report.



Reference graph

Works this paper leans on

32 extracted references

[1]

Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 Challenge

INTRODUCTION Acoustic Scene Classification (ASC) aims to identify the type of environment in which an audio recording was made, based on a short excerpt [1]. Environments are defined as a set of real-world locations, such as Metro station, Urban park, or Public square. The ASC task has a long-standing presence in the DCASE Challenge, evolving through ...

[2]

The most commonly used methods in 2023 and 2024 were augmentation-based methods, such as Freq-MixStyle [7,8] and device impulse response augmentation [9]

PREVIOUS EDITIONS In past editions of the task, various strategies have been proposed to improve generalization across different—and potentially unknown—recording devices. The most commonly used methods in 2023 and 2024 were augmentation-based methods, such as Freq-MixStyle [7,8] and device impulse response augmentation [9]. Other approaches aimed to ...

[3]

However, this year’s setup introduces key variations to the handling of device mismatch and transfer learning

TASK SETUP As discussed in the previous section, device mismatch, low-complexity constraints, and transfer learning have been extensively studied in the context of the ASC task. However, this year’s setup introduces key variations to the handling of device mismatch and transfer learning. Regarding device mismatch, the recording device ID is now provide...

[4]

It employs a receptive-field-regularized, factorized CNN architecture

BASELINE SYSTEM Following the 2024 edition [5], the baseline system builds on a simplified variant of the top-performing submission from the 2023 edition [25]. It employs a receptive-field-regularized, factorized CNN architecture. Audio recordings are first resampled to 32 kHz, then converted into mel spectrograms using a 4096-point FFT with a window ...

[5]

CHALLENGE RESULTS The challenge results will be added after the challenge has ended

[6]

Building on previous editions, we continue to address challenges such as low-complexity constraints, device mismatch, and data scarcity

CONCLUSION This paper presented the setup and baseline system for Task 1 of the DCASE 2025 Challenge. Building on previous editions, we continue to address challenges such as low-complexity constraints, device mismatch, and data scarcity. A key refinement is the provision of device information at inference time, enabling device-specific modeling. The ...

[7]

ACKNOWLEDGMENT The LIT AI Lab is supported by the Federal State of Upper Austria. Gerhard Widmer’s work is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreement No 101019375 (Whither Music?)

[8]

Approaches to complex sound scene analysis,

E. Benetos, D. Stowell, and M. D. Plumbley, “Approaches to complex sound scene analysis,” Cham: Springer International Publishing, 2018.

Source code (footnote 2 of the paper): https://github.com/CPJKU/dcase2025_task1_baseline/tree/main

Detection and Classification of Acoustic Scenes and Events 2025

[9]

Acoustic scene classification in DCASE 2020 challenge: Generalization across devices and low complexity solutions,

T. Heittola, A. Mesaros, and T. Virtanen, “Acoustic scene classification in DCASE 2020 challenge: Generalization across devices and low complexity solutions,” in DCASE Workshop, 2020

[10]

Low-complexity acoustic scene classification for multi-device audio: Analysis of DCASE 2021 challenge systems,

I. Martín-Morató, T. Heittola, A. Mesaros, and T. Virtanen, “Low-complexity acoustic scene classification for multi-device audio: Analysis of DCASE 2021 challenge systems,” in DCASE Workshop, 2021

[11]

Low-complexity acoustic scene classification in DCASE 2022 challenge,

I. Martín-Morató, F. Paissan, A. Ancilotto, T. Heittola, A. Mesaros, E. Farella, A. Brutti, and T. Virtanen, “Low-complexity acoustic scene classification in DCASE 2022 challenge,” in DCASE Workshop, 2022

[12]

Data-efficient low-complexity acoustic scene classification in the DCASE 2024 challenge,

F. Schmid, P. Primus, T. Heittola, A. Mesaros, I. Martín-Morató, K. Koutini, and G. Widmer, “Data-efficient low-complexity acoustic scene classification in the DCASE 2024 challenge,” in DCASE Workshop, 2024

[13]

A multi-device dataset for urban acoustic scene classification,

A. Mesaros, T. Heittola, and T. Virtanen, “A multi-device dataset for urban acoustic scene classification,” in DCASE Workshop, 2018

[14]

Domain generalization with relaxed instance frequency-wise normalization for multi-device acoustic scene classification,

B. Kim, S. Yang, J. Kim, H. Park, J. Lee, and S. Chang, “Domain generalization with relaxed instance frequency-wise normalization for multi-device acoustic scene classification,” in Interspeech, 2022

[15]

CP-JKU submission to DCASE22: Distilling knowledge for low-complexity convolutional neural networks from a patchout audio transformer,

F. Schmid, S. Masoudian, K. Koutini, and G. Widmer, “CP-JKU submission to DCASE22: Distilling knowledge for low-complexity convolutional neural networks from a patchout audio transformer,” DCASE Challenge, Tech. Rep., 2022

[16]

Device-robust acoustic scene classification via impulse response augmentation,

T. Morocutti, F. Schmid, K. Koutini, and G. Widmer, “Device-robust acoustic scene classification via impulse response augmentation,” in EUSIPCO, 2023

[17]

Ascdomain: Domain invariant device-adversarial isotropic knowledge distillation convolutional neural architecture,

H. Truchan, T. H. Ngo, and Z. Ahmadi, “Ascdomain: Domain invariant device-adversarial isotropic knowledge distillation convolutional neural architecture,” in ICASSP, 2025

[18]

CP-JKU submissions to DCASE’20: Low-complexity cross-device acoustic scene classification with RF-regularized CNNs,

K. Koutini, F. Henkel, H. Eghbal-zadeh, and G. Widmer, “CP-JKU submissions to DCASE’20: Low-complexity cross-device acoustic scene classification with RF-regularized CNNs,” DCASE Challenge, Tech. Rep., 2020

[19]

QTI submission to DCASE 2021: Residual normalization for device-imbalanced acoustic scene classification with efficient design,

B. Kim, S. Yang, J. Kim, and S. Chang, “QTI submission to DCASE 2021: Residual normalization for device-imbalanced acoustic scene classification with efficient design,” DCASE Challenge, Tech. Rep., 2021

[20]

Hyu submission for the DCASE 2022: Efficient fine-tuning method using device-aware data-random-drop for device-imbalanced acoustic scene classification,

J.-H. Lee, J.-H. Choi, P. M. Byun, and J.-H. Chang, “Hyu submission for the DCASE 2022: Efficient fine-tuning method using device-aware data-random-drop for device-imbalanced acoustic scene classification,” DCASE Challenge, Tech. Rep., 2022

[21]

CPJKU submission to DCASE21: Cross-device audio scene classification with wide sparse frequency-damped CNNs,

K. Koutini, J. Schlüter, and G. Widmer, “CPJKU submission to DCASE21: Cross-device audio scene classification with wide sparse frequency-damped CNNs,” DCASE Challenge, Tech. Rep., 2021

[22]

Data-efficient acoustic scene classification via ensemble teachers distillation and pruning,

H. Bing, H. Wen, C. Zhengyang, J. Anbai, C. Xie, F. Pingyi, L. Cheng, L. Zhiqiang, L. Jia, Z. Wei-Qiang, and Q. Yanmin, “Data-efficient acoustic scene classification via ensemble teachers distillation and pruning,” DCASE Challenge, Tech. Rep., 2024

[23]

A lottery ticket hypothesis framework for low-complexity device-robust neural acoustic scene classification,

C.-H. H. Yang, H. Hu, S. M. Siniscalchi, Q. Wang, W. Yuyang, X. Xia, Y. Zhao, Y. Wu, Y. Wang, J. Du, and C.-H. Lee, “A lottery ticket hypothesis framework for low-complexity device-robust neural acoustic scene classification,” DCASE Challenge, Tech. Rep., 2021

[24]

Low-complexity acoustic scene classification using blueprint separable convolution and knowledge distillation,

J. Tan and Y. Li, “Low-complexity acoustic scene classification using blueprint separable convolution and knowledge distillation,” DCASE Challenge, Tech. Rep., 2023

[25]

DCASE2023 task1 submission: Device simulation and time-frequency separable convolution for acoustic scene classification,

Y. Cai, M. Lin, C. Zhu, S. Li, and X. Shao, “DCASE2023 task1 submission: Device simulation and time-frequency separable convolution for acoustic scene classification,” DCASE Challenge, Tech. Rep., 2023

[26]

CP-JKU submission to DCASE23: Efficient acoustic scene classification with cp-mobile,

F. Schmid, T. Morocutti, S. Masoudian, K. Koutini, and G. Widmer, “CP-JKU submission to DCASE23: Efficient acoustic scene classification with cp-mobile,” DCASE Challenge, Tech. Rep., 2023

[27]

Low-complexity acoustic scene classification with limited training data,

Y.-F. Shao, P. Jiang, and W. Li, “Low-complexity acoustic scene classification with limited training data,” DCASE Challenge, Tech. Rep., 2024

[28]

Audio set: An ontology and human-labeled dataset for audio events,

J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, “Audio set: An ontology and human-labeled dataset for audio events,” in ICASSP, 2017

[29]

DCASE2024 task1 submission: Data-efficient acoustic scene classification with self-supervised teachers,

Y. Cai, M. Lin, S. Li, and X. Shao, “DCASE2024 task1 submission: Data-efficient acoustic scene classification with self-supervised teachers,” DCASE Challenge, Tech. Rep., 2024

[30]

Data-efficient acoustic scene classification with pre-trained CP-Mobile,

N. David, R. Aida, and S. Patrick, “Data-efficient acoustic scene classification with pre-trained CP-Mobile,” DCASE Challenge, Tech. Rep., 2024

[31]

Upb-nt submission to DCASE24: Dataset pruning for targeted knowledge distillation,

A. Werning and R. Haeb-Umbach, “Upb-nt submission to DCASE24: Dataset pruning for targeted knowledge distillation,” DCASE Challenge, Tech. Rep., 2024

[32]

Distilling the knowledge of transformers and CNNs with CP-mobile,

F. Schmid, T. Morocutti, S. Masoudian, K. Koutini, and G. Widmer, “Distilling the knowledge of transformers and CNNs with CP-mobile,” in DCASE Workshop, 2023

