Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

Hossein Zeinali; Jan "Honza" Černocký; Lukáš Burget

arxiv: 1907.07127 · v1 · pith:CHCXDEVInew · submitted 2019-07-13 · 📡 eess.AS · cs.SD

Pith reviewed 2026-05-24 21:45 UTC · model grok-4.3

keywords: acoustic scene classification, convolutional neural networks, self-attention pooling, DCASE 2019, network fusion, log Mel-spectrogram, VGG, x-vector

The pith

Fusion of three attentive CNN topologies improves acoustic scene classification on the DCASE 2019 test set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a system for acoustic scene classification that combines three convolutional neural network architectures. A VGG-like 2D CNN, a Light-CNN using max-feature-map activation, and a 1D x-vector network each process 256-dimensional log Mel-spectrograms and apply self-attention for statistic pooling. Networks are trained separately on a 4-fold evaluation setup, then combined using multiple fusion strategies. The resulting submissions are entered in Task 1 of the DCASE 2019 challenge. The central idea is that the different topologies capture complementary information when attention pooling is used in each.

Core claim

The authors report that their submitted systems for acoustic scene classification fuse the outputs of a VGG-like 2D CNN, an LCNN with max-feature-map activation, and an x-vector 1D CNN. Each network uses self-attention pooling and is trained on log Mel-spectrogram features with a 4-fold split of the development data.
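To make the max-feature-map (MFM) activation concrete, here is a minimal sketch in plain Python. It assumes the usual Light-CNN formulation, in which the channels are split into two halves and the element-wise maximum is kept, so the channel count is halved. The function name and the toy values are illustrative and do not come from the paper.

```python
def max_feature_map(channels):
    """Max-Feature-Map over a flat list of channel activations.

    Assumption: channel i is paired with channel i + C/2, and the larger
    of each pair is kept, so C inputs yield C/2 outputs.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [max(a, b) for a, b in zip(first, second)]


# Toy example: 8 channel activations at one time-frequency position.
activations = [0.2, -1.0, 3.5, 0.7, 0.1, 2.0, -0.3, 0.9]
print(max_feature_map(activations))  # [0.2, 2.0, 3.5, 0.9]
```

Because the selection is a maximum rather than a threshold, MFM acts as a non-linearity and a channel reduction at once, which is consistent with the report's remark that the resulting network has fewer parameters.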

What carries the argument

Self-attention mechanism for statistic pooling applied inside each of the three CNN topologies (VGG-like, LCNN, x-vector) before fusion.
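As an illustration of what attention-based statistic pooling computes, the sketch below collapses a variable-length sequence of frame-level features into one fixed-length vector of weighted means and weighted standard deviations. It follows the general attentive statistics pooling idea from the speaker-verification literature the paper cites. The scoring function (a dot product with a vector `w`) is a simplifying assumption; the paper's actual attention network is not described on this page.

```python
import math


def attentive_stats_pooling(frames, w):
    """Pool T frame vectors (each of length D) into one vector of length 2*D.

    frames: list of T lists of floats (frame-level features).
    w:      hypothetical attention parameter vector of length D.
    """
    # One scalar score per frame, then a numerically stable softmax over time.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in frames]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    dim = len(frames[0])
    # Attention-weighted mean per feature dimension.
    mean = [sum(a * x[d] for a, x in zip(alphas, frames)) for d in range(dim)]
    # Attention-weighted variance, clipped at zero against rounding error.
    var = [
        sum(a * x[d] ** 2 for a, x in zip(alphas, frames)) - mean[d] ** 2
        for d in range(dim)
    ]
    std = [math.sqrt(max(v, 0.0)) for v in var]
    return mean + std


# With w = 0 every frame gets the same weight, so this reduces to
# ordinary mean and standard-deviation pooling.
pooled = attentive_stats_pooling([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(pooled)
```

When the weights are learned, frames that are more informative for the scene label contribute more to both statistics, which is the property the three topologies share before fusion.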

If this is right

Different CNN topologies remain complementary even after each applies self-attention pooling.
4-fold training allows multiple models to be combined without additional labeled data.
Fusion at the score level can raise accuracy on acoustic scenes not seen during training.
Log Mel-spectrograms of 256 dimensions serve as sufficient input for all three topologies.
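Score-level fusion can be sketched in a few lines. The paper's conclusions mention a trained fusion and a majority-vote fusion; the example below shows majority voting and, as a simple stand-in for the trained fusion, unweighted score averaging. The averaging rule, the function names, and the toy scores are assumptions made for illustration only.

```python
from collections import Counter


def average_fusion(score_lists):
    """Average per-class scores across systems and return the best class index."""
    n_classes = len(score_lists[0])
    n_systems = len(score_lists)
    avg = [sum(s[c] for s in score_lists) / n_systems for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def majority_vote(score_lists):
    """Each system votes for its top class; the most frequent vote wins."""
    votes = [max(range(len(s)), key=lambda c: s[c]) for s in score_lists]
    return Counter(votes).most_common(1)[0][0]


# Toy posteriors from three hypothetical systems over three scene classes.
vgg_scores = [0.40, 0.60, 0.00]
lcnn_scores = [0.45, 0.55, 0.00]
xvec_scores = [0.90, 0.10, 0.00]
systems = [vgg_scores, lcnn_scores, xvec_scores]

print(average_fusion(systems))  # 0: one confident system outweighs two weak votes
print(majority_vote(systems))   # 1: two of the three systems prefer class 1
```

The two rules disagree on this input, which shows why the choice of fusion strategy matters: averaging rewards confidence, while voting rewards agreement.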

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Attention pooling may reduce the need for hand-crafted segment-level features in audio classification.
The same fusion approach could be tested on related tasks such as sound event detection.
Results on the DCASE test set would indicate whether the complementarity holds for real-world recording variations.

Load-bearing premise

The three CNN topologies supply sufficiently complementary information when each uses self-attention pooling so that their fusion improves accuracy on the unseen DCASE test set.

What would settle it

An experiment that trains the three networks on the same 4-fold setup, fuses them, and shows no accuracy gain over the single best network on the official DCASE 2019 test set.

Original abstract

In this report, the Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2019 challenge are described. Also, the analysis of different methods is provided. The proposed approach is a fusion of three different Convolutional Neural Network (CNN) topologies. The first one is a VGG like two-dimensional CNNs. The second one is again a two-dimensional CNN network which uses Max-Feature-Map activation and called Light-CNN (LCNN). The third network is a one-dimensional CNN which mainly used for speaker verification and called x-vector topology. All proposed networks use self-attention mechanism for statistic pooling. As a feature, we use a 256-dimensional log Mel-spectrogram. Our submissions are a fusion of several networks trained on 4-folds generated evaluation setup using different fusion strategies.
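The excerpt of the paper's data-processing section on this page describes the steps that precede the Mel filterbank: convert the audio to mono, subtract the segment mean to remove the amplitude bias, and take Hamming-windowed frames of 2048 samples for the short-time Fourier transform. A minimal plain-Python sketch of those steps follows. The hop size is cut off in the excerpt, so the value of 1024 samples is a placeholder assumption, and the Fourier transform and the 256-band Mel projection are deliberately left out.

```python
import math


def to_mono(stereo):
    """Average left and right samples into a single channel."""
    return [(left + right) / 2.0 for left, right in stereo]


def remove_dc(signal):
    """Subtract the segment mean to remove the amplitude bias."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]


def hamming(n):
    """Standard Hamming window of length n (n >= 2)."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal, frame_len=2048, hop=1024):
    """Cut the signal into overlapping Hamming-windowed frames.

    frame_len=2048 follows the excerpt; hop=1024 is an assumed placeholder.
    """
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Toy usage on a short synthetic stereo signal.
stereo = [(math.sin(0.01 * n) + 0.5, math.sin(0.01 * n) + 0.3) for n in range(4096)]
frames = windowed_frames(remove_dc(to_mono(stereo)))
print(len(frames), len(frames[0]))  # 3 2048
```

Each frame would then be passed through a Fourier transform, a Mel filterbank, and a logarithm to obtain the log Mel-spectrogram features that all three networks consume.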

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a routine DCASE technical report describing a fusion of existing CNNs with attention for acoustic scene classification, with no new methods or results.

The main takeaway is that this is a DCASE challenge technical report describing the Brno team's three submissions for acoustic scene classification. They fuse a VGG-like CNN, a Light-CNN, and an x-vector network, all using self-attention pooling on log-Mel spectrograms, trained via 4-fold splits of the development set. The paper does a good job laying out the exact topologies and training details in plain terms. Adapting the x-vector model, typically used for speakers, to scene classification is a reasonable engineering choice, and the consistent use of self-attention across models makes the fusion setup easy to follow. The description of the different fusion strategies is also direct. Beyond that, there is little new. These are all established networks from prior work in images and speaker verification. No new math or architecture is derived. The report stops at describing what was submitted without showing results or breakdowns, which is common for these documents but means readers get limited insight into what actually drove any performance differences. The 4-fold setup on development data does create some overlap between training and evaluation choices, but the paper makes no claims beyond 'this is what we did,' so it does not overreach. This kind of report is mainly useful to other teams in the same challenge who are looking for implementation ideas. It is not the sort of work that calls for peer review in a journal. I would not bring it to a reading group and would not cite it.

Referee Report

0 major / 2 minor

Summary. The manuscript describes the Brno University of Technology (BUT) team's submissions to DCASE 2019 Task 1 (Acoustic Scene Classification). The central approach is a fusion of three CNN topologies—a VGG-like 2D CNN, a Light-CNN (LCNN) using Max-Feature-Map activation, and an x-vector 1D CNN—each employing self-attention for statistic pooling on 256-dimensional log-Mel spectrograms. All networks are trained with a 4-fold cross-validation split of the development data, and the submissions apply different fusion strategies; the text also states that an analysis of the methods is provided.

Significance. As a challenge technical report the work supplies a concrete engineering description of system configurations that participated in DCASE 2019. The uniform application of self-attention pooling across architecturally diverse backbones is a coherent design choice that may promote complementarity. Its primary value lies in documenting reproducible training and fusion protocols for the shared task rather than in establishing new methodological claims with quantified gains.

minor comments (2)

[Abstract] The statement that 'the analysis of different methods is provided' is not accompanied by any reference to specific sections, tables, or quantitative comparisons (e.g., per-network accuracies or fusion gains), making it impossible to locate the promised analysis.
The manuscript does not enumerate the exact fusion strategies (score averaging, learned weights, etc.) or the number of component networks per submission, which are central to reproducing the described systems.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the review of our DCASE 2019 Task 1 technical report and for the recommendation of minor revision. The referee report gives a concise summary of our fusion approach and raises no major comments that require a point-by-point rebuttal.

Circularity Check

0 steps flagged

No significant circularity identified

The manuscript is a descriptive DCASE 2019 technical report detailing submitted systems (fusions of VGG-like, LCNN and x-vector networks with self-attention pooling on log-Mel features, trained via 4-fold splits of development data). No derivation chain, first-principles result, or mathematical prediction is claimed; the text is limited to configuration description and fusion strategies. Standard challenge practice of developing on dev folds does not constitute self-definitional or fitted-input circularity under the enumerated patterns, as no internal reduction of a claimed output to its own inputs occurs. The external challenge test set provides an independent benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, or invented entities are described in the abstract; the work is an empirical system description relying on standard neural-network training assumptions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

The proposed approach is a fusion of three different Convolutional Neural Network (CNN) topologies... All proposed networks use self-attention mechanism for statistic pooling. As a feature, we use a 256-dimensional log Mel-spectrogram.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean reality_from_one_distinction unclear

Our submissions are a fusion of several networks trained on 4-folds generated evaluation setup using different fusion strategies.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 4 internal anchors

[1]

We proposed three different deep neural network topologies for this task

INTRODUCTION This report describes Brno University of Technology (BUT) team submissions for the ASC challenge of DCASE 2019. We proposed three different deep neural network topologies for this task. The first one is a VGG like [1] two-dimensional CNN network for processing audio segments. The second network is again a 2-dimensional CNN network which c...

[2]

The dataset consists of recordings from 10 scene classes and was collected in 12 large European cities and in different environments in each city

DATASET In this challenge, an enhanced version of ASC dataset was used [11]. The dataset consists of recordings from 10 scene classes and was collected in 12 large European cities and in different environments in each city. The development set of the dataset for task1a consists of 1440 segments for each acoustic scene and in total 40 hours of audio...

[3]

Features The log Mel-scale spectrogram was used as a feature in this challenge

DATA PROCESSING 3.1. Features The log Mel-scale spectrogram was used as a feature in this challenge. For extracting the features, first, we converted the audio to a mono-channel and removed the amplitude bias by subtracting the audio segment's mean from the signal. Then short time Fourier transform is computed on 2048 samples Hamming windowed fram...

[4]

The first one is a VGG like two-dimensional CNN

CNN TOPOLOGIES We have used three different CNN topologies for this challenge. The first one is a VGG like two-dimensional CNN. The second topology is an enhanced version of Light-CNN (LCNN) which used Max-Feature-Map (MFM) as an additional non-linearity. MFM reduces the number of kernels to half. As a result, the final network has fewer parameters and ...

[5]

SYSTEMS AND FUSION In this challenge, we fused outputs of different networks to obtain the final results. First, we made a 4-folds cross-validation setup using the whole development data in addition to the official provided setup (we only use the official validation set for report results here and for the final system we only used the generated folds). By...

[6]

EXPERIMENTS AND RESULTS 6.1. Experimental Setups Similar to the baseline system provided by the organizers, our networks training was performed by optimizing the categorical cross-entropy using Adam optimizer [15]. The initial learning rate was set to 0.001 and the network training was early-stopped if the validation loss did not decrease for more th...

[7]

Different systems were designed for this challenge and the final systems were fusions of the output scores from the individual system

CONCLUSIONS We have described the systems submitted by BUT team to Acoustic Scene Classification (ASC) challenge of DCASE2019. Different systems were designed for this challenge and the final systems were fusions of the output scores from the individual system. A trained fusion as well as a majority vote fusion were used for the final system. The prop...

[8]

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014

[9]

A light CNN for deep face representation with noisy labels

X. Wu, R. He, Z. Sun, and T. Tan, “A light CNN for deep face representation with noisy labels,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2884–2896, 2018

[10]

Audio replay attack detection with deep learning frameworks

G. Lavrentyeva, S. Novoselov, E. Malykh, A. Kozlov, O. Kudashev, and V. Shchemelinin, “Audio replay attack detection with deep learning frameworks,” in Proc. Interspeech 2017, 2017, pp. 82–86. [Online]. Available: http://dx.doi.org/10.21437/Interspeech.2017-360

[11]

Detecting spoofing attacks using vgg and sincnet: but-omilia submission to asvspoof 2019 challenge

H. Zeinali, T. Stafylakis, G. Athanasopoulou, J. Rohdin, I. Gkinis, L. Burget, and J. Cernocky, “Detecting spoofing attacks using vgg and sincnet: but-omilia submission to asvspoof 2019 challenge,” in Proc. Interspeech 2019, 2019

[12]

X-vectors: Robust DNN embeddings for speaker recognition

D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, “X-vectors: Robust DNN embeddings for speaker recognition,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 5329–5333

[13]

How to improve your speaker embeddings extractor in generic toolkits

H. Zeinali, L. Burget, J. Rohdin, T. Stafylakis, and J. H. Cernocky, “How to improve your speaker embeddings extractor in generic toolkits,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 6141–6145

[14]

Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

H. Zeinali, L. Burget, and J. Cernocky, “Convolutional neural networks and x-vector embedding for DCASE2018 acoustic scene classification challenge,” arXiv preprint arXiv:1810.04273, 2018

[15]

Self-attentive speaker embeddings for text-independent speaker verification

Y. Zhu, T. Ko, D. Snyder, B. Mak, and D. Povey, “Self-attentive speaker embeddings for text-independent speaker verification,” Proc. Interspeech 2018, pp. 3573–3577, 2018

[16]

Attentive Statistics Pooling for Deep Speaker Embedding

K. Okabe, T. Koshinaka, and K. Shinoda, “Attentive statistics pooling for deep speaker embedding,” arXiv preprint arXiv:1803.10963, 2018

[17]

Attention-based models for text-dependent speaker verification

F. R. rahman Chowdhury, Q. Wang, I. L. Moreno, and L. Wan, “Attention-based models for text-dependent speaker verification,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 5359–5363

[18]

A multi-device dataset for urban acoustic scene classification

A. Mesaros, T. Heittola, and T. Virtanen, “A multi-device dataset for urban acoustic scene classification,” in IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018

[19]

librosa: Audio and music signal analysis in python

B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and music signal analysis in python,” in Proceedings of the 14th python in science conference, 2015, pp. 18–25

[20]

Acoustic scene classification with fully convolutional neural networks and I-vectors

M. Dorfer, B. Lehner, H. Eghbal-zadeh, H. Christop, P. Fabian, and W. Gerhard, “Acoustic scene classification with fully convolutional neural networks and I-vectors,” DCASE2018 Challenge, Tech. Rep., September 2018

[21]

FoCal multi-class: Toolkit for evaluation, fusion and calibration of multi-class recognition scores: tutorial and user manual

N. Brümmer, “FoCal multi-class: Toolkit for evaluation, fusion and calibration of multi-class recognition scores: tutorial and user manual,” Software available at http://sites.google.com/site/nikobrummer/focalmulticlass, 2007

[22]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014
