FBK-HUPBA Submission to the EPIC-Kitchens 2019 Action Recognition Challenge
Pith reviewed 2026-05-25 19:17 UTC · model grok-4.3
The pith
An ensemble of CNN-LSTA and HF-TSN variants achieves 35.54% top-1 accuracy on EPIC-Kitchens 2019 S1 action recognition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The FBK-HUPBA submission compiled predictions from an ensemble of CNN-LSTA and HF-TSN model variants and attained top-1 action recognition accuracies of 35.54% on the S1 setting and 20.25% on the S2 setting of the EPIC-Kitchens 2019 challenge.
What carries the argument
An ensemble of multiple CNN-LSTA and HF-TSN variants that aggregates class predictions from the two model families.
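The report does not say how the class predictions were aggregated, so the following is only a minimal sketch of one common choice, late fusion by averaging per-class scores, and not the authors' method. The function names `average_scores` and `ensemble_predict`, the three-class toy problem, and all score values are hypothetical.

```python
from typing import List, Sequence


def average_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores from several models for a single clip."""
    if not per_model_scores:
        raise ValueError("need scores from at least one model")
    n_classes = len(per_model_scores[0])
    if any(len(scores) != n_classes for scores in per_model_scores):
        raise ValueError("every model must score the same set of classes")
    n_models = len(per_model_scores)
    return [
        sum(scores[c] for scores in per_model_scores) / n_models
        for c in range(n_classes)
    ]


def ensemble_predict(per_model_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged score."""
    fused = average_scores(per_model_scores)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one clip (assumption: three classes).
cnn_lsta_scores = [0.6, 0.3, 0.1]   # stand-in for a CNN-LSTA variant
hf_tsn_scores_a = [0.2, 0.7, 0.1]   # stand-in for an HF-TSN variant
hf_tsn_scores_b = [0.3, 0.5, 0.2]   # stand-in for a second HF-TSN variant

fused = average_scores([cnn_lsta_scores, hf_tsn_scores_a, hf_tsn_scores_b])
prediction = ensemble_predict([cnn_lsta_scores, hf_tsn_scores_a, hf_tsn_scores_b])
```

In this toy case the single CNN-LSTA stand-in prefers class 0, but the averaged scores favor class 1. Weighted averaging or voting would be equally plausible alternatives; the source does not say which, if any, was used.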
If this is right
- The ensemble of the two model families produces higher accuracy than either family alone would achieve.
- CNN-LSTA variants contribute temporal modeling suited to the sequential nature of kitchen actions.
- HF-TSN variants supply spatial feature extraction that remains stable across the two evaluation settings.
- The reported scores establish a concrete performance level for any future method submitted to the same splits.
Where Pith is reading between the lines
- The results imply that mixing recurrent and two-stream temporal models can compensate for the limited field of view and motion blur typical in egocentric recordings.
- Similar ensembles could be tested on other first-person video datasets without retraining the base architectures from scratch.
- The gap between S1 and S2 performance points to sensitivity in how the models generalize across different participants or environments.
Load-bearing premise
The CNN-LSTA and HF-TSN variants were trained on the challenge data without leakage or overfitting and the ensemble was scored according to the official protocol.
What would settle it
An independent run of the same model variants and ensemble procedure on the hidden test sets that produces accuracy figures different from the reported 35.54% and 20.25%.
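For context on how such a comparison could be scored, here is a minimal sketch of top-1 action accuracy. It assumes an action counts as correct only when both its verb and its noun match the ground truth. The function name `top1_accuracy` and the toy labels are hypothetical, and the official evaluation server remains the authority on the exact protocol.

```python
from typing import Sequence, Tuple

Action = Tuple[str, str]  # (verb, noun) pair; assumed label format


def top1_accuracy(predicted: Sequence[Action], truth: Sequence[Action]) -> float:
    """Percentage of clips whose top-ranked action matches the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predictions and ground truth must have equal length")
    if not truth:
        raise ValueError("need at least one labeled clip")
    # A tuple comparison is true only if both verb and noun agree.
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)


# Toy labels for four clips (hypothetical, not challenge data).
truth = [("cut", "onion"), ("open", "fridge"), ("wash", "pan"), ("take", "knife")]
predicted = [("cut", "onion"), ("open", "fridge"), ("wash", "plate"), ("take", "knife")]

accuracy = top1_accuracy(predicted, truth)
```

Here the third clip has the right verb but the wrong noun, so it counts as an error and three of the four clips are scored correct.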
Original abstract
In this report we describe the technical details of our submission to the EPIC-Kitchens 2019 action recognition challenge. To participate in the challenge we have developed a number of CNN-LSTA [3] and HF-TSN [2] variants, and submitted predictions from an ensemble compiled out of these two model families. Our submission, visible on the public leaderboard with team name FBK-HUPBA, achieved a top-1 action recognition accuracy of 35.54% on S1 setting, and 20.25% on S2 setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a short report on the FBK-HUPBA submission to the EPIC-Kitchens 2019 Action Recognition Challenge. It states that variants of the CNN-LSTA and HF-TSN architectures were developed and combined into an ensemble whose predictions were submitted, yielding a public-leaderboard top-1 accuracy of 35.54% on the S1 setting and 20.25% on the S2 setting.
Significance. If the reported leaderboard scores are accurate, the work documents a competitive entry on a challenging egocentric video dataset. The public verifiability of the numbers on the official leaderboard constitutes a modest strength, as it permits independent confirmation without reliance on unreleased code or internal validation splits. However, the manuscript introduces no new methodological contributions beyond the two cited base models and therefore has limited significance for advancing the broader field of action recognition.
minor comments (2)
- [Abstract] The abstract asserts that the report 'describe[s] the technical details' of the CNN-LSTA and HF-TSN variants, yet the manuscript supplies no information on architecture modifications, training protocols, hyper-parameters, or ensemble construction.
- References [2] and [3] are cited but the manuscript contains no References section or bibliographic details for these works.
Simulated Author's Rebuttal
We thank the referee for reviewing our manuscript. We address the key point raised in the significance assessment below.
- Referee: the manuscript introduces no new methodological contributions beyond the two cited base models and therefore has limited significance for advancing the broader field of action recognition.
  Authors: We agree that the manuscript does not introduce new methodological contributions. It is explicitly a short technical report documenting the details of our EPIC-Kitchens 2019 challenge submission, including the specific CNN-LSTA and HF-TSN variants we developed and the ensemble we formed. The primary contribution is the public, verifiable leaderboard performance (35.54% top-1 on S1 and 20.25% on S2) achieved by this ensemble. Such reports serve the community by providing concrete, reproducible details on competitive approaches for this dataset without claiming methodological novelty.
  Revision: no
Circularity Check
No significant circularity
The manuscript is a competition report that states the public leaderboard scores achieved by an ensemble of CNN-LSTA and HF-TSN variants. No derivation chain, equations, fitted parameters presented as predictions, or self-referential uniqueness claims exist in the text. The central factual claim (35.54% S1, 20.25% S2) is externally verifiable on the EPIC-Kitchens leaderboard and does not reduce to any internal construction or self-citation load-bearing step.
Reference graph
Works this paper leans on
- [1]
- [2] S. Sudhakaran, S. Escalera, and O. Lanz. Hierarchical Feature Aggregation Networks for Video Action Recognition. arXiv preprint arXiv:1905.12462, 2019.
- [3] S. Sudhakaran, S. Escalera, and O. Lanz. LSTA: Long Short-Term Attention for Egocentric Action Recognition. In Proc. CVPR, 2019.
- [4] S. Sudhakaran and O. Lanz. Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition. In Proc. BMVC, 2018.