Deep Feature Optimization for Enhanced Fish Freshness Assessment

Nam-Thuan Trinh; Phi-Hung Hoang; Thi-Thu-Hong Phan; Van-Manh Tran

arxiv: 2510.24814 · v1 · submitted 2025-10-28 · 💻 cs.CV · cs.AI



Pith reviewed 2026-05-18 03:21 UTC · model grok-4.3


keywords: fish freshness assessment, deep learning, computer vision, feature selection, image classification, food quality evaluation, machine learning classifiers


The pith

A three-stage framework using deep features from vision models reaches 85.99 percent accuracy on fish freshness assessment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an automated way to evaluate fish freshness by analyzing images of fish eyes instead of relying on human judgment. It fine-tunes several image recognition models to pull out visual patterns, feeds those patterns into standard machine learning classifiers, and then uses selection techniques to keep only the most useful patterns. The strongest combination of one particular model, one classifier, and one selection method produces higher accuracy than earlier work on the same set of images. This line of work matters because inconsistent manual checks contribute to food waste and safety risks in the seafood supply chain. If the results hold, the approach offers a more consistent alternative that could be scaled for industry use.

Core claim

The authors establish a unified three-stage framework for fish freshness assessment. First, five state-of-the-art vision architectures are fine-tuned to create baselines. Next, multi-level deep features extracted from these architectures train seven classical machine learning classifiers. Finally, feature selection methods based on Light Gradient Boosting Machine, Random Forest, and Lasso identify a compact informative subset. The best configuration using Swin-Tiny features, an Extra Trees classifier, and LGBM-based feature selection achieves 85.99 percent accuracy on the Freshness of the Fish Eyes dataset and outperforms recent studies on the same data by 8.69 to 22.78 percentage points.
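Stages two and three can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the feature matrix is synthetic (standing in for pooled embeddings from a fine-tuned backbone such as Swin-Tiny), scikit-learn's GradientBoostingClassifier substitutes for LightGBM as the importance source, and the subset size of 32 is arbitrary. Wrapping the selector and the classifier in one Pipeline means the importances are computed from training data only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for stage-1 output: each row plays the role of one pooled
# deep-feature vector extracted from a fine-tuned backbone (hypothetical data).
X, y = make_classification(n_samples=400, n_features=128, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipeline = Pipeline([
    # Stage 3: keep the 32 features ranked highest by a gradient-boosted tree
    # model (scikit-learn stand-in for LightGBM). threshold=-inf means only
    # max_features decides the cut.
    ("select", SelectFromModel(GradientBoostingClassifier(random_state=0),
                               max_features=32, threshold=-np.inf)),
    # Stage 2 classifier: Extra Trees trained on the reduced feature set.
    ("clf", ExtraTreesClassifier(n_estimators=300, random_state=0)),
])

pipeline.fit(X_train, y_train)  # importances come from the training split only
n_kept = int(pipeline.named_steps["select"].get_support().sum())
accuracy = pipeline.score(X_test, y_test)
print(f"kept {n_kept} of {X.shape[1]} features, test accuracy {accuracy:.3f}")
```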

What carries the argument

The three-stage framework that fine-tunes vision architectures, extracts multi-level deep features for classical classifiers, and applies feature selection to produce a compact informative subset.

If this is right

The best configuration outperforms recent studies on the same dataset by 8.69 to 22.78 percentage points.
Integrating deep visual features with classical classifiers improves performance over using either approach alone.
LGBM-based feature selection produces a compact and informative subset of features while maintaining high accuracy.
The overall framework demonstrates effectiveness for visual quality evaluation tasks beyond the immediate fish freshness setting.
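The margins imply accuracies for the prior studies that this page does not list. A quick back-calculation, assuming the margins are absolute percentage points as stated in the core claim; because the abstract writes "%", a relative-gain reading is shown alongside for comparison.

```python
best = 85.99             # reported accuracy (%) of the best configuration
margins = (8.69, 22.78)  # reported outperformance range

# Reading 1: absolute percentage points, prior = best - margin.
implied_abs = [round(best - m, 2) for m in margins]

# Reading 2: relative gain in percent, best = prior * (1 + margin / 100).
implied_rel = [round(best / (1 + m / 100), 2) for m in margins]

print("prior accuracies if margins are points:  ", implied_abs)
print("prior accuracies if margins are relative:", implied_rel)
```

Under the points reading the earlier studies sit at roughly 63% to 77%; under the relative reading, roughly 70% to 79%. The two readings differ enough that the paper's wording matters.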

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same pipeline could be applied to freshness assessment of other perishable foods such as meat or vegetables by swapping the input images.
Integration into portable devices or processing equipment might enable real-time checks that reduce spoilage during transport and storage.
Testing whether features from multiple body parts or viewing angles improve robustness would address potential limitations of relying on eye images alone.

Load-bearing premise

The labels assigned to images in the FFE dataset match actual fish freshness levels and the collected images represent the range of conditions seen in practice.

What would settle it

Running the trained model on a fresh collection of fish eye images whose freshness has been verified independently through chemical testing or expert panels would show whether the 85.99 percent accuracy persists outside the original dataset.
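One way to make "persists" concrete is to put a confidence interval on the accuracy measured on the independent set and check whether 85.99% falls inside it. A minimal sketch using the Wilson score interval; the external-set counts below are hypothetical placeholders, not results from the paper.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical external test: 300 independently verified eye images,
# 240 classified correctly (illustrative numbers, not from the paper).
low, high = wilson_interval(240, 300)
reported = 0.8599
print(f"external accuracy 95% CI: [{low:.3f}, {high:.3f}]")
print("consistent with 85.99%" if low <= reported <= high else "inconsistent with 85.99%")
```

With 240 of 300 correct, the upper end of the interval falls below 0.8599, so that hypothetical outcome would count against the reported figure; a larger external set narrows the interval and sharpens the verdict.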

Original abstract

Assessing fish freshness is vital for ensuring food safety and minimizing economic losses in the seafood industry. However, traditional sensory evaluation remains subjective, time-consuming, and inconsistent. Although recent advances in deep learning have automated visual freshness prediction, challenges related to accuracy and feature transparency persist. This study introduces a unified three-stage framework that refines and leverages deep visual representations for reliable fish freshness assessment. First, five state-of-the-art vision architectures - ResNet-50, DenseNet-121, EfficientNet-B0, ConvNeXt-Base, and Swin-Tiny - are fine-tuned to establish a strong baseline. Next, multi-level deep features extracted from these backbones are used to train seven classical machine learning classifiers, integrating deep and traditional decision mechanisms. Finally, feature selection methods based on Light Gradient Boosting Machine (LGBM), Random Forest, and Lasso identify a compact and informative subset of features. Experiments on the Freshness of the Fish Eyes (FFE) dataset demonstrate that the best configuration combining Swin-Tiny features, an Extra Trees classifier, and LGBM-based feature selection achieves an accuracy of 85.99%, outperforming recent studies on the same dataset by 8.69-22.78%. These findings confirm the effectiveness and generalizability of the proposed framework for visual quality evaluation tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper runs a standard three-stage pipeline on fish eye images and reports 86% accuracy with Swin features plus LGBM selection, but the validation details are thin and leakage looks possible.


The main point is a three-stage setup that fine-tunes five vision backbones, pulls deep features, runs LGBM/RF/Lasso selection, and feeds the reduced set to classical classifiers. On the FFE dataset the best run reaches 85.99% with Swin-Tiny and Extra Trees, beating earlier numbers on the same data by 8.69-22.78 points. That concrete comparison on a named dataset is the useful part; most application papers skip the head-to-head or use private sets. The systematic check across backbones and the move to a compact feature set also give practitioners something they can actually try without retraining a full network every time.

The soft spot is the experimental protocol. Neither the abstract nor the stress-test note gives any sign of nested cross-validation or per-fold feature selection, so it is easy to imagine LGBM importance scores computed on the whole dataset before the split. That would let test information leak into the selected features and make both the absolute accuracy and the claimed gains look better than they are. There is also no mention of statistical tests or an error breakdown, which leaves the improvement margins hard to take at face value.

This work is aimed at people who need a ready recipe for seafood quality control rather than at core vision researchers. A reader who wants to adapt the pipeline to similar inspection tasks could pull useful implementation choices from it. I would send it to review after the authors add a clear description of the train-test split and selection procedure plus basic significance checks. The idea is incremental, but the numbers are worth verifying once the setup is transparent.
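The leakage this letter worries about is easy to reproduce on data with no signal at all. A minimal sketch, assuming scikit-learn; it swaps LGBM importance for a univariate F-test (SelectKBest with f_classif) because the effect is starkest there, but the same logic applies to any selector fitted before the split.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure noise: 100 "images", 5000 features, labels assigned at random.
# Any accuracy well above 50% here can only come from leakage.
X = rng.normal(size=(100, 5000))
y = rng.permutation(np.repeat([0, 1], 50))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0)

# Leaky: features chosen using all 100 labels, then cross-validated.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(clf, X_leaky, y, cv=cv).mean()

# Correct: the selector is refit inside each training fold via a pipeline.
honest = cross_val_score(
    make_pipeline(SelectKBest(f_classif, k=20), clf), X, y, cv=cv
).mean()

print(f"selection before split: {leaky:.2f}  selection inside folds: {honest:.2f}")
```

On random labels the leaky estimate typically lands far above the 50% chance level while the per-fold version stays near it; that gap is the kind of inflation a clear protocol description needs to rule out.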

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a three-stage framework for fish freshness assessment: (1) fine-tuning five vision backbones (ResNet-50, DenseNet-121, EfficientNet-B0, ConvNeXt-Base, Swin-Tiny) on the FFE dataset, (2) extracting deep features to train seven classical ML classifiers, and (3) applying LGBM, Random Forest, and Lasso for feature selection to obtain a compact subset. The central empirical claim is that Swin-Tiny features + Extra Trees + LGBM selection reaches 85.99% accuracy and outperforms recent studies on the same dataset by 8.69-22.78%.

Significance. If the performance numbers prove robust under proper validation, the hybrid pipeline would illustrate a practical route to higher accuracy in visual food-quality tasks by refining deep representations with classical selection and classifiers, potentially improving both predictive power and feature interpretability for industry applications.

major comments (2)

[Methods / Experimental protocol] The description of the three-stage pipeline (abstract and methods) does not state whether LGBM (or RF/Lasso) feature selection is executed inside each cross-validation fold on training data only or on the full FFE dataset before any train/test split. Because the central claim is the 85.99% accuracy and the 8.69-22.78% gains, this omission directly threatens the validity of the reported numbers via potential leakage.
[Results] No validation protocol details (split ratio, number of folds, nested CV, or per-fold selection) or statistical support (standard deviation across runs, significance tests) are supplied for the accuracy figure or the outperformance margins, leaving the soundness of the primary result only moderately supported.
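The statistical support the second comment asks for can be produced by scoring competing configurations on identical folds and reporting mean ± standard deviation plus a paired test. A minimal sketch under assumptions: synthetic data, Extra Trees versus Random Forest as the two configurations, and a plain paired t statistic computed by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=400, n_features=64, n_informative=12,
                           n_classes=3, random_state=0)

# An integer random_state makes every call to split() yield the same folds,
# so per-fold scores of the two models can be paired.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
a = cross_val_score(ExtraTreesClassifier(random_state=0), X, y, cv=cv)
b = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

print(f"Extra Trees:   {a.mean():.3f} ± {a.std(ddof=1):.3f}")
print(f"Random Forest: {b.mean():.3f} ± {b.std(ddof=1):.3f}")

# Paired t statistic on per-fold differences. Training sets overlap across
# folds, so this naive test is optimistic; a corrected resampled t-test
# (Nadeau and Bengio) is the more conservative choice.
d = a - b
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print(f"paired t = {t_stat:.2f} over {len(d)} folds")
```

Published baselines from other papers cannot be paired this way, since their per-fold scores on these folds are not available; comparisons against them remain comparisons of reported means.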

minor comments (2)

[Abstract] The abstract refers to 'recent studies on the same dataset' without naming or citing them; explicit references would strengthen the comparison claim.
[Introduction] Dataset characteristics (number of images, number of freshness classes, class balance) are not stated in the abstract or early sections, which would help readers contextualize the 85.99% figure.
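Class balance matters because it sets the floor an accuracy figure should be judged against. A small sketch with hypothetical labels (the FFE class names and counts are not given on this page): with a 70/20/10 split, always predicting the majority class already scores 70% accuracy, while balanced accuracy exposes it at one third.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; equals plain accuracy only when classes are balanced."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels (not the FFE class counts).
y_true = ["fresh"] * 70 + ["stale"] * 20 + ["spoiled"] * 10
y_pred = ["fresh"] * 100  # a classifier that never leaves the majority class
print(majority_baseline(y_true))          # 0.7
print(balanced_accuracy(y_true, y_pred))  # about 0.333
```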

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor that we have addressed through clarifications and revisions to the Methods and Results sections. We confirm that the pipeline was designed to avoid data leakage and provide the requested validation details below.


Referee: [Methods / Experimental protocol] The description of the three-stage pipeline (abstract and methods) does not state whether LGBM (or RF/Lasso) feature selection is executed inside each cross-validation fold on training data only or on the full FFE dataset before any train/test split. Because the central claim is the 85.99% accuracy and the 8.69-22.78% gains, this omission directly threatens the validity of the reported numbers via potential leakage.

Authors: We appreciate the referee identifying this critical omission. Feature selection with LGBM, Random Forest, and Lasso was performed strictly inside each cross-validation fold using only the training data of that fold; no test data was involved at any stage. We have revised the Methods section to explicitly state this nested protocol, including a description of the per-fold process and a note that this prevents leakage. The reported 85.99% accuracy reflects this correct procedure. revision: yes
Referee: [Results] No validation protocol details (split ratio, number of folds, nested CV, or per-fold selection) or statistical support (standard deviation across runs, significance tests) are supplied for the accuracy figure or the outperformance margins, leaving the soundness of the primary result only moderately supported.

Authors: We agree that additional experimental details strengthen the presentation. We have updated the Experimental Setup and Results sections to specify: an 80/20 stratified train/test split with 5-fold cross-validation, feature selection nested inside each training fold, and the mean accuracy of 85.99% with standard deviation across folds. The outperformance margins are computed from the mean values reported in the cited prior works on the same dataset. We have also added a brief note on statistical comparison via paired tests against the baselines to support the gains. revision: yes
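Read literally, the protocol in this response could be implemented as below. This is a sketch of one plausible reading, not the authors' code: synthetic data, Random Forest importance standing in for LGBM, and an arbitrary subset size of 32. The response does not make clear whether 85.99% is the cross-validation mean on the 80% split or the score on the held-out 20%, so the sketch reports both to keep the distinction visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=128, n_informative=16,
                           n_classes=3, random_state=1)

# 80/20 stratified split; the 20% test set is used once, at the very end.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe = Pipeline([
    # Random Forest importance stands in for LGBM; top 32 features kept.
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=1),
                               max_features=32, threshold=-np.inf)),
    ("clf", ExtraTreesClassifier(n_estimators=200, random_state=1)),
])

# 5-fold CV on the training split; cross_val_score refits the whole pipeline,
# selector included, on each training fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
fold_scores = cross_val_score(pipe, X_tr, y_tr, cv=cv)

# Final model: refit on the full 80% split, then score the untouched 20%.
pipe.fit(X_tr, y_tr)
test_acc = pipe.score(X_te, y_te)

print(f"5-fold CV on training split: {fold_scores.mean():.3f} ± {fold_scores.std(ddof=1):.3f}")
print(f"held-out test accuracy:      {test_acc:.3f}")
```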

Circularity Check

0 steps flagged

No circularity: empirical accuracy from held-out evaluation after standard pipeline


The paper describes a three-stage experimental pipeline (fine-tune vision backbones on FFE dataset, extract deep features, train classifiers, apply LGBM/RF/Lasso selection) whose central output is a measured test accuracy of 85.99%. No equations, first-principles derivations, or fitted parameters are redefined as predictions of themselves. The reported outperformance is a direct numerical comparison against external prior studies on the same dataset rather than an internal reduction. Feature selection occurs within the described workflow and the accuracy is presented as an empirical result on the evaluation split; nothing in the provided text reduces the final metric to a quantity defined by construction from the selection step itself. The derivation chain remains self-contained through conventional ML experimentation.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The framework rests on standard transfer-learning assumptions and empirical hyperparameter choices rather than new theoretical entities or derivations.

free parameters (2)

Backbone selection and fine-tuning hyperparameters
Five architectures chosen and tuned; specific learning rates and epochs are not detailed in the abstract but are required for the reported performance.
Number of features retained after selection
LGBM, Random Forest, and Lasso selection implicitly determine a subset size that affects the final classifier accuracy.
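The subset size need not stay a hand-set free parameter; it can be tuned inside nested cross-validation so that the reported accuracy accounts for the choice. A minimal sketch, assuming scikit-learn, synthetic data, Random Forest importance in place of LGBM, and an illustrative grid of 8, 16 and 32 features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=64, n_informative=10,
                           random_state=2)

pipe = Pipeline([
    # threshold=-inf so that max_features alone sets the subset size.
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=2),
                               threshold=-np.inf)),
    ("clf", ExtraTreesClassifier(n_estimators=100, random_state=2)),
])

# Subset size is treated as a hyperparameter and tuned on inner folds only.
search = GridSearchCV(pipe, {"select__max_features": [8, 16, 32]},
                      cv=StratifiedKFold(3, shuffle=True, random_state=2))

# Outer folds estimate the accuracy of the whole procedure, tuning included.
outer = cross_val_score(search, X, y, cv=StratifiedKFold(5, shuffle=True, random_state=2))

search.fit(X, y)
print("nested CV accuracy:", outer.mean().round(3))
print("subset size chosen on all data:", search.best_params_["select__max_features"])
```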

axioms (1)

domain assumption: Visual appearance of fish eyes is a reliable proxy for freshness level.
Invoked when treating the FFE dataset labels as ground truth for training and evaluation.

pith-pipeline@v0.9.0 · 5772 in / 1332 out tokens · 38032 ms · 2026-05-18T03:21:37.938590+00:00 · methodology
