arXiv: 2605.00793 · v1 · submitted 2026-05-01 · 📡 eess.IV · cs.AI · cs.CV


Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

Jingxi Pu, Tonghua Liu, Zhilin Guan, Siqiao Li, Yang Ming, Zheng Cong, Wei Zhang, Fangwei Li


Pith reviewed 2026-05-09 18:01 UTC · model grok-4.3

classification 📡 eess.IV · cs.AI · cs.CV

keywords: unsupervised denoising, low-dose CT, liver CT, Cycle-GAN, perceptual loss, attention networks, medical image processing, deep learning


The pith

An unsupervised Cycle-GAN-inspired network with perceptual attention can denoise real unpaired low-dose liver CT images to clinical standards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to remove noise from low-dose liver CT scans using only unpaired real clinical data, since paired high-dose and low-dose images of the same patient are rarely available. The approach adapts Cycle-GAN training with a U-Net structure, attention-based feature fusion, residual transformations, and perceptual loss to map noisy inputs toward cleaner outputs while preserving diagnostic features. A dedicated real low-dose liver CT dataset is assembled for training and evaluation. Quantitative image metrics and qualitative reviews by imaging physicians both indicate the outputs meet clinical requirements, bypassing the data-pair barrier that blocks most supervised denoising models.

Core claim

The central claim is that an end-to-end unsupervised framework combining U-Net multi-scale extraction, attention mechanisms for feature fusion, residual blocks for transformation, and perceptual loss inside a Cycle-GAN cycle-consistency setup learns a reliable denoising mapping directly from unpaired real low-dose and normal-dose liver CT volumes; the resulting images outperform classical methods on standard metrics and receive physician approval for diagnostic use.

What carries the argument

The perceptual attention network that performs multi-scale feature extraction via U-Net, fuses features with attention, applies residual transformations, and optimizes with cycle-consistency plus perceptual losses for unpaired domain translation between noisy and cleaner CT images.
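
As a reading aid, the objective implied by this description can be written in the standard Cycle-GAN form with an added perceptual term. This is a sketch under assumptions, not the paper's stated loss: the symbols $G$ (low-dose to normal-dose generator), $F$ (reverse generator), $D_X$ and $D_Y$ (discriminators), $\phi$ (a fixed feature extractor), and the weights $\lambda_{\text{cyc}}$ and $\lambda_{\text{perc}}$ are illustrative names, and the text available here does not say which image pairs the perceptual loss is applied to.

```latex
\begin{aligned}
\mathcal{L}_{\text{cyc}}(G, F)
  &= \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
   + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big], \\
\mathcal{L}_{\text{perc}}(G, F)
  &= \mathbb{E}_{x \sim p_X}\big[\lVert \phi(F(G(x))) - \phi(x) \rVert_2^2\big]
   + \mathbb{E}_{y \sim p_Y}\big[\lVert \phi(G(F(y))) - \phi(y) \rVert_2^2\big], \\
\mathcal{L}_{\text{total}}
  &= \mathcal{L}_{\text{GAN}}(G, D_Y) + \mathcal{L}_{\text{GAN}}(F, D_X)
   + \lambda_{\text{cyc}}\,\mathcal{L}_{\text{cyc}}(G, F)
   + \lambda_{\text{perc}}\,\mathcal{L}_{\text{perc}}(G, F).
\end{aligned}
```

Here $X$ stands for the low-dose domain and $Y$ for the normal-dose domain; no term compares $G(x)$ with a matched normal-dose scan of the same patient, which is what makes the setting unpaired.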

If this is right

Denoising models can now be trained on existing clinical archives without collecting new paired high-dose exposures.
The method delivers image quality that passes both quantitative benchmarks and direct physician visual assessment.
The constructed real low-dose liver CT dataset supplies a public resource for testing future unsupervised approaches.
Classical iterative or filter-based denoising techniques are outperformed on the same clinical data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same unpaired-training pattern could extend to denoising other body regions or modalities where paired references remain scarce.
Routine clinical low-dose scans alone might suffice for building large training sets, reducing the radiation burden of data collection.
Integration into clinical workflows would require separate tests of diagnostic accuracy rather than image quality alone.

Load-bearing premise

The assumption that cycle-consistency training on unpaired real low-dose and normal-dose scans will produce a mapping that removes noise without erasing or fabricating diagnostic features.
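
Why this premise carries so much weight can be seen in a toy case. Cycle-consistency only asks that the reverse mapping undo the forward one, so any invertible pair of mappings satisfies it, including pairs that change image content. The sketch below uses two hypothetical functions, `generator_g` and `generator_f`, on a made-up four-pixel image; they stand in for trained networks and are not the paper's generators.

```python
def generator_g(image):
    """Hypothetical forward mapping that inverts intensities (it alters content)."""
    return [1.0 - pixel for pixel in image]


def generator_f(image):
    """Hypothetical reverse mapping; it exactly undoes generator_g."""
    return [1.0 - pixel for pixel in image]


def l1_distance(a, b):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Toy 4-pixel "low-dose" image with intensities in [0, 1].
low_dose = [0.0, 0.25, 0.5, 1.0]

translated = generator_g(low_dose)
reconstructed = generator_f(translated)

# Cycle loss is zero even though the translated image differs from the input.
cycle_loss = l1_distance(reconstructed, low_dose)
content_change = l1_distance(translated, low_dose)
print(cycle_loss, content_change)
```

In practice the adversarial and perceptual terms are what push the forward mapping toward plausible, structure-preserving outputs; the toy case only shows that the cycle term alone does not rule out altered content.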

What would settle it

A reader study in which multiple physicians perform blinded diagnostic tasks on both the denoised low-dose images and the true high-dose reference scans from the same patients, measuring agreement in lesion detection or characterization.

Figures

Figures reproduced from arXiv: 2605.00793 by Fangwei Li, Jingxi Pu, Siqiao Li, Tonghua Liu, Wei Zhang, Yang Ming, Zheng Cong, Zhilin Guan.

**Figure 1.** The effect of the method proposed in this paper. (a)

**Figure 2.** Proposed unsupervised LDCT denoising framework and generator architecture. The generator combines a U-Net structure

**Figure 5.** 3D migration learning parameter sharing during LDCT

**Figure 4.** The attention structure. q(l, h) = ψ T σ1

**Figure 6.** Visual comparison chart of window width and window

**Figure 7.** Comparison of classic methods. (a) FULL-DOSE (b) LOW-DOSE (c) CYCLEGAN-RES (d) CYCLEGAN-U (e) CYCLEGAN-RAS. Accompanying text: "CYCLE-GAN and is more suitable for this dataset. At the same time, it can be seen in Table II that the effect of the method in this paper is indistinguishable from the WGANVGG method, but for an unsupervised learning method, the effect is already very good."

**Figure 8.** Experimental results of the method in this paper

**Figure 9.** Low-dose 100 kV, 1 mm experimental effect. (a) 100 kV, 1 mm (b) fake 120 kV, 1 mm (c) real 120 kV, 1 mm

**Figure 10.** Low-dose 80 kV, 5 mm experimental results. Accompanying text: "The results are as follows: V. CONCLUSION The problem addressed in this paper is the study of algorithms for the practical application of denoising of LDCT. Previous research works use supervised learning methods, but they do not work in clinical practical data. This paper innovatively proposes an end-to-end unsupervised LDCT denoising framework. It combines the U-…"

Original abstract

With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, which may interfere with visual interpretation by physicians and affect diagnostic results. To address this problem, inspired by Cycle-GAN for unsupervised learning, this paper proposes an end-to-end unsupervised low-dose computed tomography denoising framework. The proposed framework combines a U-Net structure for multi-scale feature extraction, an attention mechanism for feature fusion, and a residual network for feature transformation. It also introduces perceptual loss to improve the network for the characteristics of medical images. In addition, we construct a real low-dose computed tomography dataset and design a large number of comparative experiments to validate the proposed method, using both image-based evaluation metrics and medical evaluation criteria. Compared with classical methods, the main advantage of this paper is that it addresses the limitation that real clinical data cannot be directly used for supervised learning, while still achieving excellent performance. The experimental results are also professionally evaluated by imaging physicians and meet clinical needs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Applies a Cycle-GAN variant with U-Net, attention, and perceptual loss to a new real unpaired liver CT dataset, but the performance claims rest on unshown metrics and subjective physician scores.


The paper builds a real unpaired low-dose liver CT dataset and trains an unsupervised network that combines U-Net for multi-scale features, attention for fusion, residual blocks, and perceptual loss inside a Cycle-GAN framework. That is the concrete contribution: moving the same family of methods from simulated or paired data onto actual clinical scans where paired normal-dose references do not exist. The physician visual assessment is also a reasonable step for clinical relevance. Those pieces are straightforward extensions rather than new theory, but they target a real bottleneck in radiation-dose reduction.

The soft spots are more substantial. The abstract states that comparative experiments and image-based metrics were run and that physicians judged the output clinically acceptable, yet it supplies none of the numbers, no error bars, no statistical tests, and no description of how the metrics were computed on the unpaired real data. Without paired ground truth on the target domain, standard full-reference scores cannot be calculated there, so any reported image metrics are either no-reference or come from synthetic pairs. Physician scoring is necessary but known to overlook subtle hallucinations or lost low-contrast structures that could affect diagnosis. The central claim that the method achieves excellent performance while solving the unpaired-data problem therefore depends on evidence that is not visible in the provided text.

This work is mainly for groups already working on unsupervised medical-image denoising who need another real-data example. A reader seeking rigorous benchmarks or new architectural insight will find little. If the full manuscript contains the missing quantitative results, failure-mode analysis, and clear description of the metrics on real scans, it is worth sending to review; on the current showing the evidence looks too thin for the strength of the claims.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an end-to-end unsupervised denoising framework for real clinical low-dose liver CT scans, inspired by Cycle-GAN. It combines a U-Net backbone for multi-scale features, attention for fusion, residual blocks for transformation, and perceptual loss to better suit medical image characteristics. The authors construct an unpaired real LDCT dataset, conduct comparative experiments against classical methods, and report both image-based metrics and physician visual assessments, claiming the approach overcomes the supervised-learning limitation of needing paired data while delivering excellent clinical performance.

Significance. If the central performance claims hold under objective scrutiny, the work would be moderately significant for clinical CT denoising: it directly targets the practical barrier of unpaired real-world data and demonstrates a pathway for unsupervised training that could reduce radiation dose without requiring additional normal-dose acquisitions. The construction of a real unpaired clinical dataset is a concrete contribution, though the absence of paired references on the target domain limits the strength of the validation.

major comments (3)

[Abstract and §4] Abstract and §4 (Experiments): the assertion of 'excellent performance' and that results 'meet clinical needs' rests on comparative experiments and physician evaluation, yet no quantitative values (PSNR, SSIM, or equivalent), error bars, statistical tests, or failure-mode analysis are supplied for the real unpaired test set; without paired normal-dose references, standard full-reference metrics cannot be computed on the target domain, leaving the performance claim dependent on unreported no-reference metrics or synthetic-data proxies.
[§3 and §4] §3 (Method) and §4: the Cycle-GAN-inspired training with cycle-consistency plus perceptual loss is presented as sufficient to learn a reliable noise-to-clean mapping, but no ablation or diagnostic analysis is given to confirm absence of hallucinations, lesion erasure, or structural artifacts on real low-contrast liver features; physician scoring alone is known to be insensitive to such subtle changes.
[§4] §4: details on training stability (convergence behavior, loss-weight sensitivity, multiple random seeds) and on the physician evaluation protocol (number of readers, scoring rubric, inter-rater agreement, specific criteria for diagnostic utility) are missing, undermining reproducibility and the claim that the method addresses clinical needs.
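
The first major comment turns on the fact that PSNR is a full-reference metric. A minimal sketch in plain Python makes the dependency explicit: the function cannot be called without a pixel-aligned reference image. The function name `psnr`, the `data_range` argument, and the pixel values are illustrative assumptions, not taken from the manuscript.

```python
import math


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB for two flat, equally sized images."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error against the reference; this is the step that
    # needs a paired, pixel-aligned normal-dose image.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy values only: a made-up "normal-dose" reference and a "denoised" output.
normal_dose = [100.0, 120.0, 90.0, 110.0]
denoised = [101.0, 119.0, 91.0, 109.0]
score = psnr(normal_dose, denoised, data_range=255.0)
print(round(score, 2))
```

On an unpaired clinical test set there is no `reference` argument to pass for a given low-dose scan, which is why any PSNR or SSIM figures must come from paired or synthetic data.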

minor comments (2)

[§3] Notation for the attention and residual modules is introduced without explicit equations or diagrams showing how they integrate into the Cycle-GAN generators and discriminators.
[Abstract and §4] The abstract states 'a large number of comparative experiments' but the text does not list the exact baseline methods, hyper-parameter settings, or dataset split sizes.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive comments. We address each major comment point by point below, providing the strongest honest defense of the manuscript while proposing revisions where the concerns can be directly addressed without misrepresenting our unsupervised real-data setting.


Referee: [Abstract and §4] Abstract and §4 (Experiments): the assertion of 'excellent performance' and that results 'meet clinical needs' rests on comparative experiments and physician evaluation, yet no quantitative values (PSNR, SSIM, or equivalent), error bars, statistical tests, or failure-mode analysis are supplied for the real unpaired test set; without paired normal-dose references, standard full-reference metrics cannot be computed on the target domain, leaving the performance claim dependent on unreported no-reference metrics or synthetic-data proxies.

Authors: We agree that full-reference metrics such as PSNR and SSIM are impossible on the real unpaired test set, as no paired normal-dose references exist in the target clinical domain; this is inherent to the unsupervised real-data problem our work targets. The manuscript already reports full-reference metrics on synthetic paired data and relies on no-reference metrics plus physician assessment for real data. In revision we will explicitly report quantitative no-reference metrics (NIQE, BRISQUE) with values, error bars, and statistical tests on the real test set, plus a dedicated failure-mode analysis subsection in §4. revision: yes

Referee: [§3 and §4] §3 (Method) and §4: the Cycle-GAN-inspired training with cycle-consistency plus perceptual loss is presented as sufficient to learn a reliable noise-to-clean mapping, but no ablation or diagnostic analysis is given to confirm absence of hallucinations, lesion erasure, or structural artifacts on real low-contrast liver features; physician scoring alone is known to be insensitive to such subtle changes.

Authors: We acknowledge that physician scoring, while clinically relevant, may miss subtle hallucinations or low-contrast lesion changes. We will add an ablation study on cycle-consistency versus perceptual loss in revised §3 and §4. On synthetic data with ground truth we will quantify lesion preservation and artifact rates; on real data we will add targeted qualitative examples of low-contrast liver regions and note that complete hallucination detection without references remains challenging. revision: partial
Referee: [§4] §4: details on training stability (convergence behavior, loss-weight sensitivity, multiple random seeds) and on the physician evaluation protocol (number of readers, scoring rubric, inter-rater agreement, specific criteria for diagnostic utility) are missing, undermining reproducibility and the claim that the method addresses clinical needs.

Authors: We will expand §4 with training stability details: loss convergence plots, sensitivity analysis to perceptual-loss weight, and mean/std results across three random seeds. For physician evaluation we will specify the protocol: three board-certified radiologists, 5-point rubric for noise reduction/artifact presence/diagnostic utility, inter-rater agreement via Fleiss’ kappa, and explicit criteria (e.g., preservation of liver lesions >5 mm) used to judge clinical utility. revision: yes
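
The inter-rater statistic named in the last response, Fleiss' kappa, can be computed from a table of rating counts. The sketch below is a plain-Python version under stated assumptions: the helper names `scores_to_counts` and `fleiss_kappa` and the toy scores are hypothetical, and the three readers and 5-point rubric mirror only what the simulated rebuttal proposes, not reported data.

```python
def scores_to_counts(scores, n_categories=5):
    """Convert per-image rater scores (1..n_categories) into per-category counts."""
    counts = []
    for image_scores in scores:
        row = [0] * n_categories
        for score in image_scores:
            row[score - 1] += 1
        counts.append(row)
    return counts


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] is how many raters gave image i category j."""
    n_images = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each image needs the same number (>= 2) of ratings")

    # Observed agreement: mean fraction of agreeing rater pairs per image.
    per_image = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_image) / n_images

    # Chance agreement from the overall share of each category.
    total = n_images * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    expected = sum(share * share for share in shares)

    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (observed - expected) / (1.0 - expected)


# Toy scores: 4 images, 3 hypothetical readers, 5-point rubric.
toy_scores = [[5, 5, 4], [4, 4, 4], [3, 3, 3], [2, 3, 2]]
kappa = fleiss_kappa(scores_to_counts(toy_scores))
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is a better reproducibility indicator than the share of images on which all readers gave the same score.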

standing simulated objections not resolved

We cannot compute standard full-reference metrics (PSNR, SSIM) on the real unpaired clinical test set because paired normal-dose reference images are unavailable in the target domain.
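
One measurement that remains available without a reference is the noise level inside a homogeneous region of interest (ROI), commonly reported in CT as the standard deviation of pixel values there. The sketch below is an illustration only: the function `roi_noise`, the ROI coordinates, and the pixel values are assumptions, and this is not one of the metrics the manuscript or rebuttal names (NIQE, BRISQUE).

```python
import statistics


def roi_noise(image, top, left, height, width):
    """Population standard deviation of pixel values in a rectangular ROI.

    `image` is a 2D list of rows; the ROI should cover homogeneous tissue.
    """
    roi = [
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    ]
    return statistics.pstdev(roi)


# Toy 3x3 images with made-up values; the ROI is the top-left 2x2 block.
low_dose = [[58, 62, 40], [58, 62, 40], [30, 30, 30]]
denoised = [[59, 61, 40], [59, 61, 40], [30, 30, 30]]

noise_before = roi_noise(low_dose, top=0, left=0, height=2, width=2)
noise_after = roi_noise(denoised, top=0, left=0, height=2, width=2)
print(noise_before, noise_after)
```

A lower ROI standard deviation shows that noise was reduced, but not that anatomy was preserved: heavy blurring lowers it too, so it complements rather than replaces the reader study described earlier.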

Circularity Check

0 steps flagged

No circularity: purely empirical unsupervised framework with independent validation


The paper presents an empirical deep learning architecture (U-Net + attention + residual blocks + perceptual loss, Cycle-GAN inspired) trained on unpaired real clinical LDCT data. No equations, derivations, or first-principles results are claimed; performance is assessed via image metrics on synthetic data and physician visual scoring on real data. No self-definitional reductions, fitted inputs relabeled as predictions, or load-bearing self-citations appear. The central claim (unsupervised training enables denoising on real unpaired scans) rests on standard Cycle-GAN training dynamics and external evaluation, not on any quantity defined by the method itself. This is self-contained empirical work.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions plus the domain premise that perceptual loss and unpaired Cycle-GAN training suffice for medical-image denoising; no new entities are postulated.

free parameters (1)

network hyperparameters and loss weights
U-Net depth, attention channels, residual blocks, perceptual-loss coefficients, and Cycle-GAN cycle-consistency weight are chosen or tuned for the liver CT task.

axioms (2)

domain assumption: Unsupervised adversarial training can learn a denoising mapping from unpaired low-dose and normal-dose distributions
Invoked by the Cycle-GAN-inspired framework design in the abstract.
domain assumption: Perceptual loss based on pre-trained feature extractors captures diagnostically relevant image quality for CT
Stated as introduced to improve the network for medical-image characteristics.
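
The second assumption is easier to weigh with the mechanics of a perceptual loss in view: the loss compares feature maps rather than raw pixels. In the sketch below the feature extractor is a deliberately simple stand-in (horizontal pixel differences), not a pre-trained network, and the names `edge_features`, `mse`, and `perceptual_loss` and the toy images are assumptions for illustration.

```python
def edge_features(image):
    """Stand-in feature extractor: horizontal differences of a 2D image.

    A real perceptual loss would use activations of a pre-trained network.
    """
    return [[row[c + 1] - row[c] for c in range(len(row) - 1)] for row in image]


def mse(a, b):
    """Mean squared error between two equally shaped 2D lists."""
    diffs = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def perceptual_loss(output, target, feature_fn=edge_features):
    """MSE between feature maps of the output and the target."""
    return mse(feature_fn(output), feature_fn(target))


# Toy target with a sharp vertical edge, plus two degraded versions.
target = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
offset = [[0.5, 0.5, 1.5, 1.5], [0.5, 0.5, 1.5, 1.5]]    # brightness shift, edge intact
blurred = [[0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0]]   # edge smeared out

print(mse(offset, target), perceptual_loss(offset, target))
print(mse(blurred, target), perceptual_loss(blurred, target))
```

The blurred image has the smaller pixel error but the larger feature error, which is the behavior a perceptual term is meant to exploit. Whether features from a network trained on natural images track diagnostically relevant CT structure is exactly what this assumption leaves open.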


