arxiv: 2508.06816 · v3 · submitted 2025-08-09 · 💻 cs.CV

DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation

Vikram Singh, Kabir Malhotra, Rohan Desai, Ananya Shankaracharya, Priyadarshini Chatterjee, Krishnan Menon Iyer

Pith reviewed 2026-05-19 00:37 UTC · model grok-4.3

keywords: melanocytic lesion segmentation, dermoscopy, dual resolution, artifact suppression, boundary aware, multi-task learning, skin cancer, image segmentation

The pith

A dual-resolution residual architecture with artifact suppression produces more precise segmentation of melanocytic lesions in dermoscopic images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a dual-resolution residual network designed specifically for segmenting melanocytic tumors in dermoscopic images. It uses a high-resolution stream to maintain fine boundary details and a pooled stream to capture broader context, connected through boundary-aware residual links and channel attention. A lightweight artifact suppression block addresses issues like hairs and bubbles, while a multi-task training approach combines Dice-Tversky loss with boundary loss and contrastive regularization to handle small datasets. This design aims to deliver accurate masks without heavy post-processing. The authors show through evaluations on public benchmarks that it improves boundary precision and other metrics over standard encoder-decoder models, supporting better automated skin cancer screening.

Core claim

The dual-resolution residual architecture pairs a high-resolution stream that preserves fine boundary details with a pooled stream for multi-scale context, integrating the two via boundary-aware residual connections and channel attention. Together with a lightweight artifact suppression block and multi-task training (Dice-Tversky loss, an explicit boundary loss, and a contrastive regularizer), this enables pixel-accurate segmentation masks for melanocytic lesions without extensive post-processing or complex pre-training.

What carries the argument

Dual-resolution streams with boundary-aware residual connections and a lightweight artifact suppression block, trained via multi-task losses including Dice-Tversky, boundary, and contrastive terms.
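
To make the moving parts concrete, here is a minimal NumPy sketch of one possible reading of such a block. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 2x pooling factor, the finite-difference edge cue, and the squeeze-and-excitation style gate with weights `w1` and `w2` are all hypothetical, and the learned convolutions a real network would use are omitted.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def edge_map(x):
    """Crude boundary cue: absolute finite differences of the channel mean."""
    g = x.mean(axis=0)                                   # (H, W)
    dx = np.abs(np.diff(g, axis=1, append=g[:, -1:]))    # horizontal changes
    dy = np.abs(np.diff(g, axis=0, append=g[-1:, :]))    # vertical changes
    return dx + dy

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style gate: one weight in (0, 1) per channel."""
    squeezed = x.mean(axis=(1, 2))                       # global average pool, (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)              # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))          # sigmoid, (C,)
    return x * gate[:, None, None]

def dual_resolution_block(high, w1, w2):
    """Fuse a high-resolution stream with a pooled context stream."""
    low = avg_pool2(high)                 # pooled stream at half resolution
    context = upsample2(low)              # context brought back to the fine grid
    edges = edge_map(high)[None]          # (1, H, W), broadcast over channels
    fused = high + context + edges        # residual sum with injected edge cue
    return channel_attention(fused, w1, w2)

# Hypothetical inputs: 4 channels on an 8x8 grid, bottleneck of width 2.
rng = np.random.default_rng(0)
high = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
out = dual_resolution_block(high, w1, w2)
print(out.shape)  # (4, 8, 8)
```

The output keeps the input resolution, which is the property the high-resolution stream is meant to protect; the pooled stream only contributes smoothed context.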

If this is right

Enhances boundary precision and clinically relevant segmentation metrics on public dermoscopic benchmarks.
Outperforms traditional encoder-decoder baselines in lesion segmentation accuracy.
Generates pixel-accurate masks without the need for extensive post-processing or complex pre-training.
Provides a valuable component for building automated melanoma assessment systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be adapted for segmenting other types of skin lesions or medical images with similar artifact challenges.
Further validation on datasets representing more diverse skin tones and clinical settings would strengthen its applicability.
Combining this architecture with real-time inference optimizations might enable deployment in clinical decision support tools.
The contrastive regularizer may offer benefits in other segmentation tasks where feature stability is key on limited data.

Load-bearing premise

The public dermoscopic benchmarks used are representative of real-world clinical variability in artifacts, skin types, and lesion appearances.

What would settle it

The claims would be challenged by a new evaluation on a clinical dataset with greater variability in skin types, lighting, or artifact types in which the method fails to show improved boundary precision or segmentation metrics over baselines.

Original abstract

Lesion segmentation, in contrast to natural scene segmentation, requires handling subtle variations in texture and color, frequent imaging artifacts (such as hairs, rulers, and bubbles), and a critical need for precise boundary localization to aid in accurate diagnosis. The accurate delineation of melanocytic tumors in dermoscopic images is a crucial component of automated skin cancer screening systems and clinical decision support. In this paper, we present a novel dual-resolution architecture inspired by ResNet, specifically tailored for the segmentation of melanocytic tumors. Our approach incorporates a high-resolution stream that preserves fine boundary details, alongside a complementary pooled stream that captures multi-scale contextual information for robust lesion recognition. These two streams are closely integrated through boundary-aware residual connections, which inject edge information into deep feature maps, and a channel attention mechanism that adapts the model's sensitivity to color and texture variations in dermoscopic images. To tackle common imaging artifacts and the challenges posed by small clinical datasets, we introduce a lightweight artifact suppression block and a multi-task training strategy. This strategy combines the Dice-Tversky loss with an explicit boundary loss and a contrastive regularizer to enhance feature stability. This unified design enables the model to generate pixel-accurate segmentation masks without the need for extensive post-processing or complex pre-training. Extensive evaluation on public dermoscopic benchmarks reveals that our method significantly enhances boundary precision and clinically relevant segmentation metrics, outperforming traditional encoder-decoder baselines. This makes our approach a valuable component for building automated melanoma assessment systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A domain-specific tweak to ResNet-style segmentation that targets dermoscopy artifacts and boundaries but rests on unshown metrics.

The main point is a dual-resolution ResNet variant that keeps a high-resolution stream for edge details while using a pooled stream for context, tied together by boundary-aware residuals and channel attention, plus a lightweight block to suppress hairs, rulers, and bubbles. It trains with Dice-Tversky, an explicit boundary term, and a contrastive regularizer to stabilize features on small datasets. The goal is pixel-accurate masks without heavy post-processing for melanoma screening support.

That combination is a reasonable, focused extension of existing multi-scale and attention ideas to this exact clinical imaging problem. It does a decent job naming the real pain points (subtle texture shifts, frequent artifacts, and the need for precise boundaries) and proposes components that directly address them rather than generic improvements. The multi-task loss setup and the artifact block feel like practical choices for limited medical data.

The soft spots are straightforward. The abstract asserts clear gains over encoder-decoder baselines on public dermoscopic benchmarks, yet supplies no numbers, no dataset breakdowns, no ablations, and no error bars. Without those, it is impossible to tell whether the claimed boundary improvements are meaningful or whether the artifact block actually drives them. The stress-test concern about benchmark diversity also lands: public sets often under-represent Fitzpatrick skin type variation, subtle lesion edges, and artifact distributions seen in real clinics. If the full paper lacks cross-dataset tests or diversity statistics, the generalization story stays thin. Minor gaps include missing details on exact stream fusion and loss weight tuning.

This work is aimed at medical image segmentation researchers who already work on dermoscopy or similar artifact-heavy tasks. Someone building practical screening tools might pick up usable design patterns even if the results need checking.

It shows coherent engagement with the problem and prior methods, so it counts as serious thinking. I would send it to peer review to get the experimental evidence properly examined rather than desk-rejecting it outright.

Referee Report

1 major / 1 minor

Summary. The paper proposes a DualResolution Residual Architecture for melanocytic lesion segmentation in dermoscopic images. It uses a high-resolution stream to preserve boundary details and a pooled stream for multi-scale context, integrated via boundary-aware residual connections and channel attention. A lightweight artifact suppression block addresses imaging artifacts, while multi-task training combines Dice-Tversky loss, an explicit boundary loss, and a contrastive regularizer. The central claim is that this design yields superior boundary precision and clinically relevant segmentation metrics on public dermoscopic benchmarks compared to traditional encoder-decoder baselines, without requiring extensive post-processing.

Significance. If the empirical results hold with proper validation, the work could offer a practical, lightweight contribution to automated skin cancer screening by improving handling of artifacts and subtle boundaries in dermoscopy. The multi-task strategy and avoidance of complex pre-training are pragmatic strengths for deployment on limited clinical data.

major comments (1)

[Experimental Evaluation] Experimental Evaluation section: the headline claim that the method 'significantly enhances boundary precision and clinically relevant segmentation metrics' while outperforming baselines rests on public dermoscopic benchmarks; however, these benchmarks are not shown to capture sufficient variability in Fitzpatrick skin types, artifact distributions (hairs, rulers, bubbles), or subtle lesion boundaries typical of real clinical settings. Without cross-dataset generalization tests or ablations isolating the artifact suppression block under distribution shift, the robustness and clinical relevance of the gains are not fully supported.

minor comments (1)

[Abstract] The abstract would be strengthened by including at least one or two key quantitative results (e.g., Dice or boundary F1 scores with baselines) rather than purely qualitative assertions of superiority.
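
For readers unfamiliar with the two metrics named in this comment, the following pure-Python sketch shows how Dice and a tolerance-based boundary F1 could be computed on binary masks. It is illustrative only: the paper's exact metric definitions are not given here, and the 4-neighbour boundary rule, the Chebyshev-distance tolerance `tol`, and the toy 6x6 masks are assumptions made for the example.

```python
def foreground(mask):
    """Set of (row, col) coordinates where a nested-list binary mask is 1."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}

def dice(pred, truth):
    """Dice coefficient: 2|P ∩ T| / (|P| + |T|)."""
    p, t = foreground(pred), foreground(truth)
    if not p and not t:
        return 1.0
    return 2 * len(p & t) / (len(p) + len(t))

def boundary(mask):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    edge = set()
    for r, c in foreground(mask):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                edge.add((r, c))
                break
    return edge

def boundary_f1(pred, truth, tol=1):
    """F1 over boundary pixels matched within `tol` pixels (Chebyshev distance)."""
    bp, bt = boundary(pred), boundary(truth)
    if not bp or not bt:
        return 1.0 if not bp and not bt else 0.0

    def near(p, others):
        return any(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= tol for q in others)

    precision = sum(near(p, bt) for p in bp) / len(bp)
    recall = sum(near(q, bp) for q in bt) / len(bt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: a 4x4 square lesion and a prediction shifted one column right.
truth = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(6)]
pred = [[1 if 1 <= r <= 4 and 2 <= c <= 5 else 0 for c in range(6)] for r in range(6)]
print(dice(pred, truth))                 # 0.75
print(boundary_f1(pred, truth, tol=0))   # 0.5
print(boundary_f1(pred, truth, tol=1))   # 1.0
```

The toy case shows why reporting both numbers matters: a one-pixel shift costs a quarter of the Dice score, while the boundary score depends entirely on the chosen tolerance.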

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and outline revisions that will be incorporated to strengthen the presentation of robustness and clinical relevance.

Referee: [Experimental Evaluation] Experimental Evaluation section: the headline claim that the method 'significantly enhances boundary precision and clinically relevant segmentation metrics' while outperforming baselines rests on public dermoscopic benchmarks; however, these benchmarks are not shown to capture sufficient variability in Fitzpatrick skin types, artifact distributions (hairs, rulers, bubbles), or subtle lesion boundaries typical of real clinical settings. Without cross-dataset generalization tests or ablations isolating the artifact suppression block under distribution shift, the robustness and clinical relevance of the gains are not fully supported.

Authors: We agree that explicit demonstration of generalization across greater clinical variability would strengthen the claims. While the public benchmarks (ISIC 2017/2018 and PH2) already contain images spanning multiple skin tones, common artifacts (hairs, rulers, bubbles), and lesions with ambiguous boundaries, we acknowledge that dedicated cross-dataset tests and isolated ablations of the artifact suppression block under distribution shift are not currently reported. In the revised manuscript we will add these experiments, including evaluation on an external dataset and controlled ablations that isolate the artifact block under simulated shifts in artifact prevalence and skin-tone distribution. These additions will be placed in the Experimental Evaluation section and will directly support the robustness assertions.

Revision: yes

Circularity Check

0 steps flagged

Empirical architecture proposal with no derivation chain or self-referential reductions

The paper introduces a dual-resolution residual network with boundary-aware connections, channel attention, a lightweight artifact suppression block, and multi-task losses (Dice-Tversky + boundary + contrastive) for dermoscopic lesion segmentation. All claims rest on experimental comparisons against encoder-decoder baselines on public benchmarks, with no mathematical derivations, predictions, or uniqueness theorems that reduce by construction to fitted inputs, self-citations, or ansatzes. Design elements are presented as engineering choices evaluated empirically rather than derived from prior self-referential results. The work is self-contained against external benchmarks, producing a normal non-finding for circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on empirical validation of the proposed components rather than first-principles derivation; standard neural network assumptions about data distribution and optimization are invoked implicitly.

free parameters (1)

multi-task loss weights
The combination of Dice-Tversky, boundary, and contrastive terms implies tunable coefficients whose specific values are not stated and must be chosen to achieve the reported performance.
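
One way to write down the tunable coefficients this entry refers to is a weighted sum of the three terms. The notation below is an assumption made for illustration: the symbols \lambda_{\mathrm{DT}}, \lambda_{\mathrm{b}}, \lambda_{\mathrm{c}}, \alpha, and \beta do not come from the paper, and how the authors actually combine the Dice and Tversky parts is not stated. The Tversky index is shown in its standard soft form, with p_i the predicted foreground probability and g_i the ground-truth label at pixel i.

```latex
\mathcal{L}_{\mathrm{total}}
  = \lambda_{\mathrm{DT}}\,\mathcal{L}_{\mathrm{DT}}
  + \lambda_{\mathrm{b}}\,\mathcal{L}_{\mathrm{boundary}}
  + \lambda_{\mathrm{c}}\,\mathcal{L}_{\mathrm{contrastive}}

\mathrm{TI}_{\alpha,\beta}
  = \frac{\sum_i p_i g_i}
         {\sum_i p_i g_i
          + \alpha \sum_i p_i (1 - g_i)
          + \beta \sum_i (1 - p_i)\, g_i}
```

Setting \alpha = \beta = 1/2 recovers the Dice coefficient, so a Dice-Tversky term can be read as trading off false positives against false negatives around that balanced point. Under this reading, the three \lambda weights and the pair (\alpha, \beta) are the values a full paper would need to report.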

axioms (1)

Domain assumption: dermoscopic images contain a limited set of common artifacts (hairs, rulers, bubbles) that can be effectively suppressed by a lightweight dedicated block without harming lesion features.
Invoked in the description of the artifact suppression block and its role in handling small clinical datasets.
