arxiv: 2508.06805 · v2 · submitted 2025-08-09 · 💻 cs.CV

Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling

Aarav Mehta , Priya Deshmukh , Vikram Singh , Siddharth Malhotra , Krishnan Menon Iyer , Tanvi Iyer This is my paper

Pith reviewed 2026-05-19 00:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords edge detectionorgan boundariesmedical imagingtop-down refinementsubpixel upsamplingCTMRIsegmentation

0 comments p. Extension

The pith

A top-down backward refinement architecture with subpixel upsampling produces millimeter-accurate organ boundaries in CT and MRI scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that standard convolutional networks leave organ edges too blurry for medical use and that a dedicated top-down refinement pathway can fix this by repeatedly fusing deep semantic features with fine local detail. A reader would care because millimeter-level boundary precision directly affects segmentation accuracy, registration quality, and the ability to outline lesions sitting right against organ walls. The method works by upsampling high-level maps in a backward pass and merging them with low-level cues, with a light 3D aggregation step added for volumetric data to keep computation reasonable. When these crisp edges are fed into existing medical pipelines they raise Dice scores, cut boundary errors, and improve lesion visibility near interfaces.

Core claim

The central claim is that adapting a top-down backward refinement architecture to medical images, by progressively upsampling high-level semantic features and fusing them with fine-grained low-level cues through a dedicated pathway, produces high-resolution crisp organ boundaries in 2D slices and anisotropic volumes, outperforming baseline ConvNet detectors and other medical edge methods on strict boundary F-measure and Hausdorff distance while also lifting performance in downstream segmentation, registration, and lesion delineation tasks.

What carries the argument

The top-down backward refinement pathway that progressively upsamples and fuses high-level semantic features with low-level cues, extended by light 3D context aggregation for volumes.

If this is right

Substantially higher boundary F-measure and lower Hausdorff distance on several CT and MRI organ datasets.
Consistent gains in organ segmentation, shown by higher Dice scores and reduced boundary errors.
More accurate image registration when crisp edges are supplied.
Better delineation of lesions located near organ interfaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same refinement idea could be tested on other boundary-critical medical tasks such as vessel or tumor margin detection without changing the core fusion logic.
Because the method already mixes 2D slice processing with minimal 3D context, it may scale to full 3D networks if memory allows while preserving the reported efficiency.
Feeding these edges into interactive annotation tools might reduce the number of manual corrections needed at organ borders.

Load-bearing premise

That fusing high-level semantic features with low-level cues through backward refinement will reliably deliver millimeter-level boundary accuracy on medical images without introducing artifacts or needing extensive per-dataset tuning.

What would settle it

Apply the method to a new multi-center CT or MRI dataset with unseen scanner protocols and noise levels; if boundary F-measure and Hausdorff distance do not improve over the same baselines, the central claim does not hold.

read the original abstract

Accurate localization of organ boundaries is critical in medical imaging for segmentation, registration, surgical planning, and radiotherapy. While deep convolutional networks (ConvNets) have advanced general-purpose edge detection to near-human performance on natural images, their outputs often lack precise localization, a limitation that is particularly harmful in medical applications where millimeter-level accuracy is required. Building on a systematic analysis of ConvNet edge outputs, we propose a medically focused crisp edge detector that adapts a novel top-down backward refinement architecture to medical images (2D and volumetric). Our method progressively upsamples and fuses high-level semantic features with fine-grained low-level cues through a backward refinement pathway, producing high-resolution, well-localized organ boundaries. We further extend the design to handle anisotropic volumes by combining 2D slice-wise refinement with light 3D context aggregation to retain computational efficiency. Evaluations on several CT and MRI organ datasets demonstrate substantially improved boundary localization under strict criteria (boundary F-measure, Hausdorff distance) compared to baseline ConvNet detectors and contemporary medical edge/contour methods. Importantly, integrating our crisp edge maps into downstream pipelines yields consistent gains in organ segmentation (higher Dice scores, lower boundary errors), more accurate image registration, and improved delineation of lesions near organ interfaces. The proposed approach produces clinically valuable, crisp organ edges that materially enhance common medical-imaging tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Adapts top-down refinement plus subpixel upsampling to medical organ boundaries but rests on thin abstract-level claims.

read the letter

Hi colleague, the core of this paper is a domain-specific adaptation: they take the top-down backward refinement idea from general edge detection, add subpixel upsampling for finer localization, and handle anisotropic CT/MRI volumes with mostly 2D slice processing plus light 3D context. That combination targets the real pain point of millimeter-level boundary accuracy for segmentation, registration, and radiotherapy planning. The description of progressively fusing high-level semantics with low-level cues through the refinement pathway is clear and sensible on paper. They also note the efficiency trade-off, which shows some practical thinking. What the work does reasonably well is frame why off-the-shelf ConvNet edges fall short in medical settings and sketch a pathway that could feed better into downstream tasks like Dice score gains or lesion delineation near interfaces. The anisotropic extension avoids full 3D cost, which is a fair engineering choice. The soft spots are mostly around evidence. The abstract asserts better boundary F-measure, Hausdorff distance, and downstream improvements over baselines and other medical contour methods, yet supplies no tables, dataset sizes, error bars, or ablation breakdowns. Without those, it's hard to tell if the gains are substantial, if baselines were competitive, or if results hold across organs and scanners. The assumption that the refinement fusion will deliver reliable accuracy without heavy per-dataset tuning looks optimistic given typical medical image problems like low contrast and partial-volume effects. If the full manuscript has solid quantitative sections and cross-validation, that would change the picture. This is for readers already working on medical segmentation or boundary-aware pipelines who might want to test the refinement block. It is not a foundational advance but could be a useful incremental tool if the numbers check out. I would send it to peer review so the experiments get proper scrutiny rather than desk-rejecting on the abstract alone.

Referee Report

2 major / 2 minor

Summary. The paper proposes a top-down backward refinement architecture with subpixel upsampling for crisp organ boundary detection in 2D and volumetric medical CT/MRI images. It progressively fuses high-level semantic features with low-level cues via a backward pathway, extends the design to anisotropic volumes using slice-wise 2D refinement plus light 3D aggregation, and claims superior boundary localization (F-measure, Hausdorff distance) over ConvNet baselines and medical edge methods, plus gains when the edges are fed into downstream segmentation, registration, and lesion delineation pipelines.

Significance. If the empirical improvements hold under rigorous evaluation, the method could offer a practical advance for millimeter-level boundary accuracy in clinical workflows where precise organ interfaces matter for segmentation, registration, and radiotherapy. The efficiency-focused 3D extension and emphasis on medical-specific challenges (anisotropy, low contrast) are positive aspects.

major comments (2)

[§4] §4 (Experiments) and associated tables: the abstract and §1 assert substantially improved boundary F-measure and Hausdorff distance plus downstream Dice gains, yet no numerical tables, dataset sizes, error bars, cross-validation details, or ablation results are provided. This directly undermines verification of the central empirical claim.
[§3] §3 (Method, backward refinement pathway): the description of progressive upsampling and high-to-low feature fusion does not include analysis or controls for artifact introduction in low-contrast or partial-volume regions typical of CT/MRI, nor evidence that millimeter accuracy is achieved without per-dataset tuning. This is load-bearing for the generalization claim.

minor comments (2)

[Abstract] Abstract: specify the exact CT and MRI organ datasets used and their key characteristics (resolution, anisotropy, number of cases).
[§4] Figure captions and §4: ensure all boundary metric plots include baseline comparisons with the same strict criteria (e.g., tolerance thresholds for F-measure).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity and rigor, particularly around experimental reporting and methodological robustness. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our results and analysis.

read point-by-point responses

Referee: [§4] §4 (Experiments) and associated tables: the abstract and §1 assert substantially improved boundary F-measure and Hausdorff distance plus downstream Dice gains, yet no numerical tables, dataset sizes, error bars, cross-validation details, or ablation results are provided. This directly undermines verification of the central empirical claim.

Authors: We agree that the experimental details must be presented more explicitly to enable full verification of the claims. The complete manuscript includes results on multiple CT and MRI organ datasets with boundary F-measure, Hausdorff distance, and downstream segmentation/registration metrics, but we acknowledge these may not have been sufficiently highlighted or tabulated in the reviewed version. In the revision, we will expand §4 with comprehensive tables reporting all quantitative results, dataset sizes and compositions, standard deviations from cross-validation, and ablation studies on the top-down refinement and subpixel upsampling components. We will also add explicit cross-references from the abstract and §1 to these tables. revision: yes
Referee: [§3] §3 (Method, backward refinement pathway): the description of progressive upsampling and high-to-low feature fusion does not include analysis or controls for artifact introduction in low-contrast or partial-volume regions typical of CT/MRI, nor evidence that millimeter accuracy is achieved without per-dataset tuning. This is load-bearing for the generalization claim.

Authors: We recognize the importance of addressing potential artifacts and generalization explicitly for medical images. While the method is designed to mitigate issues in low-contrast areas through progressive high-to-low fusion and subpixel upsampling, we will revise §3 to include a new analysis subsection. This will provide qualitative and quantitative controls (e.g., edge maps and error metrics in partial-volume regions), discuss design elements that reduce artifact risk without per-dataset hyperparameter tuning, and reference cross-dataset results demonstrating consistent millimeter-level boundary accuracy. These additions will better support the generalization claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of refinement architecture

full rationale

The paper proposes a top-down backward refinement pathway with progressive upsampling and feature fusion for organ boundary edge detection in CT/MRI, extended to anisotropic volumes. Central claims rest on empirical evaluations using boundary F-measure, Hausdorff distance, and downstream gains in segmentation/registration on multiple datasets. No equations, fitted parameters renamed as predictions, or self-citation chains reduce any result to its inputs by construction. The method adapts ConvNet ideas with novel fusion but is self-contained against external benchmarks via reported metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the paper introduces no explicit free parameters, axioms, or invented entities beyond standard deep network components. The central claim depends on the unstated assumption that the proposed fusion mechanism generalizes across CT and MRI datasets without domain-specific retraining.

pith-pipeline@v0.9.0 · 5790 in / 1245 out tokens · 37139 ms · 2026-05-19T00:42:55.586236+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

163 extracted references · 163 canonical work pages · 2 internal anchors

[1]

Arxiv article (2023)

Ando, A., Gidaris, S., Bursuc, A., Puy, G., Boulch, A., Marlet, R.: Rangevit: Towards vision transformers for 3d semantic segmentation in autonomous driving. Arxiv article (2023)

work page 2023
[2]

Bazi, Y., Bashmal, L., et.al: Vision transformers for remote sensing image classification (2021)

work page 2021
[3]

Arxiv article (2024)

Benigmim, Y., Roy, S., Essid, S., Kalogeiton, V., Lathuili` ere, S.: Collaborating foundation models for domain generalized semantic segmentation. Arxiv article (2024)

work page 2024
[4]

In: European Conference on Computer Vision (ECCV) (2020)

Cha, J., Chun, S., Lee, G., Lee, B., Kim, S., Lee, H.: Few-shot compositional font gen- eration with dual memory. In: European Conference on Computer Vision (ECCV) (2020)

work page 2020
[5]

In: Arxiv Article (2021)

Choromanski, K., Likhosherstov, V., et.al: Rethinking attention with performers. In: Arxiv Article (2021)

work page 2021
[6]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

Cha, J., Mun, J., Roh, B.: Learning to generate text-grounded mask for open- world semantic segmentation from only image-text pairs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

work page 2023
[7]

Arxiv article (2023)

Cha, J., Mun, J., Roh, B.: Learning to generate text-grounded mask for open-world semantic segmentation from only image-text pairs. Arxiv article (2023)

work page 2023
[8]

Advances in Neural Information Processing Systems 34, 7306–7318 (2021)

Chen, J., Niu, L., Liu, L., Zhang, L.: Weak-shot fine-grained classification via similarity transfer. Advances in Neural Information Processing Systems 34, 7306–7318 (2021)

work page 2021
[9]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Chen, J., Niu, L., Zhang, J., Si, J., Qian, C., Zhang, L.: Amodal instance segmentation via prior-guided expansion. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 313–321 (2023)

work page 2023
[10]

Advances in Neural Information Processing Systems 11 35, 32525–32536 (2023)

Chen, J., Niu, L., Zhou, S., Si, J., Qian, C., Zhang, L.: Weak-shot semantic segmenta- tion via dual similarity transfer. Advances in Neural Information Processing Systems 11 35, 32525–32536 (2023)

work page 2023
[11]

arXiv preprint (2022) arXiv:2203.11068

Cun, X., Wang, Z., et.al: Learning enriched illuminants for cross and single sensor color constancy. Arxiv preprint (2022) Arxiv:2203.11068

work page arXiv 2022
[12]

Ding, L., Lin, D., et.al: Looking outside the window: Wide-context transformer for the semantic segmentation of high-resolution remote sensing images (2022)

work page 2022
[13]

Arxiv article (2025)

Du, J., Liu, Y., et.al: Dependeval: Benchmarking llms for repository dependency understanding. Arxiv article (2025)

work page 2025
[14]

Face Sketch Synthesis Style Similarity:A New Structure Co-occurrence Texture Measure

Fan, D.P., Zhang, S.C., et.al: Face sketch synthesis style similarity: A new structure co-occurrence texture measure. Arxiv preprint (2018) Arxiv:1804.02975

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

In: Arxiv Article (2022)

Guan, T., Wang, J., et.al: M3detr: Multi-representation, multi-scale, mutual-relation 3d object detection with transformers. In: Arxiv Article (2022)

work page 2022
[16]

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2022)

Ghorbanzadeh, O., Xu, Y., Zhao, H., Wang, J., Zhong, Y., Zhao, D., Zang, Q., et al.: The outcome of the 2022 landslide4sense competition: Advanced landslide detection from multisource satellite imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2022)

work page 2022
[17]

Huang, Z., Ben, Y., et.al: Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer (2021)

work page 2021
[18]

Arxiv article (2023)

He, H., Cai, J., Pan, Z., Liu, J., Zhang, J., Tao, D., Zhuang, B.: Dynamic focus-aware positional queries for semantic segmentation. Arxiv article (2023)

work page 2023
[19]

In: Proceedings of the ICCV, pp

He, H., Cai, J., Zhang, J., Tao, D., Zhuang, B.: Sensitivity-aware visual parameter- efficient fine-tuning. In: Proceedings of the ICCV, pp. 11825–11835 (2023)

work page 2023
[20]

IEEE Transactions on Geoscience and Remote Sensing 60, 1–15 (2022)

He, P., Jiao, L., Shang, R., Wang, S., Liu, X., Quan, D., Yang, K., Zhao, D.: Manet: Multi-scale aware-relation network for semantic segmentation in aerial scenes. IEEE Transactions on Geoscience and Remote Sensing 60, 1–15 (2022)

work page 2022
[21]

TPAMI (2024)

He, H., Liu, J., Pan, Z., Cai, J., Zhang, J., Tao, D., Zhuang, B.: Pruning self-attentions into convolutional layers in single path. TPAMI (2024)

work page 2024
[22]

Computer Vision and Image Understanding 224, 103556 (2022)

Huang, X., Wang, Y., Li, S., Mei, G., Xu, Z., Wang, Y., Zhang, J., Bennamoun, M.: Robust real-world point cloud registration by inlier detection. Computer Vision and Image Understanding 224, 103556 (2022)

work page 2022
[23]

In: Proceedings of the AAAI (2020)

He, H., Zhang, J., Zhang, Q., Tao, D.: Grapy-ml: Graph pyramid mutual learning for cross-dataset human parsing. In: Proceedings of the AAAI (2020)

work page 2020
[24]

Arxiv article (2021) 12

Jia, Y., Kaul, C., Lawton, T., Murray-Smith, R., Habli, I.: Prediction of weaning from mechanical ventilation using convolutional neural networks. Arxiv article (2021) 12

work page 2021
[25]

IEEE Transactions on Image Processing 30, 832–844 (2021)

Jiang, P.T., Zhang, C.B., Hou, Q., Cheng, M.M., Wei, Y.: Layercam: Exploring hier- archical class activation maps. IEEE Transactions on Image Processing 30, 832–844 (2021)

work page 2021
[26]

Arxiv article (2024)

Kim, C., Han, W., et.al: Eagle: Eigen aggregation learning for object-centric unsuper- vised semantic segmentation. Arxiv article (2024)

work page 2024
[27]

Arxiv preprint (2025)

Kim, D., Ko, H., et.al: Fourier decomposition for explicit representation of 3d point cloud attributes. Arxiv preprint (2025)

work page 2025
[28]

Arxiv article (2021)

Kaul, C., Mitton, J., et.al: Cpt: Convolutional point transformer for 3d point cloud processing. Arxiv article (2021)

work page 2021
[29]

In: Arxiv Article (2019)

Kaul, C., Manandhar, S., Pears, N.: Focusnet: An attention-based fully convolutional network for medical image segmentation. In: Arxiv Article (2019)

work page 2019
[30]

Arxiv article (2019)

Kaul, C., Pears, N., Manandhar, S.: Sawnet: A spatially aware deep neural network for 3d point cloud processing. Arxiv article (2019)

work page 2019
[31]

In: Arxiv Article (2021)

Kaul, C., Pears, N., Manandhar, S.: Fatnet: A feature-attentive network for 3d point cloud processing. In: Arxiv Article (2021)

work page 2021
[32]

Advances in Neural Information Processing Systems 35, 30499–30511 (2022)

Kweon, H., Yoon, K.J.: Joint learning of 2d-3d weakly supervised semantic seg- mentation. Advances in Neural Information Processing Systems 35, 30499–30511 (2022)

work page 2022
[33]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

Kweon, H., Yoon, K.J.: From sam to cams: Exploring segment anything model for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

work page 2024
[34]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

Kweon, H., Yoon, S.H., Kim, H., Park, D., Yoon, K.J.: Unlocking the potential of ordi- nary classifier: Class-specific adversarial erasing framework for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

work page 2021
[35]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

Kweon, H., Yoon, S.H., Yoon, K.J.: Weakly supervised semantic segmentation via adversarial learning of classifier and reconstructor. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

work page 2021
[36]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

Kweon, H., Yoon, S.H., Yoon, K.J.: Weakly supervised semantic segmentation via adversarial learning of classifier and reconstructor. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

work page 2023
[37]

Arxiv article (2018)

Lu, Z., He, Q., et.al: Defect detection of pcb based on bayes feature fusion. Arxiv article (2018)

work page 2018
[38]

Arxiv article (2023)

Liu, X., Han, Z., Lee, S., Cao, Y.-P., Liu, Y.-S.: D-net: Learning for distinctive point 13 clouds by self-attentive point searching and learnable feature fusion. Arxiv article (2023)

work page 2023
[39]

In: Arxiv Article (2019)

Liu, X., Han, Z., Lee, S., Cao, Y.-P.: Point2sequence: Learning the shape representa- tion of 3d point clouds with an attention-based sequence to sequence network. In: Arxiv Article (2019)

work page 2019
[40]

Arxiv article (2021)

Liu, X., Han, Z., Liu, Y.-S., Zwicker, M.: Fine-grained 3d shape classification with hierarchical part-view attention. Arxiv article (2021)

work page 2021
[41]

In: Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS) (2022)

Li, J., Jie, Z., Wang, X., Wei, X., Ma, L.: Expansion and shrinkage of localization for weakly-supervised semantic segmentation. In: Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS) (2022)

work page 2022
[42]

IEEE Transactions on Multimedia 25, 1686–1699 (2022)

Li, J., Jie, Z., Wang, X., Zhou, Y., Wei, X., Ma, L.: Weakly supervised semantic segmentation via progressive patch learning. IEEE Transactions on Multimedia 25, 1686–1699 (2022)

work page 2022
[43]

Neurocomputing 561, 126821 (2023)

Li, J., Jie, Z., Wang, X., Zhou, Y., Ma, L., Jiang, J.: Weakly supervised semantic segmentation via self-supervised destruction learning. Neurocomputing 561, 126821 (2023)

work page 2023
[44]

In: Arxiv Article (2023)

Liu, Q., Kaul, C., Wang, J., Anagnostopoulos, C., Murray-Smith, R., Deligianni, F.: Optimizing vision transformers for medical image segmentation. In: Arxiv Article (2023)

work page 2023
[45]

Arxiv preprint (2021)

Lu, Z., Liu, H., et.al: Efficient transformer for single image super-resolution. Arxiv preprint (2021)

work page 2021
[46]

In: Arxiv Article (2022)

Lin, L., Liu, Y., Hu, Y., Yan, X., Xie, K., Huang, H.: Capturing, reconstructing, and simulating: the urbanscene3d dataset. In: Arxiv Article (2022)

work page 2022
[47]

Arxiv article (2020)

Lu, D., Lu, X., Sun, Y., Wang, J.: Deep feature-preserving normal estimation for point cloud filtering. Arxiv article (2020)

work page 2020
[48]

In: Arxiv Preprint (2022)

Lee, S.H., Oh, G., et.al: Sound-guided semantic video generation. In: Arxiv Preprint (2022)

work page 2022
[49]

In: Proceedings of NeurIPS (2022)

Liu, J., Pan, Z., He, H., Cai, J., Zhuang, B.: Ecoformer: Energy-saving attention with linear complexity. In: Proceedings of NeurIPS (2022)

work page 2022
[50]

In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Lee, S.H., Roh, W., et.al: Sound-guided semantic image manipulation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[51]

In: Proceedings of the 29th ACM International Conference on Multimedia, pp

Li, J., Wang, W., Chen, J., Niu, L., Si, J., Qian, C., Zhang, L.: Video semantic segmentation via sparse temporal transformer. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 59–68 (2021) 14

work page 2021
[52]

Applied Intelligence 53(18), 20753–20765 (2023)

Li, X., Wu, Y., Dai, S.: Semi-supervised medical imaging segmentation with soft pseudo-label fusion. Applied Intelligence 53(18), 20753–20765 (2023)

work page 2023
[53]

In: Arxiv Preprint (2022)

Li, J., Wu, J., et.al: Partglee: A foundation model for recognizing and parsing any objects. In: Arxiv Preprint (2022)

work page 2022
[54]

Li, K., Wang, Y., et.al: Uniformer: Unifying convolution and self-attention for visual recognition (2022)

work page 2022
[55]

Machine Intelligence Research (2023)

Liu, Y., Wu, Y.H., et.al: Vision transformers with hierarchical attention. Machine Intelligence Research (2023)

work page 2023
[56]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2024)

Liu, L., Wang, Z., Phan, M.H., Zhang, B., Ge, J., Liu, Y.: Bpkd: Boundary privileged knowledge distillation for semantic segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2024)

work page 2024
[57]

Arxiv article (2022)

Lu, D., Xie, Q., et.al: 3dctn: 3d convolution-transformer network for point cloud classification. Arxiv article (2022)

work page 2022
[58]

Advances in Neural Information Processing Systems 34, 3978–3990 (2021)

Liu, Y., Zhang, Z., Niu, L., Chen, J., Zhang, L.: Mixed supervised object detection by transferring mask prior and semantic similarity. Advances in Neural Information Processing Systems 34, 3978–3990 (2021)

work page 2021
[59]

Arxiv article (2023)

Mukhoti, J., Lin, T.-Y., Poursaeed, O., Wang, R., Shah, A., Torr, P.H.S., Lim, S.-N.: Open vocabulary semantic segmentation with patch aligned contrastive learning. Arxiv article (2023)

work page 2023
[60]

In: Arxiv Article (2021)

Mommert, M., Scheibenreif, L., Hanna, J., Borth, D.: Power plant classification from remote imaging with deep learning. In: Arxiv Article (2021)

work page 2021
[61]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Park, S., Chun, S., Cha, J., Lee, B., Shim, H.: Few-shot font generation with localized style representations and factorization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 2393–2402 (2021)

work page 2021
[62]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

Park, S., Chun, S., Cha, J., Lee, B., Shim, H.: Multiple heads are better than one: Few-shot font generation with multiple localized experts. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

work page 2021
[63]

IEEE Transactions on Pattern Analysis and Machine Intelligence 46(3), 1479–1495 (2023)

Park, S., Chun, S., Cha, J., Lee, B., Shim, H.: Few-shot font generation with weakly supervised localized representations. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(3), 1479–1495 (2023)

work page 2023
[64]

Arxiv article (2023)

Pang, J., Liu, W., et.al: Mcnet: Magnitude consistency network for domain adaptive object detection under inclement environments. Arxiv article (2023)

work page 2023
[65]

In: Proceedings of the 27th ACM International Conference on Multimedia, pp

Park, K., Woo, S., Kim, D., Cho, D., Kweon, I.S.: Preserving semantic and tempo- ral consistency for unpaired video-to-video translation. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 1248–1257 (2019) 15

work page 2019
[66]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Park, K., Woo, S., Oh, S.W., Kweon, I.S., Lee, J.Y.: Per-clip video object segmenta- tion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

work page 2022
[67]

Advances in Neural Information Processing Systems (NeurIPS) 33, 10869–10880 (2020)

Park, K., Woo, S., Shin, I., Kweon, I.S.: Discover, hallucinate, and adapt: Open compound domain adaptation for semantic segmentation. Advances in Neural Information Processing Systems (NeurIPS) 33, 10869–10880 (2020)

work page 2020
[68]

In: 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp

Palasek, P., Yang, H., Xu, Z., Hajimirza, N., Izquierdo, E., Patras, I.: A flexible cal- ibration method of multiple kinects for 3d human reconstruction. In: 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–4 (2015)

work page 2015
[69]

In: 2015 IEEE International Conference on Image Processing (ICIP), pp

Peng, Y.T., Zhao, X., Cosman, P.C.: Single underwater image enhancement using depth estimation based on blurriness. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 4952–4956 (2015)

work page 2015
[70]

In: Proceedings of the AAAI (2022)

Pan, Z., Zhuang, B., He, H., Liu, J., Cai, J.: Less is more: Pay less attention in vision transformers. In: Proceedings of the AAAI (2022)

work page 2022
[71]

In: Proceedings of the ICCV (2021)

Pan, Z., Zhuang, B., Liu, J., He, H., Cai, J.: Scalable vision transformers with hierarchical pooling. In: Proceedings of the ICCV (2021)

work page 2021
[72]

In: Arxiv Article (2021)

Ranftl, R., Bochkovskiy, A., et.al: Vision transformers for dense prediction. In: Arxiv Article (2021)

work page 2021
[73]

Arxiv article (2023)

Riz, L., Saltori, C., Ricci, E., Poiesi, F.: Novel class discovery for 3d point cloud semantic segmentation. Arxiv article (2023)

work page 2023
[74]

Journal of Chemical Information and Modeling (2021)

Sacha, M., B laz, M., Byrski, P., Dabrowski-Tumanski, P., Chrominski, M., et al.: Molecule edit graph attention network: Modeling chemical reactions as sequences of graph edits. Journal of Chemical Information and Modeling (2021)

work page 2021
[75]

In: Arxiv Article (2021)

Strudel, R., Garcia, R., et.al: Segmenter: Transformer for semantic segmentation. In: Arxiv Article (2021)

work page 2021
[76]

In: Arxiv Article (2022)

Scheibenreif, L., Hanna, J., et.al: Self-supervised vision transformers for land-cover segmentation and classification. In: Arxiv Article (2022)

work page 2022
[77]

In: Proceedings of the AAAI Conference on Artificial Intelligence (2024)

Sacha, M., Jura, B., Rymarczyk, D., Struski, L., Tabor, J., Zielinski, B.: Inter- pretability benchmark for evaluating spatial misalignment of prototypical parts explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence (2024)

work page 2024
[78]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021) 16

Shin, I., Kim, D.J., Cho, J.W., Woo, S., Park, K., Kweon, I.S.: Labor: Labeling only if required for domain adaptive semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021) 16

work page 2021
[79]

Arxiv article (2022)

Scheibenreif, L., Mommert, M., Borth, D.: Contrastive self-supervised data fusion for satellite imagery. Arxiv article (2022)

work page 2022
[80]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2023)

Sacha, M., Rymarczyk, D., Struski, L., Tabor, J., Zielinski, B.: Protoseg: Interpretable semantic segmentation with prototypical parts. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2023)

work page 2023

Showing first 80 references.