FDIF: Formula-Driven supervised Learning with Implicit Functions for 3D Medical Image Segmentation

Hirokatsu Kataoka; Hirokazu Nosato; Kazuya Nishimura; Tetsuya Ogata; Tsukasa Fukusato; Yukinori Yamamoto

arxiv: 2603.23199 · v2 · submitted 2026-03-24 · 💻 cs.CV

Yukinori Yamamoto, Kazuya Nishimura, Tsukasa Fukusato, Hirokazu Nosato, Tetsuya Ogata, Hirokatsu Kataoka

Pith reviewed 2026-05-15 00:28 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D medical image segmentation · implicit functions · signed distance functions · formula-driven learning · synthetic data generation · self-supervised pre-training · data-free representation learning

The pith

Implicit functions based on signed distance functions let formula-driven synthesis match real-data pre-training for 3D medical image segmentation without any real scans or labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FDIF, a framework that generates all training data and labels for 3D medical segmentation directly from mathematical formulas using implicit functions. These functions model shapes through signed distance representations, which compactly capture complex geometries and allow separate control over surface geometry and intensity textures. Experiments across the AMOS, ACDC, and KiTS benchmarks with three different network architectures show consistent gains over prior voxel-based formula-driven methods and performance levels comparable to self-supervised pre-training on large collections of real patient data. The same pre-training also improves results on 3D classification tasks. This establishes implicit-function formula supervision as a route to representation learning that avoids privacy issues and annotation costs entirely.

Core claim

FDIF introduces an implicit-function representation based on signed distance functions (SDFs), enabling compact modeling of complex geometries while exploiting the surface representation of SDFs to support controllable synthesis of both geometric and intensity textures. Across three medical image segmentation benchmarks (AMOS, ACDC, and KiTS) and three architectures (SwinUNETR, nnUNet ResEnc-L, and nnUNet Primus-M), FDIF consistently improves over a formula-driven method and achieves performance comparable to self-supervised approaches pre-trained on large-scale real datasets. The same pre-training also benefits 3D classification tasks.

What carries the argument

Signed distance function (SDF)-based implicit representation that models 3D shapes compactly and separates control of geometry from intensity texture synthesis.
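To make the idea concrete, here is a minimal sketch of how an SDF can yield both a label and an intensity volume, mirroring the composition I(x) = g_m(φ_c(x) + Δ_j(x)) quoted from the paper further down this page. Everything specific is an assumption for illustration: the sphere primitive, the sinusoidal `displacement`, the sigmoid `mapper`, and the choice to take the label from the sign of the displaced field are hypothetical stand-ins, not the paper's actual function libraries.

```python
import numpy as np

def make_grid(size):
    """Return a (size, size, size, 3) array of coordinates in [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, size)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def displacement(points, amplitude=0.05, frequency=6.0):
    """Hypothetical displacement term: a smooth sinusoidal perturbation."""
    return amplitude * np.prod(np.sin(frequency * points), axis=-1)

def mapper(distance, sharpness=20.0):
    """Hypothetical mapper: sigmoid of the negated distance (bright inside)."""
    return 1.0 / (1.0 + np.exp(sharpness * distance))

def synthesize(size=32, radius=0.5):
    """Build one synthetic volume and its label from a displaced SDF."""
    grid = make_grid(size)
    phi = sphere_sdf(grid, center=np.zeros(3), radius=radius)
    perturbed = phi + displacement(grid)        # geometry control
    label = (perturbed < 0).astype(np.uint8)    # assumed: label = inside of surface
    intensity = mapper(perturbed)               # texture control
    return intensity, label

intensity, label = synthesize()
print(intensity.shape, int(label.sum()))
```

Changing `displacement` alters the surface while changing `mapper` alters only the intensities, which is the separation of controls the summary describes.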

If this is right

FDIF raises segmentation accuracy on AMOS, ACDC, and KiTS over earlier voxel-based formula-driven approaches.
Performance reaches levels comparable to self-supervised pre-training that used large real patient datasets.
The same pre-training improves accuracy on 3D classification tasks as well.
Pre-training requires no real medical volumes or expert annotations at any stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The controllable texture synthesis could be used to create targeted training distributions for rare anatomical variants or specific scanner artifacts.
Because the method is entirely formula-driven, it could be extended to other 3D volumetric tasks such as detection or registration where labeled real data is scarce.
Combining FDIF pre-training with a small amount of real-data fine-tuning might close any remaining gap to fully supervised models while still minimizing annotation effort.

Load-bearing premise

Synthetic volumes generated from the SDF implicit functions must have a distribution close enough to real medical scans that models trained on them transfer effectively to real-data segmentation and classification tasks.

What would settle it

If a network pre-trained solely on FDIF synthetic data shows no accuracy gain over random initialization or falls well below self-supervised real-data baselines on the AMOS, ACDC, or KiTS test sets, the transfer-learning claim would be falsified.
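The comparison above needs a per-class overlap metric. The sketch below uses the Dice coefficient, a common choice for segmentation benchmarks; this page does not state which metric the paper reports, so treat Dice as an assumption. The function name `dice_score` and the convention of returning 1.0 when a class is absent from both volumes are illustrative choices.

```python
import numpy as np

def dice_score(pred, target, label):
    """Dice overlap for one class between two integer label volumes."""
    pred_mask = pred == label
    target_mask = target == label
    denom = pred_mask.sum() + target_mask.sum()
    if denom == 0:
        # Class absent in both volumes: treated here as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred_mask, target_mask).sum()
    return 2.0 * overlap / denom

# Toy comparison of two hypothetical predictions against one ground truth.
truth = np.array([0, 1, 1, 0, 1, 0])
from_scratch = np.array([0, 1, 0, 0, 0, 0])
pretrained = np.array([0, 1, 1, 0, 0, 0])
print(dice_score(from_scratch, truth, 1), dice_score(pretrained, truth, 1))
```

Running the same function on models trained from random initialization and from FDIF weights, over the same test cases, gives the per-case numbers the falsification test would compare.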

Original abstract

Deep learning-based 3D medical image segmentation methods rely on large-scale labeled datasets, yet acquiring such data is difficult due to privacy constraints and the high cost of expert annotation. Formula-Driven Supervised Learning (FDSL) offers an appealing alternative by generating training data and labels directly from mathematical formulas. However, existing voxel-based approaches are limited in geometric expressiveness and cannot synthesize realistic textures. We introduce Formula-Driven supervised learning with Implicit Functions (FDIF), a framework that enables scalable pre-training without using any real data or medical expert annotations. FDIF introduces an implicit-function representation based on signed distance functions (SDFs), enabling compact modeling of complex geometries while exploiting the surface representation of SDFs to support controllable synthesis of both geometric and intensity textures. Across three medical image segmentation benchmarks (AMOS, ACDC, and KiTS) and three architectures (SwinUNETR, nnUNet ResEnc-L, and nnUNet Primus-M), FDIF consistently improves over a formula-driven method and achieves performance comparable to self-supervised approaches pre-trained on large-scale real datasets. We further show that FDIF pre-training also benefits 3D classification tasks, highlighting implicit-function-based formula supervision as a promising paradigm for data-free representation learning. Code is available at https://github.com/yamanoko/FDIF.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

FDIF shows implicit SDFs can generate synthetic 3D medical volumes for pre-training that reach parity with large-scale real-data self-supervision on AMOS, ACDC, and KiTS.

The main point is that FDIF moves formula-driven supervised learning from voxel grids to implicit signed distance functions. This change supports better control over both geometry and intensity textures when creating synthetic training data for 3D medical segmentation, all without touching any real scans or labels. The experiments report gains over earlier FDSL baselines and results that sit close to self-supervised pre-training on real data across SwinUNETR, nnUNet ResEnc-L, and nnUNet Primus-M on the three standard benchmarks. Code release helps with checking the details.

The shift to SDF surfaces is the actual technical step forward; it gives a compact way to model complex shapes and modulate textures that voxel methods lacked. That part is cleanly motivated and fits the data-free goal in medical imaging. The evaluation covers multiple architectures and tasks, including a side note on classification, which adds some breadth.

One soft spot is the missing direct check on how well the synthetic intensity distributions match real CT and MRI statistics. Medical volumes carry scanner-specific noise, partial-volume effects, and histogram quirks that pure formula modulation may not reproduce exactly. If the paper lacks histogram comparisons or feature-space distances between synthetic and real volumes, the transfer story rests more on the benchmark numbers than on proven distributional closeness. The central claim still looks testable rather than circular.

This paper is aimed at groups working on pre-training for annotation-scarce medical segmentation or on synthetic data pipelines in general. Readers who already follow FDSL or implicit representations will get the most from the method section and the controlled synthesis details. It deserves a serious referee because the idea is distinct from prior work and the evaluation setup is concrete enough to review on its own terms.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces FDIF, a formula-driven supervised learning framework that employs signed distance functions (SDFs) as implicit representations to generate synthetic 3D medical volumes with controllable geometry and intensity textures. This enables scalable pre-training of segmentation models without any real data or expert annotations. The central claim is that FDIF yields consistent gains over prior formula-driven baselines and achieves performance comparable to self-supervised pre-training on large real datasets, demonstrated across the AMOS, ACDC, and KiTS benchmarks using SwinUNETR, nnUNet ResEnc-L, and nnUNet Primus-M architectures; additional benefits for 3D classification are reported.

Significance. If the distributional similarity between SDF-generated volumes and real CT/MRI holds, the work would offer a genuinely data-free pre-training paradigm that reduces dependence on scarce annotated medical data while maintaining competitive downstream performance. The availability of code at https://github.com/yamanoko/FDIF and the multi-architecture, multi-benchmark evaluation are positive features that support reproducibility and generality.

major comments (1)

[Abstract and Experiments] The headline claim that FDIF pre-training matches large-scale real-data self-supervision rests on the unverified assumption that SDF-based intensity texture synthesis produces joint distributions sufficiently close to real medical volumes. No FID, MMD, histogram-matching, or other quantitative domain-gap statistics are provided to substantiate this; without them the observed gains could arise from regularization effects rather than representation transfer.
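As an illustration of one of the domain-gap statistics named in this comment, the following is a minimal sketch of a squared maximum mean discrepancy (MMD) estimate with a Gaussian kernel. It assumes that each volume has already been reduced to a fixed-length feature vector by some extractor, which is not specified here; the function names and the fixed `bandwidth` are illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Random stand-ins for feature vectors of real and synthetic volumes.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
synthetic_close = rng.normal(size=(200, 8))
synthetic_far = rng.normal(loc=2.0, size=(200, 8))
print(mmd_squared(real, synthetic_close) < mmd_squared(real, synthetic_far))
```

A smaller value indicates that the two feature sets are harder to tell apart under the chosen kernel; the result depends on the bandwidth and on the feature extractor, so both would need to be reported alongside the number.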

minor comments (2)

[Abstract] The abstract states 'consistent improvements' and 'comparable performance' without any numerical values, confidence intervals, or ablation summaries; adding a compact results table excerpt would improve readability.
[Method] Notation for the implicit function and intensity modulation parameters should be introduced with explicit definitions in the methods section to avoid ambiguity when describing controllable texture synthesis.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The single major comment raises a valid point about the lack of direct distributional similarity metrics. We address it point-by-point below and commit to revisions that strengthen the manuscript without altering its core claims.

Referee: [Abstract and Experiments] The headline claim that FDIF pre-training matches large-scale real-data self-supervision rests on the unverified assumption that SDF-based intensity texture synthesis produces joint distributions sufficiently close to real medical volumes. No FID, MMD, histogram-matching, or other quantitative domain-gap statistics are provided to substantiate this; without them the observed gains could arise from regularization effects rather than representation transfer.

Authors: We agree that the manuscript does not report direct quantitative measures of distributional similarity (FID, MMD, or histogram statistics) between FDIF-generated volumes and real CT/MRI data. The current evidence for effective representation transfer rests on consistent downstream gains across three benchmarks (AMOS, ACDC, KiTS) and three architectures (SwinUNETR, nnUNet ResEnc-L, nnUNet Primus-M), where FDIF matches or approaches self-supervised pre-training on real data while outperforming prior formula-driven baselines. While these results provide indirect support, we acknowledge that explicit domain-gap metrics would more rigorously rule out pure regularization effects. In the revised manuscript we will add a new subsection under Experiments that computes FID scores (using a 3D feature extractor) and intensity histogram comparisons between FDIF samples and the real training volumes of each benchmark. We will also report these metrics for the prior formula-driven baseline to enable direct comparison. This addition will be placed before the downstream segmentation results so readers can assess the domain gap independently of task performance.

revision: yes
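One of the checks promised in this response, an intensity histogram comparison, can be sketched as follows. This is a hedged illustration only: it assumes intensities have been rescaled to [0, 1] (real CT data would first need windowing or normalization), and it uses the Jensen-Shannon divergence as the distance, which the rebuttal does not specify. The names `intensity_histogram` and `js_divergence` are hypothetical.

```python
import numpy as np

def intensity_histogram(volume, bins=64, value_range=(0.0, 1.0)):
    """Normalized histogram of voxel intensities (assumes some voxels in range)."""
    counts, _ = np.histogram(volume, bins=bins, range=value_range)
    return counts / counts.sum()

def kl_divergence(a, b):
    """KL(a || b) in nats, summed over the bins where a is non-zero."""
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats; 0 for identical, ln(2) for disjoint."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Random stand-ins for a synthetic volume and a real volume.
rng = np.random.default_rng(0)
synthetic = rng.uniform(0.0, 0.4, size=(8, 8, 8))
real = rng.uniform(0.6, 1.0, size=(8, 8, 8))
p = intensity_histogram(synthetic)
q = intensity_histogram(real)
print(js_divergence(p, p), js_divergence(p, q))
```

Because the divergence is bounded, values from different benchmarks can be compared on a common scale, though the outcome still depends on the bin count and the normalization applied beforehand.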

Circularity Check

0 steps flagged

No circularity: FDIF derivation is independent and self-contained

The paper defines FDIF via signed-distance implicit functions that generate both geometry and intensity textures from explicit mathematical formulas, with no real data or target-benchmark statistics used in the synthesis process. Training labels are produced directly from the same SDF surfaces, and evaluation occurs on external public benchmarks (AMOS, ACDC, KiTS) using unmodified downstream architectures. No equation reduces a claimed prediction to a fitted parameter, no self-citation supplies a uniqueness theorem that forces the method, and no ansatz is smuggled in; the distributional-similarity assumption is an empirical claim, not a definitional identity. The derivation chain therefore remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that SDF implicit functions can generate realistic-enough synthetic medical images for transfer to real benchmarks; no free parameters or invented physical entities are described in the abstract.

axioms (1)

domain assumption: Signed distance functions can compactly represent complex 3D geometries and support controllable synthesis of geometric and intensity textures.
This is the core modeling choice enabling the FDIF framework as stated in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem.

FDIF introduces an implicit-function representation based on signed distance functions (SDFs), enabling compact modeling of complex geometries while exploiting the surface representation of SDFs to support controllable synthesis of both geometric and intensity textures.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We construct a library of SDFs $\Phi=\{\phi_c\}_{c=1}^{C}$ … The pool contains $C=109$ classes … displacement-function library $F_d$ … mapper-function library $F_m$ … $I(x)=g_m(\phi_c(x)+\Delta_j(x))$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.