pith. machine review for the scientific record.

arxiv: 2604.08610 · v1 · submitted 2026-04-08 · 💻 cs.CV

Recognition: unknown

A Semi-Automated Framework for 3D Reconstruction of Medieval Manuscript Miniatures

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · medieval manuscripts · image-to-3D conversion · semi-automated framework · Hi3DGen · XR visualization · tactile printing · manuscript miniatures

The pith

A semi-automated pipeline converts medieval manuscript miniatures into 3D models for XR, printing, and visualization after targeted refinement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work tests seven image-to-3D conversion tools on 69 figures drawn from two manuscript collections and selects Hi3DGen as the method that best preserves topology while adding surface detail. The resulting pipeline chains automatic segmentation, mesh generation, expert sculpting in ZBrush, and AI texturing to produce models ready for WebXR display, AR overlays on original pages, and tactile prints. A sympathetic reader would care because the approach lowers the barrier to creating accessible digital and physical versions of fragile historical art without requiring full manual modeling from scratch.
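
To make the shape of that pipeline concrete, here is a minimal orchestration sketch in Python. Every stage is passed in as a callable because the actual tooling (SAM, Hi3DGen, ZBrush, the texturing model) is external software; nothing below is the authors' released code.

    from typing import Any, Callable

    # Minimal sketch of the four-stage pipeline the review describes;
    # each stage is an injected callable standing in for external tooling.
    def reconstruct_miniature(
        page_image: Any,
        segment: Callable,      # e.g. a SAM wrapper returning a figure mask
        image_to_3d: Callable,  # e.g. Hi3DGen: (image, mask) -> raw mesh
        refine: Callable,       # the expert ZBrush pass, modeled as a function
        texture: Callable,      # AI texturing conditioned on the source image
    ) -> Any:
        mask = segment(page_image)
        raw_mesh = image_to_3d(page_image, mask)
        refined = refine(raw_mesh)             # the manual, non-automated step
        return texture(refined, page_image)    # ready for WebXR, AR, or printing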

Core claim

The framework evaluates multiple AI-based image-to-3D conversion tools on manuscript figures and identifies Hi3DGen, through its normal bridging technique, as the method offering the best trade-off between geometric fidelity and rich surface detail. That output serves as an effective base for subsequent expert editing into usable 3D models.

What carries the argument

Hi3DGen image-to-3D generation with its normal bridging approach, placed inside a pipeline that begins with SAM segmentation and ends with ZBrush refinement plus AI texturing.
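
Read schematically, normal bridging is a two-stage hand-off, sketched below as we understand the description; both callables are hypothetical stand-ins, not the released Hi3DGen interface.

    # Schematic sketch of the normal-bridging idea; the two function
    # arguments are hypothetical models, not the Hi3DGen API.
    def normal_bridged_generation(rgb_image, estimate_normals, generate_geometry):
        # Stage 1: predict a surface normal map from the flat miniature;
        # the normals bridge 2D brushwork and 3D relief.
        normal_map = estimate_normals(rgb_image)
        # Stage 2: condition geometry generation on those normals, keeping
        # fine surface detail while the overall topology stays plausible.
        return generate_geometry(rgb_image, normal_map)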

If this is right

  • The generated models support interactive WebXR visualization of the original miniatures.
  • AR overlays can place the 3D versions directly on physical manuscript pages.
  • Tactile 3D prints become feasible for visually impaired users to explore the artwork by touch.
  • The same workflow applies across Gothic illuminations and Renaissance miniatures without style-specific changes.
  • Expert refinement in ZBrush consistently improves initial AI meshes to production-ready quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The pipeline could be adapted for other categories of flat historical imagery such as prints or drawings.
  • Substituting newer image-to-3D generators into the same segmentation-plus-refinement structure might raise baseline quality further.
  • Integration with museum digitization workflows would test scalability beyond the 69-figure test set.
  • Automated quality checks based on the same metrics could flag cases needing extra manual work before printing (see the sketch after this list).
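
In practice the last bullet could be a threshold gate over the paper's own metrics. A minimal sketch follows; every cutoff is invented for illustration and would need calibration against expert-refined models.

    # Hypothetical pre-print quality gate; thresholds are illustrative
    # assumptions, not values reported in the paper.
    def needs_manual_work(silhouette_iou: float,
                          is_watertight: bool,
                          depth_range_ratio: float) -> bool:
        """Flag a generated mesh for extra expert sculpting."""
        if not is_watertight:          # non-manifold meshes cannot be printed
            return True
        if silhouette_iou < 0.85:      # outline has drifted from the miniature
            return True
        if depth_range_ratio > 2.0:    # runaway volumetric expansion
            return True
        return False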

Load-bearing premise

The chosen rendering-based and volumetric metrics sufficiently predict real-world usability for XR visualization and tactile 3D printing.
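
Of the listed metrics, Silhouette IoU at least has a standard form. A minimal sketch, assuming equal-shaped boolean masks from a front-view render of the mesh and from the SAM segmentation:

    import numpy as np

    def silhouette_iou(render_mask: np.ndarray, source_mask: np.ndarray) -> float:
        """Intersection over union between the rendered mesh silhouette
        and the segmented figure mask; both are boolean arrays."""
        intersection = np.logical_and(render_mask, source_mask).sum()
        union = np.logical_or(render_mask, source_mask).sum()
        return float(intersection / union) if union else 0.0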

What would settle it

A controlled comparison measuring expert refinement time and final model quality when starting from Hi3DGen versus the next-best tool, or direct user testing of the resulting tactile prints by visually impaired participants.

Figures

Figures reproduced from arXiv: 2604.08610 by Pierluigi Feliciati, Riccardo Pallotto, Tiberio Uricchio.

Figure 1. Pipeline overview: image acquisition, SAM segmentation, Hi3DGen mesh … [image omitted]
Figure 2. Qualitative comparison on a Vatican figure (bishop from … [image omitted]
Figure 3. Qualitative comparison on a Monteprandone figure. SF3D and SPAR3D … [image omitted]
Figure 4. Pipeline stages: original miniature, Hi3DGen output, refined mesh, textured … [image omitted]
read the original abstract

This paper presents a semi-automated framework for transforming two-dimensional miniatures from medieval manuscripts into three-dimensional digital models suitable for extended reality (XR), tactile 3D printing, and web-based visualization. We evaluate seven image-to-3D methods (TripoSR, SF3D, SPAR3D, TRELLIS, Wonder3D, SAM 3D, Hi3DGen) on 69 manuscript figures from two collections using rendering-based metrics (Silhouette IoU, LPIPS, CLIP Score) and volumetric measures (Depth Range Ratio, watertight percentage), revealing a trade-off between volumetric expansion and geometric fidelity. Hi3DGen balances topological quality with rich surface detail through its normal bridging approach, making it a good starting point for expert refinement. Our pipeline combines SAM segmentation, Hi3DGen mesh generation, expert refinement in ZBrush, and AI-assisted texturing. Two case studies on Gothic illuminations from the Decretum Gratiani (Vatican Library) and Renaissance miniatures by Giulio Clovio demonstrate applicability across artistic traditions. The resulting models can support WebXR visualization, AR overlay on physical manuscripts, and tactile 3D prints for visually impaired users.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a semi-automated framework for 3D reconstruction of medieval manuscript miniatures, evaluating seven image-to-3D methods (TripoSR, SF3D, SPAR3D, TRELLIS, Wonder3D, SAM 3D, Hi3DGen) on 69 figures from two collections. It reports a trade-off between volumetric expansion and geometric fidelity using rendering-based metrics (Silhouette IoU, LPIPS, CLIP Score) and volumetric measures (Depth Range Ratio, watertight percentage), identifies Hi3DGen as balancing topological quality with surface detail via normal bridging, and demonstrates the full pipeline (SAM segmentation, Hi3DGen, ZBrush refinement, AI texturing) through two case studies on Gothic and Renaissance illuminations for XR, web visualization, and tactile printing applications.

Significance. If the proxy metrics can be shown to correlate with actual usability in XR overlays and tactile 3D printing, the work provides a practical, domain-specific pipeline for cultural heritage digitization that could enable new applications for visually impaired users and manuscript scholars. The empirical comparison across multiple methods on a curated set of 69 figures and the inclusion of real-world case studies are concrete strengths that ground the recommendations.

major comments (2)
  1. [Evaluation] Evaluation section (69 figures, seven methods): The central recommendation of Hi3DGen as a good starting point for expert refinement rests on the claim that it balances topological quality with rich surface detail. However, this is supported only by the listed proxy metrics (Silhouette IoU, LPIPS, CLIP Score, Depth Range Ratio, watertight percentage), which measure 2D rendering fidelity and basic volume properties but do not directly assess preservation of fine artistic linework, manifoldness for 3D printing, or perceptual suitability for XR overlays. A correlation study or expert/user validation against the target applications is required to substantiate the trade-off and recommendation.
  2. [Results] Results and Methods: The manuscript reports concrete metric values and a trade-off but provides no details on data splits, selection criteria for the 69 figures, statistical testing of differences between methods, or how the metrics were chosen to predict downstream XR/printing quality. These omissions make it difficult to assess the robustness of the Hi3DGen preference and the generalizability of the pipeline.
minor comments (2)
  1. [Abstract] Abstract and pipeline description: The integration of expert refinement in ZBrush and AI-assisted texturing is presented as part of the framework, but the degree of automation versus manual intervention is not quantified, which affects reproducibility claims.
  2. [Figures/Tables] Figure captions and tables: Some metric definitions (e.g., exact computation of Depth Range Ratio) and the precise meaning of 'watertight percentage' could be clarified with equations or pseudocode to aid readers unfamiliar with the 3D generation literature.
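
To make the second minor comment concrete: the two volumetric measures could plausibly be read as below, but both definitions are our assumptions, which is exactly the ambiguity the referee is flagging.

    # Plausible readings of the two volumetric measures, sketched with
    # trimesh; the paper's exact definitions may differ.
    import trimesh

    def depth_range_ratio(mesh: trimesh.Trimesh) -> float:
        """Assumed: depth (z) extent over the larger in-plane extent, so
        a flat relief scores near 0 and a fully round figure near 1."""
        x_range, y_range, z_range = mesh.bounds[1] - mesh.bounds[0]
        return float(z_range / max(x_range, y_range))

    def watertight_percentage(meshes: list[trimesh.Trimesh]) -> float:
        """Assumed: share of generated meshes that are closed 2-manifolds,
        the property tactile 3D printing requires."""
        return 100.0 * sum(m.is_watertight for m in meshes) / len(meshes)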

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments and for acknowledging the practical strengths of our pipeline and case studies. We address the major comments point by point below.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (69 figures, seven methods): The central recommendation of Hi3DGen as a good starting point for expert refinement rests on the claim that it balances topological quality with rich surface detail. However, this is supported only by the listed proxy metrics (Silhouette IoU, LPIPS, CLIP Score, Depth Range Ratio, watertight percentage), which measure 2D rendering fidelity and basic volume properties but do not directly assess preservation of fine artistic linework, manifoldness for 3D printing, or perceptual suitability for XR overlays. A correlation study or expert/user validation against the target applications is required to substantiate the trade-off and recommendation.

    Authors: We agree that proxy metrics have inherent limitations and do not directly validate downstream usability. These metrics were chosen following standard practices in image-to-3D literature, where 3D ground truth for historical artifacts is unavailable. In revision, we will expand the Evaluation section with a dedicated discussion linking each metric to XR and printing applications (e.g., watertight percentage to manifoldness for printing, LPIPS to perceptual fidelity in overlays) and add qualitative examples from the case studies illustrating linework preservation. A full correlation study or user validation cannot be performed in this revision and will be noted as a limitation with future work suggestions. revision: partial

  2. Referee: [Results] Results and Methods: The manuscript reports concrete metric values and a trade-off but provides no details on data splits, selection criteria for the 69 figures, statistical testing of differences between methods, or how the metrics were chosen to predict downstream XR/printing quality. These omissions make it difficult to assess the robustness of the Hi3DGen preference and the generalizability of the pipeline.

    Authors: We will revise the Methods and Results sections to address these omissions. The 69 figures were selected for diversity in artistic style, complexity, and period from the two specified collections; we will detail these criteria explicitly. As this is a benchmark comparison of existing methods (not ML training), no data splits were used. We will add statistical testing (e.g., paired significance tests) for metric differences and expand the justification for metric selection with explicit ties to XR/printing quality prediction. revision: yes
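
The paired testing the authors promise is straightforward to sketch. Assuming per-figure metric arrays over the same 69 miniatures for two methods (file names hypothetical), a Wilcoxon signed-rank test pairs by figure and avoids normality assumptions:

    import numpy as np
    from scipy.stats import wilcoxon

    # Hypothetical per-figure Silhouette IoU arrays, shape (69,),
    # aligned so entry i refers to the same miniature in both.
    hi3dgen_iou = np.load("hi3dgen_silhouette_iou.npy")
    runner_up_iou = np.load("trellis_silhouette_iou.npy")

    # Paired, nonparametric test of the per-figure differences.
    stat, p_value = wilcoxon(hi3dgen_iou, runner_up_iou)
    print(f"Wilcoxon W = {stat:.1f}, p = {p_value:.4f}")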

standing simulated objections not resolved
  • A dedicated correlation study or expert/user validation requires new experiments and participant recruitment that cannot be completed within the current revision timeline.

Circularity Check

0 steps flagged

No circularity; empirical pipeline with external grounding

full rationale

The paper describes a semi-automated 3D reconstruction pipeline and performs an empirical comparison of seven image-to-3D methods on 69 manuscript figures using named external metrics (Silhouette IoU, LPIPS, CLIP Score, Depth Range Ratio, watertight percentage) and datasets from two collections. No derivations, equations, fitted parameters presented as predictions, or self-citation chains appear in the abstract or described content. All claims rest on direct evaluation against external tools and benchmarks rather than reducing to the paper's own inputs by construction. This is the expected non-finding for an applied engineering paper without mathematical modeling.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework depends on off-the-shelf AI models and standard graphics software without introducing new mathematical parameters or entities.

axioms (2)
  • domain assumption SAM segmentation performs reliably on stylized medieval miniature figures.
    The pipeline begins with SAM to isolate figures; failure here would break downstream steps.
  • domain assumption The listed rendering and volumetric metrics correlate with suitability for XR and tactile printing.
    Choice of Hi3DGen rests on these metrics matching real application needs.
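
The first axiom is easy to probe directly. A minimal sketch using the public segment-anything API, with the checkpoint path and the point prompt as placeholders:

    import cv2
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    # Load a SAM backbone; the checkpoint path is a placeholder.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("miniature_page.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # One positive click on the painted figure; stylized flat art with
    # gilded grounds is where this prompt-based step is most likely to fail.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[512, 384]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    figure_mask = masks[np.argmax(scores)]  # boolean mask isolating the figure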

pith-pipeline@v0.9.0 · 5529 in / 1396 out tokens · 58181 ms · 2026-05-10T17:26:59.097604+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references · 10 canonical work pages · 1 internal anchor

  1. Baker, D.: Paradata: The digital Prometheus. In: Ioannides, M., Baker, D., Agapiou, A., Siegkas, P. (eds.) 3D Research Challenges in Cultural Heritage V, vol. 15190, pp. 12–23. Springer Nature Switzerland (2025). https://doi.org/10.1007/978-3-031-78590-0_2
  2. Bonacini, E.: I musei e le forme dello storytelling digitale. Aracne editrice, 1 edn. (2020)
  3. Boss, M., Huang, Z., Vasishta, A., Jampani, V.: SF3D: Stable fast 3D mesh reconstruction with UV-unwrapping and illumination disentanglement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16240–16250 (2025)
  4. Brown, M.S., Seales, W.B., Griffioen, J.N., Kiernan, K.S.: 3D acquisition and restoration of medieval manuscripts. Communications of the ACM 44(5) (2001)
  5. Campagnolo, A.: Book Conservation and Digitization: The Challenges of Dialogue and Collaboration. Arc Humanities Press (2020)
  6. Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., Lin, A., Liu, J., Ma, Z., Sagar, A., Song, B., Wang, X., Yang, J., Zhang, B., Dollár, P., Gkioxari, G., Feiszli, M., Malik, J.: SAM 3D: 3Dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)
  7. Cionini Visani, M.: Giorgio Giulio Clovio: Miniaturist of the Renaissance. Alpine Fine Arts Collection, New York (1980)
  8. Deegan, M., Tanner, S.: Digital Futures: Strategies for the Information Age. Library Association Publishing, London (2002)
  9. Endres, B.: Digitizing Medieval Manuscripts: The St. Chad Gospels, Materiality, Recoveries, and Representation in 2D and 3D. Arc Humanities Press, 1 edn. (2019). https://doi.org/10.1515/9781942401803
  10. Francomano, E.C., Bamford, H.: Whose digital Middle Ages? Accessibility in digital medieval manuscript culture. Journal of Medieval Iberian Studies 14(1), 15–27 (2022). https://doi.org/10.1080/17546559.2021.2022738
  11. Haynes, R.: Evolving standards in digital cultural heritage – developing a IIIF 3D technical specification. In: Ioannides, M., Patias, P. (eds.) 3D Research Challenges in Cultural Heritage III, vol. 13125, pp. 50–64. Springer International Publishing (2023). https://doi.org/10.1007/978-3-031-35593-6_3
  12. Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., Tan, H.: LRM: Large reconstruction model for single image to 3D. In: International Conference on Learning Representations (ICLR) (2024)
  13. Huang, Z., Boss, M., Vasishta, A., Rehg, J.M., Jampani, V.: SPAR3D: Stable point-aware reconstruction of 3D objects from single images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16860–16870 (2025)
  14. Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion-based image generators for monocular depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9492–9502 (2024)
  15. Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., Dollár, P., Girshick, R.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 3992–4003 (2023). https://doi.org/10.1109/ICCV51070.2023.00371
  16. Lindhé, C.: Medieval materiality through the digital lens. In: Svensson, P., Goldberg, D.T. (eds.) Between Humanities and the Digital, pp. 193–204. The MIT Press (2015). https://doi.org/10.7551/mitpress/9465.003.0019
  17. Long, X., Guo, Y.C., Lin, C., Liu, Y., Dou, Z., Liu, L., Ma, Y., Zhang, S.H., Habermann, M., Theobalt, C., Wang, W.: Wonder3D: Single image to 3D using cross-domain diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9970–9980 (2024)
  18. Paul, S.: The curation and display of digital medieval manuscripts. In: Da Rold, O., Treharne, E.M. (eds.) The Cambridge Companion to Medieval British Manuscripts, pp. 267–283. Cambridge University Press, 1 edn. (2020). https://doi.org/10.1017/9781316182659.013
  19. Pavlidis, G., Koutsoudis, A., Arnaoutoglou, F., Tsioukas, V., Chamzas, C.: Methods for 3D digitization of cultural heritage. Journal of Cultural Heritage 8(1), 93–98 (2007). https://doi.org/10.1016/j.culher.2006.10.007
  20. Pietroni, E., Chirivì, A., Fanini, B., Bucciero, A.: An innovative approach to shape information architecture related to ancient manuscripts, through multi-layered virtual ecosystems. From Codex4D to Dataspace project. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds.) Extended Reality, vol. 14219, pp. 247–267. Springer Nature Switzerland (2023). https://...
  21. Pietroni, E., Orazi, N., Fanini, B.: Codex4D. Viaggio interdisciplinare nel manoscritto antico. Migrazioni e contaminazioni tra le scienze 4, 103–120 (2023). https://doi.org/10.36173/PLURIMI-2023-4/06
  22. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning (ICML). pp. 8748–8763 (2021)
  23. Remondino, F.: Heritage recording and 3D modeling with photogrammetry and 3D scanning. Remote Sensing 3(6), 1104–1138 (2011). https://doi.org/10.3390/rs3061104
  24. Robertson, A.G.: A note on technology and functionality in digital manuscript studies. In: Albritton, B., Henley, G., Treharne, E.M. (eds.) Medieval Manuscripts in the Digital Age. Routledge (2020)
  25. Siliquini, A.: Decorazione e illustrazione nella biblioteca di S. Giacomo della Marca. Gianni Maroni editore (2002)
  26. Tochilkin, D., Pankratz, D., Liu, Z., Huang, Z., Letts, A., Li, Y., Liang, D., Laforte, C., Jampani, V., Cao, Y.P.: TripoSR: Fast 3D object reconstruction from a single image. arXiv preprint arXiv:2403.02151 (2024)
  27. Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., Wang, W.: NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 34, pp. 27171–27183 (2021)
  28. Wang, X., Xie, L., Dong, C., Shan, Y.: Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. pp. 1905–1914 (2021)
  29. Xiang, J., Lv, Z., Xu, S., Deng, Y., Wang, R., Zhang, B., Chen, D., Tong, X., Yang, J.: TRELLIS: Structured 3D latents for scalable and versatile 3D generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025), spotlight
  30. Ye, C., Wu, Y., Lu, Z., Chang, J., Guo, X., Zhou, J., Zhao, H., Han, X.: Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 25050–25061 (2025)
  31. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 586–595 (2018)