Recognition: unknown
A Semi-Automated Framework for 3D Reconstruction of Medieval Manuscript Miniatures
Pith reviewed 2026-05-10 17:26 UTC · model grok-4.3
The pith
A semi-automated pipeline converts medieval manuscript miniatures into 3D models for XR, printing, and visualization after targeted refinement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework evaluates multiple AI-based image-to-3D conversion tools on manuscript figures and identifies Hi3DGen, through its normal-bridging technique, as offering the best trade-off between geometric fidelity and rich surface detail, and hence as an effective base for subsequent expert editing into usable 3D models.
What carries the argument
Hi3DGen image-to-3D generation with its normal bridging approach, placed inside a pipeline that begins with SAM segmentation and ends with ZBrush refinement plus AI texturing.
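To make the stage ordering concrete, here is a minimal sketch of the automated front half of such a pipeline. It assumes the official segment_anything package with a downloaded ViT-H checkpoint; hi3dgen_generate is a hypothetical wrapper around Hi3DGen inference, and the ZBrush refinement and texturing stages remain manual and are not shown:

```python
# Sketch of the segmentation-to-mesh portion of the pipeline.
# Assumptions: segment_anything is installed and the SAM ViT-H checkpoint
# has been downloaded; hi3dgen_generate is a hypothetical wrapper around
# Hi3DGen inference, not an API the paper publishes.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

def segment_figure(image_path: str, point_xy: tuple[int, int]) -> np.ndarray:
    """Isolate a miniature figure with SAM from a single foreground click."""
    image = np.array(Image.open(image_path).convert("RGB"))
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([point_xy]),
        point_labels=np.array([1]),  # 1 = foreground point
    )
    return masks[np.argmax(scores)]  # keep the best-scoring mask

def run_pipeline(image_path: str, click: tuple[int, int], out_mesh: str) -> None:
    mask = segment_figure(image_path, click)
    rgba = np.array(Image.open(image_path).convert("RGBA"))
    rgba[..., 3] = mask.astype(np.uint8) * 255  # make the background transparent
    # Hypothetical call: in practice this would invoke Hi3DGen inference,
    # and the resulting mesh would go on to manual ZBrush refinement
    # and AI-assisted texturing.
    hi3dgen_generate(Image.fromarray(rgba), output_path=out_mesh)
```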
If this is right
- The generated models support interactive WebXR visualization of the original miniatures.
- AR overlays can place the 3D versions directly on physical manuscript pages.
- Tactile 3D prints become feasible for visually impaired users to explore the artwork by touch.
- The same workflow applies across Gothic illuminations and Renaissance miniatures without style-specific changes.
- Expert refinement in ZBrush consistently improves initial AI meshes to production-ready quality.
Where Pith is reading between the lines
- The pipeline could be adapted for other categories of flat historical imagery such as prints or drawings.
- Substituting newer image-to-3D generators into the same segmentation-plus-refinement structure might raise baseline quality further.
- Integration with museum digitization workflows would test scalability beyond the 69-figure test set.
- Automated quality checks based on the same metrics could flag cases needing extra manual work before printing.
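The last point is easy to prototype. A minimal quality gate, assuming per-model metric values have already been computed; the threshold values below are illustrative placeholders, not numbers reported in the paper:

```python
# Illustrative quality gate over per-model metric values.
# The threshold numbers are placeholders, not values from the paper.
THRESHOLDS = {
    "silhouette_iou": 0.85,  # require at least this much silhouette overlap
    "lpips": 0.35,           # allow at most this much perceptual distance
    "watertight": True,      # require printability without mesh repair
}

def needs_manual_work(metrics: dict) -> list[str]:
    """Return the list of checks a generated model fails (empty = pass)."""
    flags = []
    if metrics["silhouette_iou"] < THRESHOLDS["silhouette_iou"]:
        flags.append("low silhouette fidelity")
    if metrics["lpips"] > THRESHOLDS["lpips"]:
        flags.append("high perceptual distance")
    if THRESHOLDS["watertight"] and not metrics["watertight"]:
        flags.append("not watertight")
    return flags
```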
Load-bearing premise
The chosen rendering-based and volumetric metrics sufficiently predict real-world usability for XR visualization and tactile 3D printing.
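For context on what that premise rests on, this is how the two rendering-based metrics are typically computed in the image-to-3D literature; a sketch assuming the lpips PyPI package and binary silhouette masks rendered from the generated mesh (the paper's exact rendering setup is not specified):

```python
# Illustrative computation of Silhouette IoU and LPIPS between the source
# miniature and a rendering of the generated mesh. Assumes the `lpips`
# package (pip install lpips) and images as numpy arrays.
import numpy as np
import torch
import lpips

_lpips = lpips.LPIPS(net="alex")  # perceptual distance network

def silhouette_iou(mask_src: np.ndarray, mask_render: np.ndarray) -> float:
    """IoU of two boolean silhouette masks of shape (H, W)."""
    inter = np.logical_and(mask_src, mask_render).sum()
    union = np.logical_or(mask_src, mask_render).sum()
    return float(inter / union) if union else 0.0

def lpips_distance(img_src: np.ndarray, img_render: np.ndarray) -> float:
    """LPIPS on uint8 RGB images of shape (H, W, 3); lower is more similar."""
    def to_tensor(img):
        t = torch.from_numpy(img).float().permute(2, 0, 1) / 127.5 - 1.0
        return t.unsqueeze(0)  # NCHW in [-1, 1], as lpips expects
    with torch.no_grad():
        return float(_lpips(to_tensor(img_src), to_tensor(img_render)))
```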
What would settle it
A controlled comparison measuring expert refinement time and final model quality when starting from Hi3DGen versus the next-best tool, or direct user testing of the resulting tactile prints by visually impaired participants.
Original abstract
This paper presents a semi-automated framework for transforming two-dimensional miniatures from medieval manuscripts into three-dimensional digital models suitable for extended reality (XR), tactile 3D printing, and web-based visualization. We evaluate seven image-to-3D methods (TripoSR, SF3D, SPAR3D, TRELLIS, Wonder3D, SAM 3D, Hi3DGen) on 69 manuscript figures from two collections using rendering-based metrics (Silhouette IoU, LPIPS, CLIP Score) and volumetric measures (Depth Range Ratio, watertight percentage), revealing a trade-off between volumetric expansion and geometric fidelity. Hi3DGen balances topological quality with rich surface detail through its normal bridging approach, making it a good starting point for expert refinement. Our pipeline combines SAM segmentation, Hi3DGen mesh generation, expert refinement in ZBrush, and AI-assisted texturing. Two case studies on Gothic illuminations from the Decretum Gratiani (Vatican Library) and Renaissance miniatures by Giulio Clovio demonstrate applicability across artistic traditions. The resulting models can support WebXR visualization, AR overlay on physical manuscripts, and tactile 3D prints for visually impaired users.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a semi-automated framework for 3D reconstruction of medieval manuscript miniatures, evaluating seven image-to-3D methods (TripoSR, SF3D, SPAR3D, TRELLIS, Wonder3D, SAM 3D, Hi3DGen) on 69 figures from two collections. It reports a trade-off between volumetric expansion and geometric fidelity using rendering-based metrics (Silhouette IoU, LPIPS, CLIP Score) and volumetric measures (Depth Range Ratio, watertight percentage), identifies Hi3DGen as balancing topological quality with surface detail via normal bridging, and demonstrates the full pipeline (SAM segmentation, Hi3DGen, ZBrush refinement, AI texturing) through two case studies on Gothic and Renaissance illuminations for XR, web visualization, and tactile printing applications.
Significance. If the proxy metrics can be shown to correlate with actual usability in XR overlays and tactile 3D printing, the work provides a practical, domain-specific pipeline for cultural heritage digitization that could enable new applications for visually impaired users and manuscript scholars. The empirical comparison across multiple methods on a curated set of 69 figures and the inclusion of real-world case studies are concrete strengths that ground the recommendations.
major comments (2)
- [Evaluation] Evaluation section (69 figures, seven methods): The central recommendation of Hi3DGen as a good starting point for expert refinement rests on the claim that it balances topological quality with rich surface detail. However, this is supported only by the listed proxy metrics (Silhouette IoU, LPIPS, CLIP Score, Depth Range Ratio, watertight percentage), which measure 2D rendering fidelity and basic volume properties but do not directly assess preservation of fine artistic linework, manifoldness for 3D printing, or perceptual suitability for XR overlays. A correlation study or expert/user validation against the target applications is required to substantiate the trade-off and recommendation.
- [Results] Results and Methods: The manuscript reports concrete metric values and a trade-off but provides no details on data splits, selection criteria for the 69 figures, statistical testing of differences between methods, or how the metrics were chosen to predict downstream XR/printing quality. These omissions make it difficult to assess the robustness of the Hi3DGen preference and the generalizability of the pipeline.
minor comments (2)
- [Abstract] Abstract and pipeline description: The integration of expert refinement in ZBrush and AI-assisted texturing is presented as part of the framework, but the degree of automation versus manual intervention is not quantified, which affects reproducibility claims.
- [Figures/Tables] Figure captions and tables: Some metric definitions (e.g., exact computation of Depth Range Ratio) and the precise meaning of 'watertight percentage' could be clarified with equations or pseudocode to aid readers unfamiliar with the 3D generation literature.
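In the spirit of that request, here is one plausible formalization, not the paper's confirmed definitions: "watertight percentage" read as the fraction of generated meshes that are closed manifolds (checkable with trimesh), and Depth Range Ratio read as the mesh's out-of-plane extent relative to its image-plane extent:

```python
# One plausible formalization of the two volumetric measures; the paper's
# exact definitions may differ. Requires `trimesh` (pip install trimesh).
import trimesh

def depth_range_ratio(mesh: trimesh.Trimesh) -> float:
    """Depth extent over the larger image-plane extent, assuming the mesh
    is oriented with z pointing out of the source image plane."""
    dx, dy, dz = mesh.extents  # axis-aligned bounding-box sizes
    return dz / max(dx, dy)

def watertight_percentage(mesh_paths: list[str]) -> float:
    """Percentage of meshes that are closed 2-manifolds, i.e. printable
    without repair; trimesh exposes this as the is_watertight property."""
    meshes = [trimesh.load(p, force="mesh") for p in mesh_paths]
    n_watertight = sum(m.is_watertight for m in meshes)
    return 100.0 * n_watertight / len(meshes)
```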
Simulated Author's Rebuttal
We thank the referee for their constructive comments and for acknowledging the practical strengths of our pipeline and case studies. We address the major comments point by point below.
Point-by-point responses
Referee: [Evaluation] Evaluation section (69 figures, seven methods): The central recommendation of Hi3DGen as a good starting point for expert refinement rests on the claim that it balances topological quality with rich surface detail. However, this is supported only by the listed proxy metrics (Silhouette IoU, LPIPS, CLIP Score, Depth Range Ratio, watertight percentage), which measure 2D rendering fidelity and basic volume properties but do not directly assess preservation of fine artistic linework, manifoldness for 3D printing, or perceptual suitability for XR overlays. A correlation study or expert/user validation against the target applications is required to substantiate the trade-off and recommendation.
Authors: We agree that proxy metrics have inherent limitations and do not directly validate downstream usability. These metrics were chosen following standard practices in the image-to-3D literature, where 3D ground truth for historical artifacts is unavailable. In revision, we will expand the Evaluation section with a dedicated discussion linking each metric to XR and printing applications (e.g., watertight percentage to manifoldness for printing, LPIPS to perceptual fidelity in overlays) and add qualitative examples from the case studies illustrating linework preservation. A full correlation study or user validation cannot be performed in this revision and will be noted as a limitation with future work suggestions. Revision: partial.
Referee: [Results] Results and Methods: The manuscript reports concrete metric values and a trade-off but provides no details on data splits, selection criteria for the 69 figures, statistical testing of differences between methods, or how the metrics were chosen to predict downstream XR/printing quality. These omissions make it difficult to assess the robustness of the Hi3DGen preference and the generalizability of the pipeline.
Authors: We will revise the Methods and Results sections to address these omissions. The 69 figures were selected for diversity in artistic style, complexity, and period from the two specified collections; we will detail these criteria explicitly. As this is a benchmark comparison of existing methods (not ML training), no data splits were used. We will add statistical testing (e.g., paired significance tests) for metric differences and expand the justification for metric selection with explicit ties to XR/printing quality prediction. Revision: yes.
- Not addressed in this revision: a dedicated correlation study or expert/user validation requires new experiments and participant recruitment that cannot be completed within the current revision timeline.
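The paired testing promised in the second response is straightforward once per-figure metric values exist for each method; a minimal sketch using SciPy's Wilcoxon signed-rank test (the array names are placeholders: one value per figure, aligned by figure across methods):

```python
# Paired significance test over per-figure metric values for two methods.
# Arrays are placeholders: one metric value per figure (n = 69), aligned
# so that index i refers to the same figure in both arrays.
import numpy as np
from scipy.stats import wilcoxon

def compare_methods(scores_a: np.ndarray, scores_b: np.ndarray,
                    alpha: float = 0.05) -> None:
    """Wilcoxon signed-rank test: are the paired per-figure differences
    between two methods consistently nonzero?"""
    stat, p = wilcoxon(scores_a, scores_b)
    verdict = "significant" if p < alpha else "not significant"
    print(f"median diff = {np.median(scores_a - scores_b):+.4f}, "
          f"W = {stat:.1f}, p = {p:.4f} ({verdict} at alpha = {alpha})")

# e.g. compare_methods(iou_hi3dgen, iou_trellis) with 69 aligned values each
```

With seven methods and several metrics, a multiple-comparison correction (e.g., Holm or Bonferroni) would also be needed before declaring any method preference significant.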
Circularity Check
No circularity; empirical pipeline with external grounding
Full rationale
The paper describes a semi-automated 3D reconstruction pipeline and performs an empirical comparison of seven image-to-3D methods on 69 manuscript figures using named external metrics (Silhouette IoU, LPIPS, CLIP Score, Depth Range Ratio, watertight percentage) and datasets from two collections. No derivations, equations, fitted parameters presented as predictions, or self-citation chains appear in the abstract or described content. All claims rest on direct evaluation against external tools and benchmarks rather than reducing to the paper's own inputs by construction. This is the expected non-finding for an applied engineering paper without mathematical modeling.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: SAM segmentation performs reliably on stylized medieval miniature figures.
- Domain assumption: The listed rendering and volumetric metrics correlate with suitability for XR and tactile printing.
Reference graph
Works this paper leans on
- [1] Baker, D.: Paradata: The digital prometheus. In: Ioannides, M., Baker, D., Agapiou, A., Siegkas, P. (eds.) 3D Research Challenges in Cultural Heritage V, vol. 15190, pp. 12–23. Springer Nature Switzerland (2025). https://doi.org/10.1007/978-3-031-78590-0_2
- [2] Bonacini, E.: I musei e le forme dello storytelling digitale. Aracne editrice, 1 edn. (2020)
- [3] Boss, M., Huang, Z., Vasishta, A., Jampani, V.: SF3D: Stable fast 3D mesh reconstruction with UV-unwrapping and illumination disentanglement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16240–16250 (2025)
- [4] Brown, M.S., Seales, W.B., Griffioen, J.N., Kiernan, K.S.: 3D acquisition and restoration of medieval manuscripts. Communications of the ACM 44(5) (2001)
- [5] Campagnolo, A.: Book Conservation and Digitization: The Challenges of Dialogue and Collaboration. Arc Humanities Press (2020)
- [6] Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., Lin, A., Liu, J., Ma, Z., Sagar, A., Song, B., Wang, X., Yang, J., Zhang, B., Dollár, P., Gkioxari, G., Feiszli, M., Malik, J.: SAM 3D: 3Dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)
- [7] Cionini Visani, M.: Giorgio Giulio Clovio: Miniaturist of the Renaissance. Alpine Fine Arts Collection, New York (1980)
- [8] Deegan, M., Tanner, S.: Digital Futures: Strategies for the Information Age. Library Association Publishing, London (2002)
- [9] Endres, B.: Digitizing Medieval Manuscripts: The St. Chad Gospels, Materiality, Recoveries, and Representation in 2D and 3D. Arc Humanities Press, 1 edn. (2019). https://doi.org/10.1515/9781942401803
- [10] Francomano, E.C., Bamford, H.: Whose digital Middle Ages? Accessibility in digital medieval manuscript culture. Journal of Medieval Iberian Studies 14(1), 15–27 (2022). https://doi.org/10.1080/17546559.2021.2022738
- [11] Haynes, R.: Evolving standards in digital cultural heritage – developing a IIIF 3D technical specification. In: Ioannides, M., Patias, P. (eds.) 3D Research Challenges in Cultural Heritage III, vol. 13125, pp. 50–64. Springer International Publishing (2023). https://doi.org/10.1007/978-3-031-35593-6_3
- [12] Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., Tan, H.: LRM: Large reconstruction model for single image to 3D. In: International Conference on Learning Representations (ICLR) (2024)
- [13] Huang, Z., Boss, M., Vasishta, A., Rehg, J.M., Jampani, V.: SPAR3D: Stable point-aware reconstruction of 3D objects from single images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16860–16870 (2025)
- [14] Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion-based image generators for monocular depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9492–9502 (2024)
- [15] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., Dollár, P., Girshick, R.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 3992–4003 (2023). https://doi.org/10.1109/ICCV51070.2023.00371
- [16] Lindhé, C.: Medieval materiality through the digital lens. In: Svensson, P., Goldberg, D.T. (eds.) Between Humanities and the Digital, pp. 193–204. The MIT Press (2015). https://doi.org/10.7551/mitpress/9465.003.0019
- [17] Long, X., Guo, Y.C., Lin, C., Liu, Y., Dou, Z., Liu, L., Ma, Y., Zhang, S.H., Habermann, M., Theobalt, C., Wang, W.: Wonder3D: Single image to 3D using cross-domain diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9970–9980 (2024)
- [18] Paul, S.: The curation and display of digital medieval manuscripts. In: Da Rold, O., Treharne, E.M. (eds.) The Cambridge Companion to Medieval British Manuscripts, pp. 267–283. Cambridge University Press, 1 edn. (2020). https://doi.org/10.1017/9781316182659.013
- [19] Pavlidis, G., Koutsoudis, A., Arnaoutoglou, F., Tsioukas, V., Chamzas, C.: Methods for 3D digitization of cultural heritage. Journal of Cultural Heritage 8(1), 93–98 (2007). https://doi.org/10.1016/j.culher.2006.10.007
- [20] Pietroni, E., Chirivì, A., Fanini, B., Bucciero, A.: An innovative approach to shape information architecture related to ancient manuscripts, through multi-layered virtual ecosystems. From Codex4D to Dataspace project. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds.) Extended Reality, vol. 14219, pp. 247–267. Springer Nature Switzerland (2023). https://...
- [21] Pietroni, E., Orazi, N., Fanini, B.: Codex4D. Viaggio interdisciplinare nel manoscritto antico. Migrazioni e contaminazioni tra le scienze 4, 103–120 (2023). https://doi.org/10.36173/PLURIMI-2023-4/06
- [22] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning (ICML). pp. 8748–8763 (2021)
- [23] Remondino, F.: Heritage recording and 3D modeling with photogrammetry and 3D scanning. Remote Sensing 3(6), 1104–1138 (2011). https://doi.org/10.3390/rs3061104
- [24] Robertson, A.G.: A note on technology and functionality in digital manuscript studies. In: Albritton, B., Henley, G., Treharne, E.M. (eds.) Medieval Manuscripts in the Digital Age. Routledge (2020)
- [25] Siliquini, A.: Decorazione e illustrazione nella biblioteca di S. Giacomo della Marca. Gianni Maroni editore (2002)
- [26] Tochilkin, D., Pankratz, D., Liu, Z., Huang, Z., Letts, A., Li, Y., Liang, D., Laforte, C., Jampani, V., Cao, Y.P.: TripoSR: Fast 3D object reconstruction from a single image. arXiv preprint arXiv:2403.02151 (2024)
- [27] Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., Wang, W.: NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 34, pp. 27171–27183 (2021)
- [28] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. pp. 1905–1914 (2021)
- [29] Xiang, J., Lv, Z., Xu, S., Deng, Y., Wang, R., Zhang, B., Chen, D., Tong, X., Yang, J.: TRELLIS: Structured 3D latents for scalable and versatile 3D generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025), spotlight
- [30] Ye, C., Wu, Y., Lu, Z., Chang, J., Guo, X., Zhou, J., Zhao, H., Han, X.: Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 25050–25061 (2025)
- [31] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 586–595 (2018)