pith.

arxiv: 2606.05611 · v1 · pith:ZE7V5IIWnew · submitted 2026-06-04 · 💻 cs.CV

What's Under the Skin? Estimating Swine Body Condition

Pith reviewed 2026-06-28 02:38 UTC · model grok-4.3

classification 💻 cs.CV
keywords swine body condition · backfat thickness · RGB-D camera · slice attention encoder · depth image processing · livestock monitoring · computer vision · tissue thickness estimation

The pith

PigFormer estimates swine backfat thickness to 2.43 mm mean absolute error from ceiling depth camera images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PigFormer, an end-to-end two-stage system that predicts subcutaneous backfat thickness, loin muscle depth, and total tissue thickness in sows and gilts from raw depth frames captured by a ceiling-mounted RGB-D camera. It aims to offer a scalable, non-contact alternative both to ultrasound scans, which are labor-intensive, and to visual scoring and calipers, which correlate poorly with tissue composition. Stage 1 standardizes depth data into height maps via segmentation distillation, ground-plane removal, and orientation normalization. Stage 2 applies a Slice Attention Encoder to capture spatial relationships along the dorsal surface. On a multi-site dataset of 319 instances, it achieves 2.43 mm backfat MAE and 3.87 mm overall MAE while outperforming ResNet-18 and ViT-small baselines.
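Ground-plane removal, the step that turns raw depth into heights above the floor, can be illustrated with a minimal NumPy sketch. It assumes the depth frame has already been back-projected into an organized (H, W, 3) point cloud in camera coordinates (metres), and that ground and pig masks come from the segmentation step. The function names `fit_ground_plane` and `depth_to_height_map` are hypothetical, and the least-squares SVD plane fit is a stand-in rather than the paper's confirmed method (a robust fit such as RANSAC would be a common alternative).

```python
import numpy as np


def fit_ground_plane(ground_pts: np.ndarray):
    """Least-squares plane n.p + d = 0 through (N, 3) ground points.

    Returns a unit normal oriented toward the camera (origin) and the offset d.
    """
    centroid = ground_pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(ground_pts - centroid, full_matrices=False)
    normal = vt[-1]
    # Flip the normal so it points from the floor toward the camera at the origin.
    if np.dot(normal, -centroid) < 0:
        normal = -normal
    d = -float(np.dot(normal, centroid))
    return normal, d


def depth_to_height_map(cloud: np.ndarray, ground_mask: np.ndarray,
                        pig_mask: np.ndarray) -> np.ndarray:
    """Convert an organized (H, W, 3) point cloud into per-pixel height above the floor.

    Pixels outside pig_mask are set to NaN.
    """
    normal, d = fit_ground_plane(cloud[ground_mask])
    heights = cloud @ normal + d  # signed distance of every point to the plane, (H, W)
    return np.where(pig_mask, heights, np.nan)


# Synthetic scene: camera looks straight down, floor 2.5 m away, flat "back" at 1.9 m.
ys, xs = np.mgrid[-1:1:50j, -1:1:50j]
cloud = np.stack([xs, ys, np.full_like(xs, 2.5)], axis=-1)
pig = (np.abs(xs) < 0.3) & (np.abs(ys) < 0.5)
cloud[pig, 2] = 1.9
hm = depth_to_height_map(cloud, ~pig, pig)  # about 0.6 m on the back, NaN elsewhere
```

Because heights are measured along the fitted floor normal, a camera that is not perfectly perpendicular to the floor still yields heights relative to the floor rather than to the sensor.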

Core claim

The central discovery is that converting raw depth frames into standardized height maps through SAM3-to-MaskDINO segmentation distillation, ground-plane removal, and orientation normalization, then processing those maps with a Slice Attention Encoder that treats them as sequences of cross-sectional slices, enables accurate prediction of backfat thickness (2.43 mm MAE), loin muscle depth, and total tissue thickness (3.87 mm MAE) at the last rib on multi-site data from two facilities, outperforming single-stage baselines.

What carries the argument

The Slice Attention Encoder, which processes each standardized height map as a sequence of cross-sectional slices to capture spatial relationships along the full dorsal surface for tissue thickness predictions.
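The slice-as-token idea can be illustrated with an untrained NumPy sketch. Everything below is an assumption for illustration: the class name `SliceAttentionSketch`; treating each column as one cross-sectional slice (spine running horizontally after orientation normalization); a single attention head without layer norm or MLP; sinusoidal positions (the paper cites RoFormer, so it may use rotary embeddings instead); and mean+max concatenation as one reading of the "dual pooling" named in the supplementary ablations.

```python
import numpy as np


def sinusoidal_positions(n: int, dim: int) -> np.ndarray:
    """Standard sine/cosine position codes, shape (n, dim); dim must be even."""
    pos = np.arange(n)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000 ** (2 * i / dim))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SliceAttentionSketch:
    """Hypothetical one-layer, single-head self-attention over height-map slices."""

    def __init__(self, slice_len: int, dim: int = 32, n_out: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.w_embed = rng.normal(0.0, 1.0 / np.sqrt(slice_len), (slice_len, dim))
        self.w_q, self.w_k, self.w_v = (rng.normal(0.0, s, (dim, dim)) for _ in range(3))
        self.w_head = rng.normal(0.0, s, (2 * dim, n_out))  # takes mean- and max-pooled features

    def __call__(self, height_map: np.ndarray):
        # height_map: (slice_len, n_slices) with the spine along the column axis, so each
        # column is one cross-section of the back. NaN (off-animal) pixels become 0.
        tokens = np.nan_to_num(height_map).T @ self.w_embed  # (n_slices, dim)
        tokens = tokens + sinusoidal_positions(tokens.shape[0], self.dim)
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))  # (n_slices, n_slices)
        x = tokens + attn @ v  # residual connection
        pooled = np.concatenate([x.mean(axis=0), x.max(axis=0)])
        return pooled @ self.w_head, attn  # 3 outputs, e.g. backfat / loin / total once trained


model = SliceAttentionSketch(slice_len=40)
preds, attn = model(np.random.default_rng(1).random((40, 64)))
print(preds.shape, attn.shape)  # (3,) (64, 64)
```

The attention matrix relates every slice to every other slice, which is how a single layer could, in principle, relate the last-rib region to shape cues elsewhere along the back.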

If this is right

  • Continuous automated body condition monitoring becomes feasible in commercial swine production without labor-intensive ultrasound.
  • More accurate estimates of underlying tissue composition support better management decisions affecting lactation performance and piglet survival.
  • The two-stage design separates geometric standardization from prediction, enabling modular improvements or replacement of either component.
  • Multi-site results indicate the approach can generalize across different production facilities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the height map representation holds, the same pipeline could extend to estimating additional body metrics such as volume or weight.
  • Real-time deployment in barns could enable tracking of individual condition changes over time rather than snapshot assessments.
  • The method might reduce reliance on physical contact methods for routine welfare monitoring if the geometric front-end remains robust across breeds and camera setups.

Load-bearing premise

The geometric front-end produces height maps that faithfully represent the dorsal surface geometry needed for the Slice Attention Encoder to make accurate tissue predictions.

What would settle it

Ultrasound ground-truth measurements on a new set of sows where PigFormer's backfat predictions show mean absolute error substantially larger than 2.43 mm would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2606.05611 by Daniel Morris, Gary Rohrer, Kuljit Bhatti, Madonna Benjamin, Mk Bashar, Tami Brown-Brandl.

Figure 1. Overview of PigFormer, a two-stage system. A ceiling-mounted RGB-D camera captures raw depth frames.
Figure 3. From raw capture to height map. (a) The RGB-D point ...
Figure 4. Heading angle estimated from upper-body mask in (a) ...
Figure 5. Ultrasound annotation process. (a) A raw ultrasound frame selected from a video where the last rib is clearly visible. (b) The ...
Figure 6. Sow caliper score vs. ultrasound-derived backfat depth and loin depth across both sites (...
Figure 7. Population Normalized Column Importance (NCI) curve averaged over all 79 test bags with ...
Figure 8. Per-spine-column importance for one representative test pig (input-aggregated height map). Top: height map with tail / last-rib / ...
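Figures 7 and 8 report per-column importance along the spine, but this page does not say how the Normalized Column Importance is computed. The sketch below uses occlusion sensitivity as a hypothetical stand-in: blank one column, measure how much the prediction moves, and normalize the scores to sum to 1. The function name `occlusion_column_importance` and the zero fill value are assumptions, and the paper's attribution method may differ (gradient-based attribution is a common alternative).

```python
import numpy as np


def occlusion_column_importance(height_map, predict, fill=0.0):
    """Occlusion sensitivity per column (position along the spine), normalized to sum to 1.

    `predict` is any callable mapping an (H, W) height map to an output vector.
    """
    height_map = np.asarray(height_map, dtype=float)
    base = np.asarray(predict(height_map), dtype=float)
    scores = np.zeros(height_map.shape[1])
    for j in range(height_map.shape[1]):
        occluded = height_map.copy()
        occluded[:, j] = fill  # blank out one cross-sectional column
        scores[j] = np.abs(np.asarray(predict(occluded), dtype=float) - base).sum()
    total = scores.sum()
    return scores / total if total > 0 else scores


def toy_predict(m):
    """Toy predictor that only looks at column 3, so all importance should land there."""
    return np.array([m[:, 3].sum()])


importance = occlusion_column_importance(np.ones((4, 6)), toy_predict)
print(importance)  # [0. 0. 0. 1. 0. 0.]
```

Averaging such per-column curves over all test animals would give a population curve in the spirit of Figure 7; a peak near the last rib would support the anatomical-localization reading over the global-statistic hypothesis.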
Original abstract

Sow body condition is an important indicator for growers as it has a large impact on lactation performance and piglet survival. However, body condition measures used during production, such as visual scoring and calipers, correlate poorly with underlying tissue composition. Ultrasound scans can provide direct measurements of subcutaneous backfat thickness and loin muscle depth, but their operation is labor intensive and not scalable for production. We present PigFormer, an end-to-end two-stage system that takes raw depth frames from a ceiling-mounted RGB-D camera and predicts subcutaneous backfat thickness, loin muscle depth, and total tissue thickness at the last rib. Stage 1 is a geometric front-end that converts raw depth into a standardized height map via SAM3-to-MaskDINO segmentation distillation, ground-plane removal, and orientation normalization. Stage 2 is a Slice Attention Encoder that treats each height map as a sequence of cross-sectional slices and captures spatial relationships along the full dorsal surface. On a multi-site dataset of 319 sow and gilt instances from two facilities, PigFormer achieves 2.43 mm backfat MAE and 3.87 mm overall MAE. It outperforms strong single-stage ResNet-18 and ViT-small baselines. PigFormer offers a practical path toward continuous, automated, non-contact body condition monitoring in commercial swine production. Code is available at https://github.com/iambashar/Pigformer.
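Orientation normalization can be sketched as estimating the body's principal axis from the upper-body mask (as Figure 4's caption suggests) and rotating the height map so that axis lies along the image x-axis. Using PCA for the heading and nearest-neighbour resampling are assumptions, as are the names `heading_angle` and `rotate_to_heading`. PCA only recovers the axis up to 180°, so head versus tail would need a separate cue that this sketch does not attempt.

```python
import numpy as np


def heading_angle(mask: np.ndarray) -> float:
    """Angle (radians; image x-axis = 0, y pointing down) of the mask's principal axis."""
    rows, cols = np.nonzero(mask)
    coords = np.stack([cols, rows], axis=1).astype(float)
    coords -= coords.mean(axis=0)
    cov = coords.T @ coords / len(coords)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]  # direction of largest spread
    return float(np.arctan2(major[1], major[0]))


def rotate_to_heading(img: np.ndarray, angle: float, fill_value: float = np.nan) -> np.ndarray:
    """Nearest-neighbour rotation about the image centre so `angle` maps onto the x-axis."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    c, s = np.cos(angle), np.sin(angle)
    # For each output pixel, look up the source pixel rotated by +angle.
    src_x = np.rint(c * (xx - cx) - s * (yy - cy) + cx).astype(int)
    src_y = np.rint(s * (xx - cx) + c * (yy - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full((h, w), fill_value, dtype=float)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


# Toy check: an 80 x 16 pixel bar tilted by 30 degrees.
yy, xx = np.mgrid[0:101, 0:101]
t = np.deg2rad(30)
u = (xx - 50) * np.cos(t) + (yy - 50) * np.sin(t)
v = -(xx - 50) * np.sin(t) + (yy - 50) * np.cos(t)
mask = (np.abs(u) < 40) & (np.abs(v) < 8)
ang = heading_angle(mask)  # close to 30 deg, or -150 deg since the axis sign is arbitrary
aligned = rotate_to_heading(mask.astype(float), ang, fill_value=0.0)
```

Applying the same angle to the height map as to the mask keeps every slice perpendicular to the spine, which is what a slice-based encoder would need.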

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents PigFormer, a two-stage pipeline for non-contact estimation of swine body condition from ceiling-mounted RGB-D depth frames. Stage 1 applies a geometric front-end (SAM3-to-MaskDINO segmentation distillation, ground-plane removal, and orientation normalization) to produce standardized height maps. Stage 2 uses a Slice Attention Encoder to process sequences of cross-sectional slices and predict backfat thickness, loin muscle depth, and total tissue thickness at the last rib. On a multi-site dataset of 319 sow and gilt instances from two facilities, the system reports 2.43 mm backfat MAE and 3.87 mm overall MAE while outperforming single-stage ResNet-18 and ViT-small baselines. Code is released at the cited GitHub repository.

Significance. If the performance numbers hold under proper validation, the work provides a practical route to automated, scalable body-condition monitoring in commercial swine operations, where visual scoring and calipers correlate poorly with tissue composition and ultrasound, although direct, is labor-intensive. The public code release is a clear strength that supports reproducibility.

major comments (3)
  1. [Abstract / evaluation] Abstract and evaluation section: the headline claims of 2.43 mm backfat MAE and 3.87 mm overall MAE on 319 multi-site instances are presented without any description of train/test splits, cross-validation procedure, data-exclusion criteria, or per-site breakdowns. These omissions are load-bearing for the central claim that PigFormer generalizes across facilities and outperforms the baselines.
  2. [Stage 1] Stage 1 geometric front-end description: the manuscript treats the output height maps as faithful representations of dorsal surface geometry after SAM3-to-MaskDINO distillation, ground-plane removal, and orientation normalization, yet supplies no quantitative validation (e.g., comparison to manual height measurements or ablation removing the front-end) to confirm that residual tilt or segmentation leakage does not systematically affect the Slice Attention Encoder inputs.
  3. [Results] Results comparison: the reported outperformance over ResNet-18 and ViT-small is stated without error bars, statistical significance tests, or an ablation that isolates the contribution of the Slice Attention Encoder versus the geometric preprocessing, making it impossible to determine whether the MAE advantage is attributable to the proposed architecture or to the front-end pipeline.
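The error bars and significance test requested in major comment 3 could take the form of a paired bootstrap over test instances. The sketch below is one standard option, not the authors' protocol; `paired_bootstrap_mae_diff` is a hypothetical name and the data are synthetic, not results from the paper. It resamples instances rather than frames because frames of the same animal are strongly correlated.

```python
import numpy as np


def paired_bootstrap_mae_diff(y_true, pred_a, pred_b, n_boot=10_000, seed=0):
    """Point estimate and 95% percentile-bootstrap CI for MAE(a) - MAE(b).

    Inputs hold one value per test instance (e.g. per sow), not per frame.
    """
    y_true, pred_a, pred_b = (np.asarray(x, dtype=float) for x in (y_true, pred_a, pred_b))
    diff = np.abs(pred_a - y_true) - np.abs(pred_b - y_true)  # paired per-instance difference
    rng = np.random.default_rng(seed)
    # Resample instances with replacement, keeping each instance's pair of errors together.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return float(diff.mean()), (float(lo), float(hi))


# Synthetic illustration only: model A is less noisy than model B.
rng = np.random.default_rng(42)
y = rng.uniform(10, 25, 200)  # e.g. backfat in mm
a = y + rng.normal(0, 1.0, 200)
b = y + rng.normal(0, 3.0, 200)
point, (lo, hi) = paired_bootstrap_mae_diff(y, a, b, n_boot=2000)
print(f"MAE(a) - MAE(b) = {point:.2f} mm, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An interval lying entirely below zero indicates that model A's lower MAE is unlikely to be a sampling artefact of the particular test set.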
minor comments (2)
  1. [Abstract] The phrase 'overall MAE' is used without an explicit definition of which of the three tissue measures are included in the average; a short clarification would remove ambiguity.
  2. [Dataset description] The multi-site collection is described only by total instance count; adding a table or paragraph with per-facility instance counts and basic statistics would strengthen the generalization narrative.
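Both minor comments can be addressed with a single report table. The sketch assumes, and this is not confirmed by the page, that "overall MAE" is the unweighted mean over backfat, loin, and total tissue; the site labels MSU and UNL come from the supplementary dataset description. The names `mae_report` and `TARGETS` are hypothetical, and the numbers are toy values.

```python
import numpy as np

TARGETS = ("backfat", "loin", "total")  # hypothetical column order of y_true / y_pred


def _summarize(abs_err: np.ndarray) -> dict:
    scores = {name: float(abs_err[:, j].mean()) for j, name in enumerate(TARGETS)}
    # Assumption: "overall" is the unweighted mean over all three targets, which equals
    # the mean of the three per-target MAEs because every target has N values.
    scores["overall"] = float(abs_err.mean())
    return scores


def mae_report(y_true, y_pred, sites) -> dict:
    """Per-target and overall MAE (mm) on all instances and broken down by site."""
    abs_err = np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))  # (N, 3)
    sites = np.asarray(sites)
    report = {"all": _summarize(abs_err)}
    for site in np.unique(sites):
        report[str(site)] = _summarize(abs_err[sites == site])
    return report


# Toy numbers (mm), not results from the paper.
y_true = [[10, 40, 50], [12, 42, 54], [8, 38, 46]]
y_pred = [[12, 40, 52], [12, 45, 57], [9, 38, 47]]
report = mae_report(y_true, y_pred, ["MSU", "UNL", "UNL"])
print(report["all"])  # {'backfat': 1.0, 'loin': 1.0, 'total': 2.0, 'overall': 1.3333333333333333}
```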

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each of the major comments below and will revise the paper to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: [Abstract / evaluation] Abstract and evaluation section: the headline claims of 2.43 mm backfat MAE and 3.87 mm overall MAE on 319 multi-site instances are presented without any description of train/test splits, cross-validation procedure, data-exclusion criteria, or per-site breakdowns. These omissions are load-bearing for the central claim that PigFormer generalizes across facilities and outperforms the baselines.

    Authors: We agree that the current presentation lacks sufficient detail on the evaluation protocol. In the revised manuscript we will expand both the abstract and the evaluation section to explicitly describe the train/test split (including the ratio and any stratification by site or animal type), the cross-validation procedure if employed, data-exclusion criteria, and per-site performance breakdowns to substantiate the generalization claim. revision: yes

  2. Referee: [Stage 1] Stage 1 geometric front-end description: the manuscript treats the output height maps as faithful representations of dorsal surface geometry after SAM3-to-MaskDINO segmentation distillation, ground-plane removal, and orientation normalization, yet supplies no quantitative validation (e.g., comparison to manual height measurements or ablation removing the front-end) to confirm that residual tilt or segmentation leakage does not systematically affect the Slice Attention Encoder inputs.

    Authors: We acknowledge the absence of quantitative validation for the geometric front-end. We will add an ablation that measures the impact of removing Stage 1 preprocessing entirely and, where possible with the collected data, include a limited comparison of the generated height maps against manual reference measurements on a subset of instances. revision: yes

  3. Referee: [Results] Results comparison: the reported outperformance over ResNet-18 and ViT-small is stated without error bars, statistical significance tests, or an ablation that isolates the contribution of the Slice Attention Encoder versus the geometric preprocessing, making it impossible to determine whether the MAE advantage is attributable to the proposed architecture or to the front-end pipeline.

    Authors: We agree that error bars, statistical tests, and a targeted ablation are required for a clear interpretation. In the revision we will report standard deviations alongside the MAE values, apply appropriate statistical significance tests between models, and add an ablation that replaces the Slice Attention Encoder with a standard backbone while retaining the geometric front-end to isolate its contribution. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical evaluation on held-out data

full rationale

The paper presents a two-stage ML pipeline (geometric front-end followed by Slice Attention Encoder) whose learned parameters are evaluated via MAE on held-out portions of a multi-site dataset of 319 sow and gilt instances. No algebraic derivation, self-referential definitions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The central performance claims rest on held-out data splits rather than reducing by construction to the model's own inputs or to prior work by the authors.
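Because the dataset contains 6,705 depth frames from only 319 instances, a held-out evaluation is only meaningful if all frames of one instance stay on the same side of the split. The sketch below shows an instance-grouped split and, as the stricter test of cross-facility generalization raised in the referee report, a leave-one-site-out split. The function names and the 25% test fraction are hypothetical; this page does not describe the paper's actual split protocol.

```python
import numpy as np


def grouped_split(instance_ids, test_fraction=0.25, seed=0):
    """Frame-level train/test masks that never split one instance's frames across sets."""
    instance_ids = np.asarray(instance_ids)
    unique = np.unique(instance_ids)
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(unique)))
    test_instances = rng.choice(unique, size=n_test, replace=False)
    test = np.isin(instance_ids, test_instances)
    return ~test, test


def leave_one_site_out(frame_sites, held_out_site):
    """Train on every other facility and test on `held_out_site` (e.g. "MSU" or "UNL")."""
    test = np.asarray(frame_sites) == held_out_site
    return ~test, test


# Toy example: 20 instances with 5 frames each.
ids = np.repeat(np.arange(20), 5)
train, test = grouped_split(ids)
assert set(ids[train]).isdisjoint(ids[test])  # no instance appears on both sides
```

A random frame-level split would place near-duplicate frames of the same sow in both sets and could make the reported MAE look better than it would on unseen animals.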

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on standard supervised deep-learning assumptions plus the domain-specific premise that the geometric preprocessing step produces usable height maps; no new physical entities are postulated.

free parameters (1)
  • Model hyperparameters and training schedule
    Network weights fitted to the labeled ultrasound targets, plus typical learning-rate and training-schedule choices.
axioms (1)
  • domain assumption SAM3-to-MaskDINO segmentation distillation produces accurate sow masks from the RGB-D frames
    Invoked in Stage 1 to enable ground-plane removal and orientation normalization.
invented entities (1)
  • Slice Attention Encoder no independent evidence
    purpose: Captures spatial relationships across dorsal cross-sectional slices of the height map
    New architectural component introduced for this regression task

pith-pipeline@v0.9.1-grok · 5789 in / 1321 out tokens · 54910 ms · 2026-06-28T02:38:25.360653+00:00


Reference graph

Works this paper leans on

31 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Miranda R. Authement and Mark T. Knauer. Associations between sow body condition with subsequent reproductive performance. Open Journal of Animal Sciences, 13(3):291–303, 2023.

  2. [2]

    SAM 3: Segment Anything with Concepts. Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Lil...

  3. [3]

    Arthur F. A. Fernandes, João R. R. Dórea, Robert Fitzgerald, William Herring, and Guilherme J. M. Rosa. A novel automated system to acquire biometric and morphological measurements and predict body weight of pigs via 3d computer vision. Journal of Animal Science, 97:496–508, 2019.

  4. [4]

    Arthur F. A. Fernandes, João R. R. Dórea, Bruno Dourado Valente, Robert Fitzgerald, William Herring, and Guilherme J. M. Rosa. Comparison of data analytics strategies in computer vision systems to predict pig body composition traits from 3d images. Journal of Animal Science, 98(8):1–9, 2020.

  5. [5]

    Kiah M. Gourley, Hilda I. Calderon, Jason C. Woodworth, Joel M. DeRouchey, Mike D. Tokach, Steve S. Dritz, and Robert D. Goodband. Sow and piglet traits associated with piglet survival at birth and to weaning. Journal of Animal Science, 98(7):skaa187, 2020.

  6. [6]

    Wei He, Yang Mi, Xiangdong Ding, Gang Liu, and Tao Li. Two-stream cross-attention vision Transformer based on RGB-D images for pig weight estimation. Computers and Electronics in Agriculture, 212:107986, 2023.

  7. [7]

    Yue Jian, Shihua Pu, Jiaming Zhu, Jianlong Zhang, and Wenwen Xing. Estimation of sow backfat thickness based on machine vision. Animals, 14:3520, 2024.

  8. [8]

    Mark Thomas Knauer and David John Baitinger. The sow body condition caliper. Applied Engineering in Agriculture, 31(2):175–178, 2015.

  9. [9]

    M. T. Knauer, L. E. Jones, D. W. Rozeboom, L. L. Greiner, M. J. Azain, L. A. Karriker, and K. J. Stalder. Use of a sow caliper for body condition assessment. Journal of Swine Health and Production, 15(4):209–213, 2007.

  10. [10]

    Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask DINO: Towards a unified transformer-based framework for object detection and segmentation. In CVPR, pages 3041–3050, 2023.

  11. [11]

    Xuan Li, Mengyuan Yu, Dihong Xu, Shuhong Zhao, Hua Tan, and Xu Liu. Non-contact measurement of pregnant sows’ backfat thickness based on a hybrid CNN-ViT model. Agriculture, 13(7):1395, 2023.

  12. [12]

    Yuzhi Li, Shiquan Cui, Samuel K. Baidoo, and Lee J. Johnston. Evaluation of sow caliper for body condition measurement of gestating sows. Journal of Swine Health and Production, 29(5):245–252, 2021.

  13. [13]

    Israel L. Mullins, Carissa M. Truman, Magnus R. Campler, Jeffrey M. Bewley, and Joao H. C. Costa. Validation of a commercial automated body condition scoring system on a commercial dairy farm. Animals, 9(6):287, 2019.

  14. [14]

    Nattakarn Nuntapaitoon, Supaporn Thongkhuy, and Padet Tummaruk. Backfat thickness at pre-farrowing: Indicators of sow reproductive performance, milk yield, and piglet birth weight in smart farm-based systems. Agriculture, 14(1):24,

  15. [15]

    Z. C. Peppmeier, J. T. Howard, M. T. Knauer, and S. M. Leonard. Estimating backfat depth, loin depth, and intramuscular fat percentage from ultrasound images in swine. Animal, 17(10):100969, 2023.

  16. [16]

    Pork Information Gateway. Feed processing and feed budgets and feed cost to produce pork. https://porkgateway.org/resource/feed-processing-and-feed-budgets-and-feed-cost-to-produce-pork/, n.d. Accessed: 2026-03-

  17. [17]

    Juan Rodríguez Alvarez, Mauricio Arroqui, Pablo Mangudo, Juan Toloza, Daniel Jatip, Juan M. Rodriguez, Alfredo Teyseyre, Carlos Sanz, Alejandro Zunino, Claudio Machado, and Cristian Mateos. Estimating body condition score in dairy cows from depth images using convolutional neural networks, transfer learning and model ensembling techniques. Agronomy...

  18. [18]

    Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. arXiv, abs/2104.09864,

  19. [19]

    Zhou Tang, Jin Wang, Angelo De Castro, Yuxi Zhang, Victoria Bastos Primo, Ana Beatriz Montevecchio Bernardino, Gota Morota, Xu Wang, Ricardo C. Chebel, and Haipeng Yu. Can 3d point cloud data improve automated body condition score prediction in dairy cattle? arXiv, abs/2601.22522,

  20. [20]

    Guanghui Teng, Zhijie Shen, Jianlong Zhang, Chen Shi, and Jionghua Yu. Non-contact sow body condition scoring method based on Kinect sensor. Transactions of the Chinese Society of Agricultural Engineering, 34(13), 2018.

  21. [21]

    Yijie Xiong, Isabella C. F. S. Condotta, Jacki A. Musgrave, Tami M. Brown-Brandl, and J. Travis Mulliniks. Estimating body weight and body condition score of mature beef cows using depth images. Translational Animal Science, 7:txad085, 2023.

  22. [22]

    Hongxiang Xue, Yuwen Sun, Jinxin Chen, Haonan Tian, Zihao Liu, Mingxia Shen, and Longshen Liu. Cat-cbam-net: An automatic scoring method for sow body condition based on CNN and transformer. Sensors, 23:7919, 2023.

  23. [23]

    M. G. Young, Michael D. Tokach, Robert D. Goodband, Jim L. Nelssen, and Steven S. Dritz. The relationship between body condition score and backfat in gestating sows. In Kansas State University Swine Day Report, 2001.

  24. [24]

    Haipeng Yu, Kiho Lee, and Gota Morota. Forecasting dynamic body weight of nonrestrained pigs from images using an RGB-D sensor camera. Translational Animal Science, 5:1–9, 2021.

  25. [25]

    Mengyuan Yu, Hongya Zheng, Dihong Xu, Yonghui Shuai, Shanfeng Tian, Tingjin Cao, Mingyan Zhou, Yuhua Zhu, Shuhong Zhao, and Xuan Li. Non-contact detection method of pregnant sows’ backfat thickness based on two-dimensional images. Animal Genetics, 53(6):769–781, 2022.

  26. [26]

    Jianlong Zhang, Yanrong Zhuang, Hengyi Ji, and Guanghui Teng. Pig weight and body size estimation using a multiple output regression convolutional neural network: A fast and fully automatic method. Sensors, 21(9):3218, 2021.

  27. [27]

    Kaixuan Zhao, Anthony N. Shelley, Daniel L. Lau, Karmella A. Dolecheck, and Jeffrey M. Bewley. Automatic body condition scoring system for dairy cows based on depth-image analysis. International Journal of Agricultural and Biological Engineering, 13(4):45–54, 2020.

    What’s Under the Skin? Estimating Swine Body Condition. Supplementary Material. Figure 2. Segmentation pipeline overview (panels: SAM3, “A Pig” prompt, Pig Mask, Upper Body Mask, Ground Mask, Heuristics). SAM3 generates a whole-pig mask from an RGB frame with the text prompt “A Pig,” ...

  28. [28]

    Stage 1: Geometric Front-End Details. This section expands on Stage 1 of PigFormer summarized in Sec. 3.2 of the main paper. 8.1. Segmentation. Our segmentation pipeline produces three masks (whole pig, upper body, and ground) through a semi-automated labeling process followed by model distillation (Fig. 2). Whole-pig mask. We pass each RGB frame to SAM3 [2]...

  29. [29]

    Dataset. The combined dataset comprises 319 instances (116 MSU, 203 UNL) totaling 6,705 depth frames. MSU pigs were scanned across 6 dates (February–December 2025), while UNL pigs were scanned across 11 dates (June–December 2025). 9.1. Ground-truth labels. We obtain ground-truth backfat depth and loin muscle depth at the last rib via B-mode ultrasound measu...

  30. [30]

    Ablation Studies. We report ablation experiments that guided the design of PigFormer. All ablations use the base PigFormer configuration (1-layer Slice Attention Encoder with dual pooling) and the combined MSU+UNL dataset (319 instances, 6,705 frames). Due to the broad search over hyperparameters, these supplementary ablation models were trained with a s...

  31. [31]

    Does the encoder localize to pig anatomy? The Slice Attention Encoder has a single transformer layer, so one can ask whether it has learned anatomical structure or is regressing global statistics of the height map (mean height, body volume, body area). Under the global-statistic hypothesis, an input-attribution map should be approximately uniform across...