
arxiv: 2606.21252 · v2 · pith:WBP7RM64new · submitted 2026-06-19 · 💻 cs.CV

A Neurosymbolic Framework for Interpretable Skeleton-Based Seizure Detection via Concept-Driven Logical Reasoning

Pith reviewed 2026-07-01 07:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords neurosymbolic framework · seizure detection · skeleton sequences · interpretable AI · motor semiology · Boolean rules · video-based detection · epilepsy monitoring

The pith

A neurosymbolic framework detects seizures from video skeletons by composing clinical motor concepts into Boolean rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that pulls skeleton sequences from epilepsy videos, identifies motor concepts drawn from clinical guidelines, and assembles those activations into logical rules using differentiable logic. This produces predictions that can be audited at the level of which primitives appear, how the rules combine them, and how much each rule weighs in the decision. The same pipeline splits non-seizure footage into everyday activities to give the model finer distinctions and cut false alarms. On two public benchmarks the approach records 89.78 percent sensitivity at 0.06 false detections per hour on SAHZU and 85.27 percent at 0.09 on IEEE while releasing all annotations and code.
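The two headline metrics can be made concrete with a small sketch. It assumes event-level counts (detected seizures, missed seizures, false alarms) and a total recording length in seconds; the paper's exact counting protocol is not given in the text above, and the numbers in the usage lines are hypothetical, not the paper's data.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real seizure events that were detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no seizure events to evaluate")
    return true_positives / total


def false_detections_per_hour(false_positives, recording_seconds):
    """False alarms normalised by the length of the recording in hours."""
    if recording_seconds <= 0:
        raise ValueError("recording length must be positive")
    return false_positives / (recording_seconds / 3600.0)


# Hypothetical counts, chosen only to show the arithmetic.
example_sensitivity = sensitivity(true_positives=9, false_negatives=1)
example_fd_per_hour = false_detections_per_hour(false_positives=5,
                                                recording_seconds=100 * 3600)
```

Reporting the false-alarm rate per hour rather than per segment keeps the figure comparable across recordings of different lengths.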

Core claim

The framework extracts patient-centric skeleton sequences from epilepsy monitoring videos via a prompt-guided foundation model, predicts binary spatio-temporal concept activations grounded in clinical motor semiology guidelines, and composes them via differentiable logic into interpretable Boolean rules with auditable contributions, while sub-classifying non-seizure segments into clinically relevant normal activities to reduce false positives.
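One way to picture the sub-classification idea is a multi-class head whose output is collapsed back to a seizure score at decision time. The sketch below is an assumption about how such a collapse could work, not the authors' implementation; the class names are hypothetical placeholders.

```python
from math import exp

# Hypothetical label set: one seizure class plus several normal activities.
CLASSES = ["seizure", "sleeping", "eating", "walking"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def seizure_probability(logits):
    """Probability mass on the seizure class; the rest is spread over normal activities."""
    probs = softmax(logits)
    return probs[CLASSES.index("seizure")]


uniform_case = seizure_probability([0.0, 0.0, 0.0, 0.0])
confident_case = seizure_probability([5.0, 0.0, 0.0, 0.0])
```

Under this reading, the model must decide which normal activity a non-seizure segment resembles, which gives it a sharper training signal than a single catch-all negative class.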

What carries the argument

Differentiable logic that composes binary concept activations into Boolean rules.
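A minimal sketch of what a differentiable relaxation of Boolean logic can look like follows. It uses the product t-norm and probabilistic sum, a common choice that is assumed here for illustration; the text above does not say which relaxation the paper uses, and the concept names and the rule are hypothetical.

```python
from math import prod


def soft_and(values):
    """Product t-norm: matches Boolean AND when every input is exactly 0 or 1."""
    return prod(values)


def soft_or(values):
    """Probabilistic sum: matches Boolean OR when every input is exactly 0 or 1."""
    return 1.0 - prod(1.0 - v for v in values)


def soft_not(value):
    """Complement: matches Boolean NOT on 0 and 1."""
    return 1.0 - value


# Hypothetical concept activations in [0, 1].
concepts = {
    "rhythmic_arm_jerking": 0.9,
    "sustained_limb_stiffening": 0.8,
    "lying_still": 0.1,
}

# Hypothetical rule: jerking AND stiffening AND NOT lying_still.
rule_score = soft_and([
    concepts["rhythmic_arm_jerking"],
    concepts["sustained_limb_stiffening"],
    soft_not(concepts["lying_still"]),
])
```

Because each operator is a smooth function of its inputs, gradients can flow through the rule during training, and rounding the activations to 0 or 1 afterwards recovers an ordinary Boolean expression.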

If this is right

  • Every prediction breaks down into detected motor primitives, their logical composition, and each rule's contribution to the output.
  • Sub-classifying non-seizure segments supplies fine-grained supervision that lowers false detections per hour.
  • The three-level interpretability holds for every prediction on the tested benchmarks.
  • Public release of annotations, pose sequences, pipeline, and code allows direct reuse or extension of the extracted data.
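The three audit levels in the first bullet above can be illustrated with a toy report. This is a sketch under simplifying assumptions: rules are plain conjunctions of thresholded concepts, each rule carries a single scalar weight, and every name and number is hypothetical rather than taken from the paper.

```python
def explain(concepts, rules, weights, threshold=0.5):
    """Return (active primitives, per-rule report, total score) for one prediction."""
    # Level 1: which motor primitives were detected.
    active = sorted(name for name, value in concepts.items() if value >= threshold)
    report = []
    for rule_name, required in rules.items():
        # Level 2: how the primitives combine (here, a conjunction).
        fired = all(concepts[c] >= threshold for c in required)
        # Level 3: how much the rule adds to the decision score.
        contribution = weights[rule_name] if fired else 0.0
        report.append((rule_name, required, fired, contribution))
    score = sum(item[3] for item in report)
    return active, report, score


concepts = {"arm_jerking": 1.0, "trunk_stiffening": 1.0, "hand_to_mouth": 0.0}
rules = {
    "tonic_clonic_like": ["arm_jerking", "trunk_stiffening"],
    "eating_like": ["hand_to_mouth"],
}
weights = {"tonic_clonic_like": 1.5, "eating_like": -2.0}

active, report, score = explain(concepts, rules, weights)
```

A reviewer reading `report` can see which rule fired, which primitives it required, and what it added to the score, which is the kind of audit trail the bullets describe.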

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same concept-to-rule pipeline could be tested on other video tasks that already have published clinical movement guidelines.
  • If the extracted skeletons prove stable across camera angles and clothing, the method might transfer to home monitoring without retraining the concept layer.
  • Adding temporal duration constraints to the Boolean rules could be checked against seizure-length statistics to see whether specificity rises further.
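The duration-constraint idea in the last bullet could be prototyped as a post-processing filter on a binary prediction timeline. The sketch below is one hypothetical form of such a constraint, not something the paper proposes; `min_length` is an assumed parameter measured in segments.

```python
def enforce_min_duration(timeline, min_length):
    """Zero out runs of 1s that are shorter than min_length entries."""
    result = list(timeline)
    start = None
    # A trailing 0 acts as a sentinel so that a run ending at the last entry is closed.
    for i, value in enumerate(list(timeline) + [0]):
        if value and start is None:
            start = i
        elif not value and start is not None:
            if i - start < min_length:
                result[start:i] = [0] * (i - start)
            start = None
    return result


filtered = enforce_min_duration([0, 1, 0, 1, 1, 1, 0, 1, 1], min_length=3)
```

Comparing false detections per hour before and after such a filter, with `min_length` set from published seizure-length statistics, would be one way to test whether specificity improves.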

Load-bearing premise

The prompt-guided foundation model accurately extracts patient-centric skeleton sequences from the videos and the predicted concept activations correctly match clinical motor semiology guidelines.

What would settle it

Independent neurologist review of the same videos showing that the model's concept activations do not align with observed motor primitives would disprove the grounding step.

Figures

Figures reproduced from arXiv: 2606.21252 by Deval Mehta, Talha Ilyas, Zongyuan Ge.

Figure 1: (a) Prompt-guided patient tracking and pose extraction, (b) VLM-assisted normal activity sub-classification and verification, (c) data classes, statistics and distribution, (d) generated concept bank for the NTU-RGB+D 120, SAHZU and IEEE datasets.
Figure 2: (a) Model framework overview along with body-part-specific concepts of the seizure dataset, (b) discrete logic layer example along with its corresponding rules, (c) visual samples of normal and seizure activities along with a rule interpretation guide.
Figure 3: Qualitative analysis on a SAHZU recording. (a) Ground-truth and predicted timelines with rule contributions for representative resting and seizure clips. (b) Additional frames from an earlier segment. (c) t-SNE of rule activations: multi-label vs. binary formulation.

Video-based seizure detection is essential for the management of epilepsy patients, offering a non-invasive complement to electroencephalography. While several deep learning approaches have been developed for video-based seizure detection, none are inherently interpretable, limiting their adoption and translation into clinical practice. We present, to our knowledge, the first exploration of a neurosymbolic framework for video-based seizure detection that directly addresses this gap. Our approach (1) extracts patient-centric skeleton sequences from epilepsy monitoring units via a prompt-guided foundation model, (2) predicts binary spatio-temporal concept activations grounded in clinical motor semiology guidelines, and (3) composes them via differentiable logic into interpretable Boolean rules with auditable contributions. Furthermore, to mitigate false positives arising from the traditional binary formulation (seizure vs. non-seizure), we sub-classify non-seizure segments into clinically relevant normal activities, providing the model with fine-grained discriminative supervision. Evaluated on two public seizure video benchmarks, our framework achieves 89.78% sensitivity with 0.06 false detections per hour on SAHZU and 85.27% with 0.09 on IEEE, while producing complete three-level interpretability: every prediction decomposes into which motor primitives were detected, how they were logically composed, and how much each rule contributed to the clinical decision. We publicly release all annotations, extracted pose sequences, our data pipeline and code: https://github.com/Mr-TalhaIlyas/CDSD/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes the first neurosymbolic framework for video-based seizure detection. It extracts patient-centric skeleton sequences from epilepsy monitoring videos using a prompt-guided foundation model, predicts binary spatio-temporal concept activations grounded in clinical motor semiology guidelines, composes them via differentiable logic into interpretable Boolean rules with auditable contributions, and sub-classifies non-seizure segments into normal activities for finer supervision. On the SAHZU and IEEE public benchmarks it reports 89.78% sensitivity / 0.06 false detections per hour and 85.27% / 0.09 respectively, together with three-level interpretability (motor primitives, logical composition, rule contributions). All annotations, pose sequences, pipeline and code are released publicly.

Significance. If the central claims hold, the work would be significant as the first neurosymbolic treatment of this clinical task, directly addressing the interpretability barrier that has limited adoption of prior deep-learning video detectors. The public release of annotations, extracted pose sequences, data pipeline and code is a clear strength that supports reproducibility and follow-on research.

major comments (2)
  1. [Abstract] Abstract: the reported sensitivity and false-detection rates rest on two unvalidated steps—prompt-guided foundation-model skeleton extraction from EMU videos (subject to motion blur, occlusions, non-standard poses) and binary concept activations asserted to be grounded in motor semiology guidelines—yet no quantitative metrics (pose error vs. manual landmarks, inter-rater agreement on concept labels, or ablation removing the grounding step) are supplied, preventing assessment of whether the data support the performance and interpretability claims.
  2. [Abstract] Abstract (points 1–2 of the approach): the framework’s three-level interpretability guarantee is only clinically meaningful if the extracted skeletons and concept activations are accurate; without reported validation of these steps the downstream differentiable logic produces auditable but potentially meaningless rules.
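The intermediate checks the referee asks for are standard and cheap to compute once manual annotations exist. The sketch below shows two of them under stated assumptions: Cohen's kappa for agreement between two raters on binary concept labels, and a mean per-joint Euclidean error between predicted and manually marked 2D keypoints. The function names and sample values are hypothetical and do not come from the paper or its released code.

```python
from math import dist


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters on binary (0/1) labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n  # rater A's rate of positive labels
    p_b = sum(labels_b) / n  # rater B's rate of positive labels
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


def mean_joint_error(predicted, reference):
    """Average Euclidean distance between matching keypoints, in input units."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("keypoint lists must be non-empty and of equal length")
    return sum(dist(p, r) for p, r in zip(predicted, reference)) / len(reference)


kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
pose_error = mean_joint_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

Reporting both numbers alongside the end-to-end sensitivity would let a reader judge whether the rules are built on reliable skeletons and concept labels.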

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for validation of the skeleton extraction and concept activation stages. We address each major comment below and indicate planned revisions.

  1. Referee: [Abstract] Abstract: the reported sensitivity and false-detection rates rest on two unvalidated steps—prompt-guided foundation-model skeleton extraction from EMU videos (subject to motion blur, occlusions, non-standard poses) and binary concept activations asserted to be grounded in clinical motor semiology guidelines—yet no quantitative metrics (pose error vs. manual landmarks, inter-rater agreement on concept labels, or ablation removing the grounding step) are supplied, preventing assessment of whether the data support the performance and interpretability claims.

    Authors: We acknowledge that the manuscript does not report quantitative metrics such as pose estimation error against manual landmarks or inter-rater agreement on the binary concept labels. The skeleton extraction relies on a prompt-guided foundation model, and concept activations follow published clinical motor semiology guidelines, with all annotations and pose sequences released publicly to enable external verification. The reported performance figures are end-to-end results on the detection task. To address the concern, we will add an ablation that replaces the grounded concepts with ungrounded learned activations and quantify the performance difference; we will also add a dedicated limitations paragraph discussing the absence of these intermediate validation metrics. revision: yes

  2. Referee: [Abstract] Abstract (points 1–2 of the approach): the framework’s three-level interpretability guarantee is only clinically meaningful if the extracted skeletons and concept activations are accurate; without reported validation of these steps the downstream differentiable logic produces auditable but potentially meaningless rules.

    Authors: We agree that the clinical utility of the three-level interpretability (motor primitives, logical composition, rule contributions) presupposes reasonable accuracy in the skeleton and concept stages. The current manuscript demonstrates interpretability via qualitative examples and releases the full annotation and pose data for independent assessment. We will revise the abstract, approach description, and discussion sections to explicitly state this dependency and list it among the limitations. No additional experiments are required for this textual clarification. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework on external benchmarks


The paper describes a three-stage pipeline (prompt-guided skeleton extraction, binary concept activation, differentiable logic composition) and reports sensitivity and false-detections-per-hour figures on two public external datasets (SAHZU, IEEE). The provided text contains no equations, no fitted parameters renamed as predictions, no self-citation chains, and no uniqueness theorems. The central claims rest on empirical results rather than on any derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the accuracy of the initial skeleton extraction and the clinical validity of the concept definitions; these are presented as inputs without supporting evidence or validation in the abstract.

axioms (2)
  • domain assumption: Clinical motor semiology guidelines provide reliable binary spatio-temporal concepts for seizure detection.
    Concept activations are grounded in these guidelines (abstract, point 2) with no validation or inter-rater details supplied.
  • domain assumption: The prompt-guided foundation model accurately extracts patient-centric skeleton sequences from epilepsy monitoring videos.
    This is the first processing step (abstract, point 1) upon which all downstream reasoning depends.

pith-pipeline@v0.9.1-grok · 5804 in / 1532 out tokens · 35738 ms · 2026-07-01T07:22:21.685726+00:00 · methodology

