Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding
Pith reviewed 2026-05-22 08:08 UTC · model grok-4.3
The pith
A dataset of 438 annotated seizure videos and a seven-task benchmark enable multimodal models to reach 0.96 F1 on epileptic versus non-epileptic classification after seizure-specific fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a clinically grounded dataset of 438 videos with dense annotations on 20 semiological features, combined with a seven-task benchmark and the Seizure-RQI metric, reveals systematic weaknesses in current multimodal large language models; seizure-specific fine-tuning substantially improves performance across the tasks, while a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification.
What carries the argument
The seven-task hierarchical benchmark that moves from low-level visual perception through temporal sequencing and narrative report generation to final seizure diagnosis, evaluated with the Seizure-RQI for clinical faithfulness.
If this is right
- General multimodal models exhibit repeatable failures in laterality reasoning, temporal localization, and symptom sequencing on seizure videos.
- Seizure-specific fine-tuning produces measurable gains on every task in the seven-task hierarchy.
- The two-stage neuro-symbolic framework delivers an F1 of 0.96 on the binary epileptic versus non-epileptic classification task.
- The Seizure-RQI metric supplies a structured way to judge whether generated reports match clinical expectations for detail and accuracy.
- The benchmark can serve as a testbed for developing domain-adapted multimodal systems intended for safety-critical medical video analysis.
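For readers less familiar with the headline metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for a binary task. It is illustrative only: the label encoding (1 for epileptic, 0 for non-epileptic), the function name `binary_f1`, and the toy labels are assumptions, and the abstract does not state which class is treated as positive or whether the reported 0.96 is a binary or an averaged F1.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = epileptic, 0 = non-epileptic (assumed encoding).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = binary_f1(y_true, y_pred)
```

With four true positives, one false positive, and one false negative, precision and recall are both 0.8 in this toy case, so F1 is also 0.8.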
Where Pith is reading between the lines
- Similar annotated video collections could be created for other paroxysmal neurological events such as syncope or movement disorders to test cross-domain transfer.
- If the reported gains hold on prospective clinical recordings, the approach might reduce inter-rater variability in epilepsy monitoring unit reviews.
- The identified weaknesses in temporal ordering suggest that future video models will need explicit mechanisms for modeling symptom evolution over seconds to minutes.
- The dataset size and annotation density make it feasible to explore whether smaller, specialized models can match the performance of large general models after targeted training.
Load-bearing premise
The expert annotations on the 438 videos are accurate, consistent, and representative of real-world clinical seizure diversity so the benchmark and Seizure-RQI capture meaningful requirements.
What would settle it
Performance of the fine-tuned models and the neuro-symbolic framework drops sharply when the same tasks are run on a fresh set of seizure videos annotated by an independent group of clinicians using different labeling criteria.
Original abstract
While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in general video understanding, their capacity to interpret involuntary and spatio-temporally evolving pathologic motor behaviors such as seizure semiology remains largely untested. To address this gap, we introduce Seizure-Semiology-Suite, a clinically grounded dataset and benchmark for fine-grained, structured seizure semiology understanding. The dataset includes 438 seizure videos annotated with over 35,000 dense labels covering 20 ILAE-defined semiological features. Building on this dataset, we propose a seven-task hierarchical benchmark that systematically evaluates MLLMs from low-level visual perception to temporal sequencing, narrative report generation, and seizure diagnosis. To enable clinically meaningful evaluation of generated reports, we further introduce the Report Quality Index for Seizure Semiology (Seizure-RQI). Extensive baselines across 11 open-weight MLLMs reveal systematic weaknesses in laterality reasoning, temporal localization, symptom sequencing, and clinically faithful reporting. We show that seizure-specific fine-tuning substantially improves performance across tasks, and that a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification. Seizure-Semiology-Suite establishes a rigorous benchmark for evaluating multimodal models in safety-critical medical video understanding and guides the development of clinically reliable, domain-adaptive multimodal intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Seizure-Semiology-Suite (S3), a dataset of 438 seizure videos with over 35,000 dense labels for 20 ILAE-defined semiological features. It defines a seven-task hierarchical benchmark evaluating MLLMs from low-level visual perception through temporal sequencing and narrative generation to seizure diagnosis, introduces the Seizure-RQI metric for clinically meaningful report assessment, and reports baselines on 11 open-weight MLLMs that reveal weaknesses in laterality, localization, sequencing, and faithful reporting. Seizure-specific fine-tuning yields consistent gains, and a two-stage neuro-symbolic framework reaches an F1 of 0.96 on epileptic versus non-epileptic classification.
Significance. If the annotations prove reliable and representative, the work supplies a much-needed structured benchmark and evaluation protocol for safety-critical medical video understanding. The systematic comparison of 11 models, the demonstration of domain-specific fine-tuning benefits, and the introduction of Seizure-RQI constitute concrete contributions that could guide future model development in this domain.
major comments (2)
- [Dataset section] The manuscript provides no inter-rater reliability statistics (e.g., Cohen’s or Fleiss’ kappa), expert count, annotation protocol, or consensus procedure for the 35,000 labels across the 438 videos and 20 ILAE features. This is load-bearing for the central empirical claims because both the seven-task benchmark and the reported 0.96 F1 on epileptic versus non-epileptic classification presuppose that the ground-truth labels are accurate, consistent, and clinically representative.
- [Evaluation section] Exact definitions of the per-task metrics and the precise formulation of Seizure-RQI are not supplied, nor are the train/validation/test splits or any cross-validation procedure. Without these details the quantitative improvements attributed to seizure-specific fine-tuning cannot be fully reproduced or interpreted.
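To make the first request concrete, the sketch below shows how Fleiss' kappa could be computed for one semiological feature rated by several annotators. It is a minimal illustration, not the authors' procedure: the function name `fleiss_kappa`, the assumption of three raters per clip, the binary present/absent categories, and the toy counts are all hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same number of
    raters (at least two)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must have the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_category)

    if p_expected == 1.0:
        # All ratings fall in one category, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data: 5 clips, 3 raters, columns = [feature present, feature absent].
toy_counts = [[3, 0], [0, 3], [2, 1], [3, 0], [1, 2]]
kappa = fleiss_kappa(toy_counts)
```

For these toy counts the observed agreement is 11/15 and the chance agreement is 13/25, which gives a kappa of 4/9 (about 0.44). Reporting one such value per feature would show which of the 20 features are hardest to annotate consistently.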
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly reference the ILAE 2017 classification standards when listing the 20 semiological features.
- [Figures] Figure captions for example video frames and annotation visualizations would benefit from additional detail on what each label represents.
Simulated Author's Rebuttal
We sincerely thank the referee for their detailed and constructive comments. We have carefully considered each point and provide our responses below. We will make the necessary revisions to address the concerns regarding dataset annotation details and evaluation specifics.
Point-by-point responses
- Referee: [Dataset section] The manuscript provides no inter-rater reliability statistics (e.g., Cohen’s or Fleiss’ kappa), expert count, annotation protocol, or consensus procedure for the 35,000 labels across the 438 videos and 20 ILAE features. This is load-bearing for the central empirical claims because both the seven-task benchmark and the reported 0.96 F1 on epileptic versus non-epileptic classification presuppose that the ground-truth labels are accurate, consistent, and clinically representative.
  Authors: We agree that these details are critical for validating our ground-truth annotations. In the revised manuscript, we will add comprehensive information on the annotation protocol, including the number of experts (clinical neurologists), the consensus procedure, and inter-rater reliability measures such as Fleiss' kappa for the multi-label annotations across the 20 features. This will strengthen the credibility of the benchmark and the reported performance metrics.
  Revision: yes
- Referee: [Evaluation section] Exact definitions of the per-task metrics and the precise formulation of Seizure-RQI are not supplied, nor are the train/validation/test splits or any cross-validation procedure. Without these details the quantitative improvements attributed to seizure-specific fine-tuning cannot be fully reproduced or interpreted.
  Authors: We acknowledge the need for precise specifications to ensure reproducibility. The revised manuscript will include exact definitions and formulas for all per-task metrics, the complete mathematical formulation of the Seizure-RQI metric, detailed descriptions of the train/validation/test splits (with patient-level stratification to avoid data leakage), and any cross-validation methods used in the experiments.
  Revision: yes
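The patient-level splitting mentioned in the rebuttal can be sketched as follows. The idea is that all videos from one patient land in the same partition, so a model is never tested on a patient it saw during training. This is a minimal illustration under stated assumptions: the function name `patient_level_split`, the 70/15/15 fractions, the identifiers, and the toy mapping are hypothetical, and the sketch only groups by patient without balancing seizure classes across partitions.

```python
import random
from collections import defaultdict


def patient_level_split(video_to_patient, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole patients (not individual videos) to train/val/test."""
    by_patient = defaultdict(list)
    for video_id, patient_id in video_to_patient.items():
        by_patient[patient_id].append(video_id)

    # Shuffle patients deterministically so the split is reproducible.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: sorted(v for p in members for v in by_patient[p])
        for name, members in groups.items()
    }


# Hypothetical mapping: 30 videos from 10 patients (identifiers are made up).
videos = {f"vid{i:03d}": f"pt{i % 10:02d}" for i in range(30)}
splits = patient_level_split(videos)
```

A quick leakage check is to map each partition back to its patient identifiers and confirm that the three sets are pairwise disjoint.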
Circularity Check
No circularity: empirical dataset and benchmark paper with direct evaluations only
full rationale
This is a dataset introduction and empirical benchmark paper. The abstract and described content contain no mathematical derivations, equations, fitted parameters, predictions, or self-referential chains. Performance claims (seizure-specific fine-tuning gains and 0.96 F1 on epileptic vs. non-epileptic classification) are presented as direct model evaluation results on the 438-video dataset with 35,000 labels. No load-bearing steps reduce to inputs by construction, self-citation, or renaming; the seven-task benchmark and Seizure-RQI are defined explicitly from the new annotations rather than derived from prior results. The paper is self-contained against external benchmarks as an empirical contribution.