SurgViVQA adds temporal video encoding to surgical VideoQA and reports 9-11% gains in keyword accuracy over image-only baselines on two datasets plus improved robustness to question rephrasing.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
A multimodal fusion model using cognitive appraisal theory, transformers, and fuzzy logic predicts video-induced pleasure levels with 0.6624 accuracy by inferring appraisal variables.
citing papers explorer
- SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
  SurgViVQA adds temporal video encoding to surgical VideoQA and reports 9-11% gains in keyword accuracy over image-only baselines on two datasets plus improved robustness to question rephrasing.
- Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion
  A multimodal fusion model using cognitive appraisal theory, transformers, and fuzzy logic predicts video-induced pleasure levels with 0.6624 accuracy by inferring appraisal variables.