SurgViVQA adds temporal video encoding to surgical VideoQA and reports gains of 9-11% in keyword accuracy over image-only baselines on two datasets, along with improved robustness to question rephrasing.
2 Pith papers cite this work.
representative citing papers
A multimodal fusion model using cognitive appraisal theory, transformers, and fuzzy logic predicts video-induced pleasure levels with 0.6624 accuracy by inferring appraisal variables.