In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics

Papineni, K · 2002

7 Pith papers cite this work.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

CT-SpatialVQA benchmark shows 3D medical VLMs achieve only 34% average accuracy on semantic-spatial reasoning tasks in CT volumes, often below random chance.

GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth; these videos outperform neural video generators in physical and semantic validity and enable new probes of video encoders.

On the Factual Consistency of Text-based Explainable Recommendation Models

cs.IR · 2025-12-30 · unverdicted · novelty 7.0

A prompting pipeline and statement-level metrics show that six state-of-the-art text-based explainable recommendation models achieve high semantic similarity but very low factual consistency on Amazon review data.

Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Hi-GaTA is a hierarchical gated temporal aggregation adapter that uses short-to-long temporal pyramids and gated fusion to enable surgical video report generation, backed by a new 214-video benchmark and a surgical ViViT pretrained on 40,000 minutes of video.

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

IMU-to-4D uses wearable IMU data and repurposed LLMs to predict coherent 4D human motion plus coarse scene structure, outperforming cascaded state-of-the-art pipelines in temporal stability.

MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

cs.CV · 2026-05-18 · unverdicted · novelty 4.0 · 2 refs

Proposes the MedFM-Robust benchmark to evaluate the robustness of medical vision-language and segmentation foundation models for clinical reliability.

Curr-RLCER: Curriculum Reinforcement Learning For Coherence Explainable Recommendation

cs.IR · 2026-04-07 · unverdicted · novelty 4.0

Curr-RLCER applies curriculum reinforcement learning with coherence-driven rewards to align generated explanations with predicted ratings in explainable recommendation systems.

citing papers explorer


Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models cs.CV · 2026-05-09 · unverdicted · none · ref 14
GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models cs.CV · 2026-04-12 · unverdicted · none · ref 39
On the Factual Consistency of Text-based Explainable Recommendation Models cs.IR · 2025-12-30 · unverdicted · none · ref 23
Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation cs.CV · 2026-05-11 · unverdicted · none · ref 17 · 2 links
Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs cs.CV · 2026-04-23 · unverdicted · none · ref 65
MedFM-Robust: Benchmarking Robustness of Medical Foundation Models cs.CV · 2026-05-18 · unverdicted · none · ref 15 · 2 links
Curr-RLCER: Curriculum Reinforcement Learning For Coherence Explainable Recommendation cs.IR · 2026-04-07 · unverdicted · none · ref 16