Title resolution pending

2015 · DOI 10.1109/tmm

8 Pith papers cite this work. Polarity classification is still being indexed.



Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

dataset: 1

citation-polarity summary

background: 1

representative citing papers

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

cs.MM · 2026-04-15 · unverdicted · novelty 8.0

AVID is presented as the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos. It is built with an agent-driven construction pipeline, and experiments show that state-of-the-art models struggle on it, while a fine-tuned baseline improves performance.

RoleMAG: Learning Neighbor Roles in Multimodal Graphs

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

RoleMAG learns neighbor roles in multimodal graphs to route shared, complementary, and heterophilous signals through separate channels, improving propagation without modality interference.

Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing

cs.CV · 2025-10-03 · unverdicted · novelty 7.0

A pose-conditioned large-margin contrastive encoder isolates persistent biometric identity cues from transmitted latents in talking-head videoconferencing to flag impersonation attacks via cosine similarity without inspecting the output video.

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification

cs.CL · 2025-10-19 · unverdicted · novelty 6.0

CoGate-LSTM adds prototype-guided cosine feature-space gating to a character-level BiLSTM with multi-source embeddings and focal loss, reaching 0.881 macro-F1 on Jigsaw toxic comments while using 7.3M parameters and outperforming fine-tuned BERT by 6.9 points on minority labels.

Towards Open World Sound Event Detection

cs.SD · 2026-05-05 · unverdicted · novelty 5.0 · 2 refs

Introduces the open-world sound event detection (OW-SED) paradigm and the WOOT framework, which uses deformable attention to detect both known and unseen sound events in open-world settings.

SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users

cs.CV · 2026-03-12 · unverdicted · novelty 5.0

SafeScreen enforces individualized safety constraints as a prerequisite for video retrieval by using profile extraction, adaptive VideoRAG analysis, and LLM decision-making to approve content for vulnerable users.

Structural and Disentangled Adaptation of Large Vision Language Models for Multimodal Recommendation

cs.IR · 2025-12-07 · unverdicted · novelty 5.0

SDA uses structural alignment as a soft teacher and gated low-rank expert paths to adapt LVLMs for multimodal recommendation, reporting 6.15% Hit@10 and 8.64% NDCG@10 average gains plus larger long-tail improvements on Amazon datasets.

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

cs.CV · 2025-08-28 · unverdicted · novelty 3.0

A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets and advocates both the use of foundation models and the reuse of decades of community experience.

citing papers explorer


AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction cs.MM · 2026-04-15 · unverdicted · none · ref 23
RoleMAG: Learning Neighbor Roles in Multimodal Graphs cs.LG · 2026-04-14 · unverdicted · none · ref 20
Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing cs.CV · 2025-10-03 · unverdicted · none · ref 52
CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification cs.CL · 2025-10-19 · unverdicted · none · ref 9
Towards Open World Sound Event Detection cs.SD · 2026-05-05 · unverdicted · none · ref 18 · 2 links
SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users cs.CV · 2026-03-12 · unverdicted · none · ref 31
Structural and Disentangled Adaptation of Large Vision Language Models for Multimodal Recommendation cs.IR · 2025-12-07 · unverdicted · none · ref 11
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding cs.CV · 2025-08-28 · unverdicted · none · ref 116