WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
8 Pith papers cite this work.
Citing papers by year: 2026 (8)
Citing papers
- WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
  WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
- Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing
  Occlusion-aware keyframe selection via structural, cycle-consistent tracking, and vision-language criteria improves diffusion video editing robustness without manual annotations.
- DisImpact: Quantifying the Physi-Social Impact of Natural Disasters Through Social Media
  DisImpact introduces a two-stage MLLM framework that classifies disaster-related social media posts into ten impact categories and computes a unified physi-social impact index, validated against FEMA and NASA ground-truth data.
- QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning
  QuadLink generates anisotropic quad-dominant meshes from point clouds via a hybrid centroid-conditioned vertex linking model and a Tri-to-Quad data conversion operator.
- High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
  A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
  UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
- Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
  QREAM rewrites documents into a question-focused style using iterative ICL and distilled FT models, improving RAG performance by up to 8% (relative).
- Are Researchers Being Replaced by Artificial Intelligence?
  AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.