WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
Title resolution pending
22 Pith papers cite this work. Polarity classification is still indexing.
years
2026 22representative citing papers
BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
DisImpact introduces a two-stage MLLM framework to classify disaster-related social media posts into ten impact categories and compute a unified physi-social impact index validated against FEMA and NASA ground-truth data.
QuadLink generates anisotropic quad-dominant meshes from point clouds via a hybrid centroid-conditioned vertex linking model and a Tri-to-Quad data conversion operator.
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
Relit-LiVE jointly predicts relit videos and viewpoint-aligned environment maps inside a single diffusion process to achieve physically consistent video relighting without camera pose input.
A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
A new MAT simplification algorithm uses explicit surface correspondence tracking and priority-controlled edge collapses to preserve structural features like fillet alignments on discrete meshes.
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
A training-free technique manipulates low-frequency noise in diffusion models to control image color and structure using low-frequency priors.
TTCD uses a non-stationary feature learner and reconstruction-guided distillation inside a transformer to infer contemporaneous and lagged causal graphs from non-stationary time series without strong noise assumptions.
A relaxed Picard iteration plus heteroscedastic boundary denoising lets Monte Carlo PDE solvers solve heat equations with nonlinear radiation boundary conditions more accurately than linearization.
Lightweight networks combine bracketed smartphone exposures as convex combinations of raw pixels to produce artifact-free HDR images that generalize from synthetic training to real captures.
PULSE stabilizes mmWave human pose estimation by screening Doppler motion prompts before injecting them into spatial magnitude reasoning.
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.
CS students and recent grads prioritize pay and workplace culture over ethics in job searches and justify conflicting decisions with shared explanations such as money or lack of alternatives.
AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.
citing papers explorer
-
WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
-
BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation
BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.
-
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
-
DisImpact: Quantifying the Physi-Social Impact of Natural Disasters Through Social Media
DisImpact introduces a two-stage MLLM framework to classify disaster-related social media posts into ten impact categories and compute a unified physi-social impact index validated against FEMA and NASA ground-truth data.
-
QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning
QuadLink generates anisotropic quad-dominant meshes from point clouds via a hybrid centroid-conditioned vertex linking model and a Tri-to-Quad data conversion operator.
-
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
-
Generative 3D Gaussians with Learned Density Control
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
-
Relit-LiVE: Relight Video by Jointly Learning Environment Video
Relit-LiVE jointly predicts relit videos and viewpoint-aligned environment maps inside a single diffusion process to achieve physically consistent video relighting without camera pose input.
-
High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.
-
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
-
Structural MAT: Clean and Scalable Medial Axis Simplification via Explicit Surface Correspondence
A new MAT simplification algorithm uses explicit surface correspondence tracking and priority-controlled edge collapses to preserve structural features like fillet alignments on discrete meshes.
-
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
-
Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation
A training-free technique manipulates low-frequency noise in diffusion models to control image color and structure using low-frequency priors.
-
TTCD:Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data
TTCD uses a non-stationary feature learner and reconstruction-guided distillation inside a transformer to infer contemporaneous and lagged causal graphs from non-stationary time series without strong noise assumptions.
-
Monte Carlo PDE Solvers for Nonlinear Radiative Boundary Conditions
A relaxed Picard iteration plus heteroscedastic boundary denoising lets Monte Carlo PDE solvers solve heat equations with nonlinear radiation boundary conditions more accurately than linearization.
-
Lucky High Dynamic Range Smartphone Imaging
Lightweight networks combine bracketed smartphone exposures as convex combinations of raw pixels to produce artifact-free HDR images that generalize from synthetic training to real captures.
-
Doppler Prompting for Stable mmWave-based Human Pose Estimation
PULSE stabilizes mmWave human pose estimation by screening Doppler motion prompts before injecting them into spatial magnitude reasoning.
-
TabEmb: Joint Semantic-Structure Embedding for Table Annotation
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
-
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.
-
Cost-of-Ethics Crisis: Beliefs, Decisions, and Justifications in the Job Searches of Computer Science Students in Canada and the United States
CS students and recent grads prioritize pay and workplace culture over ethics in job searches and justify conflicting decisions with shared explanations such as money or lack of alternatives.
-
Are Researchers Being Replaced by Artificial Intelligence?
AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.
- Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing