Emogen: Emotional image content generation with text-to-image diffusion models

Wu, G · 2024 · arXiv 2733.2024

Canonical reference. 91% of citing Pith papers cite this work as background.

279 Pith papers citing it

citation-role summary

background 83 · dataset 6 · baseline 2 · method 2

citation-polarity summary

background 85 · use dataset 4 · baseline 2 · use method 2

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

MoHallBench is a new benchmark that evaluates motion hallucination in VideoLLMs arising from co-occurrence priors, sequential inference, and similarity confusion, and reveals that hallucination is decoupled from action-recognition performance.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0 · 3 refs

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception

cs.CV · 2026-06-30 · accept · novelty 7.0 · 2 refs

RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.

Intrinsic decomposition and editing of 3D Gaussian splats

cs.GR · 2026-06-30 · unverdicted · novelty 7.0

A method to decompose 3D Gaussian splats into independent albedo and shading components for consistent texture editing in radiance fields.

Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.

Learning to Deny: Action Denial in Multimodal Large Language Models

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.

Diffusion-Based Material Regularization for Physics-Based Inverse Rendering

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

A regularization technique that treats diffusion model outputs as a similarity kernel during material optimization in inverse rendering, enabling joint reconstruction of geometry, materials, and illumination that satisfies the rendering equation and generalizes to new lighting.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the VG-GUIBench benchmark and the TASKER keyframe extraction algorithm, which improves performance on VideoQA and video-guided agentic tasks.

ScaLe-INR: Scale and Learn Implicit Neural Representations

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0 · 4 refs

GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.

Leveraging target dynamics for imaging in complex media

physics.optics · 2026-06-21 · unverdicted · novelty 7.0

Target dynamics provide an intrinsic source of variation equivalent to controlled illumination changes, enabling scattering-compensated reconstruction of dynamic scenes with one acquisition per frame in holographic and fluorescence imaging.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

FLM-Occ: Feed-forward Likelihood Maximization for Efficient Indoor Occupancy Prediction

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.

HERO: Hypothesis-Driven Evidence Retrieval from Omics for Multi-Task Breast Cancer Analysis

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

cs.CL · 2026-06-18 · unverdicted · novelty 7.0

The StylisticBias benchmark shows that 15 visual attributes explain nearly 80% of bias variation in six MLLMs, by isolating single cues such as age and fashion in generated images.

Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset

cs.CV · 2026-06-16 · conditional · novelty 7.0

CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0 · 2 refs

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0 · 2 refs

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design

cs.CE · 2026-06-04 · unverdicted · novelty 7.0 · 3 refs

Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

cs.CV · 2026-06-04 · conditional · novelty 7.0

An empirical study of five LVR variants finds that cosine alignment negatively correlates with accuracy (r=-0.94), that supervised latents are bypassed under corruption (max 4-point shift), and that answers are decodable downstream but not at the latent.

citing papers explorer

Showing 50 of 144 citing papers after filters.

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife cs.CV · 2026-06-19 · unverdicted · none · ref 37
WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.
MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models cs.CV · 2026-07-01 · unverdicted · none · ref 21
MoHallBench is a new benchmark that evaluates motion hallucination in VideoLLMs arising from co-occurrence priors, sequential inference, and similarity confusion, and reveals that hallucination is decoupled from action-recognition performance.
SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE cs.CV · 2026-06-30 · unverdicted · none · ref 19 · 3 links
SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.
Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs cs.CV · 2026-06-30 · unverdicted · none · ref 17
An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.
Learning to Deny: Action Denial in Multimodal Large Language Models cs.CV · 2026-06-30 · unverdicted · none · ref 39
MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.
Diffusion-Based Material Regularization for Physics-Based Inverse Rendering cs.CV · 2026-06-30 · unverdicted · none · ref 39
A regularization technique that treats diffusion model outputs as a similarity kernel during material optimization in inverse rendering, enabling joint reconstruction of geometry, materials, and illumination that satisfies the rendering equation and generalizes to new lighting.
Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction cs.CV · 2026-06-28 · unverdicted · none · ref 35
Introduces the VG-GUIBench benchmark and the TASKER keyframe extraction algorithm, which improves performance on VideoQA and video-guided agentic tasks.
ScaLe-INR: Scale and Learn Implicit Neural Representations cs.CV · 2026-06-26 · unverdicted · none · ref 20
ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.
MATCH: Flow Matching for Multi-View Anomaly Detection cs.CV · 2026-06-23 · unverdicted · none · ref 9 · 2 links
MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.
GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation cs.CV · 2026-06-22 · unverdicted · none · ref 6 · 4 links
GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.
Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation cs.CV · 2026-06-22 · unverdicted · none · ref 6
Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.
FLM-Occ: Feed-forward Likelihood Maximization for Efficient Indoor Occupancy Prediction cs.CV · 2026-06-19 · unverdicted · none · ref 9
FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.
HERO: Hypothesis-Driven Evidence Retrieval from Omics for Multi-Task Breast Cancer Analysis cs.CV · 2026-06-19 · unverdicted · none · ref 12
HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.
TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation cs.CV · 2026-06-10 · unverdicted · none · ref 23 · 2 links
A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.
Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning cs.CV · 2026-06-08 · unverdicted · none · ref 12
FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.
Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation cs.CV · 2026-06-05 · unverdicted · none · ref 30 · 2 links
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
TIDES: Time-Derivative Event Simulation via Deformable Reconstruction cs.CV · 2026-06-01 · unverdicted · none · ref 37
TIDES simulates realistic event camera streams in continuous time via dynamic Gaussian splatting with adaptive occlusion handling and sensor artifact modeling, claiming SOTA fidelity and better downstream transfer than prior methods.
SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory cs.CV · 2026-05-30 · unverdicted · none · ref 48
SuperMemory-VQA provides 4,853 human-verified QA pairs from 52.9 hours of egocentric AI glasses recordings to benchmark AI systems on realistic long-horizon memory tasks including an unanswerable option.
RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations cs.CV · 2026-05-22 · unverdicted · none · ref 13 · 2 links
RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.
AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing cs.CV · 2026-05-20 · unverdicted · none · ref 67 · 2 links
AIGaitor is the first claimed end-to-end on-device monocular motion-capture and deep-learning gait analysis pipeline demonstrated on consumer smartphones.
SDM: A Powerful Tool for Evaluating Model Robustness cs.CV · 2026-05-19 · unverdicted · none · ref 16
SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.
LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue cs.CV · 2026-05-19 · unverdicted · none · ref 6
LMM-Track4D formulates a trajectory-grounded dialogue task, releases Track4D-Bench with 526 samples, and proposes RTGE encoding, TRK state token, and OSK-RA decoder to elicit better 4D spatiotemporal reasoning in LMMs.
HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos cs.CV · 2026-05-17 · unverdicted · none · ref 35
HL-OutPaint enables high-resolution outpainting of long video sequences via a coarse-to-fine pipeline that first builds Global Coarse Guidance through global-local frame swapping and then synthesizes details.
Pareto-Guided Optimal Transport for Multi-Reward Alignment cs.CV · 2026-05-13 · unverdicted · none · ref 5 · 2 links
PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.
Field-Localized Forgery Detection for Digital Identity Documents cs.CV · 2026-05-09 · unverdicted · none · ref 15 · 2 links
FLiD is a field-localized forgery detection method for identity documents that outperforms full-document baselines and general detectors with significantly fewer parameters.
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics cs.CV · 2026-05-05 · unverdicted · none · ref 35
AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional animators on prompt understanding and artistic motion.
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting cs.CV · 2026-05-04 · unverdicted · none · ref 19 · 2 links
Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models cs.CV · 2026-05-02 · unverdicted · none · ref 10 · 4 links
MIRL uses mutual information to guide trajectory selection and provide separate rewards for visual perception in RLVR for VLMs, achieving 70.22% average accuracy with 25% fewer full trajectories.
CSGuard: Toward Forgery-Resistant Watermarking in Diffusion Models via Compressed Sensing Constraint cs.CV · 2026-05-02 · unverdicted · none · ref 3 · 4 links
CSGuard binds diffusion-model watermarks to a secret matrix via compressed sensing, cutting forgery attack success from 100% to 28.12% while preserving 100% detection on legitimate images.
Towards Temporal Compositional Reasoning in Long-Form Sports Videos cs.CV · 2026-04-24 · unverdicted · none · ref 42
Introduces the SportsTime benchmark and the CoTR method, which improves multimodal AI's temporal compositional reasoning and evidence grounding in long-form sports videos.
HumanScore: Benchmarking Human Motions in Generated Videos cs.CV · 2026-04-22 · unverdicted · none · ref 23
HumanScore defines six metrics for kinematic plausibility, temporal stability, and biomechanical consistency to benchmark human motions in videos from thirteen state-of-the-art generation models, revealing gaps between visual appeal and physical fidelity.
Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data cs.CV · 2026-04-21 · unverdicted · none · ref 27 · 3 links
DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes cs.CV · 2026-04-20 · unverdicted · none · ref 29 · 2 links
SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom losses or 3D models.
Efficient Video Diffusion Models: Advancements and Challenges cs.CV · 2026-04-17 · unverdicted · none · ref 57 · 2 links
A survey that groups efficient video diffusion methods into four paradigms (step distillation, efficient attention, model compression, and cache/trajectory optimization) and outlines open challenges for practical use.
DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment cs.CV · 2026-04-14 · unverdicted · none · ref 36
DPC-VQA decouples a frozen MLLM perceptual prior from a lightweight residual calibration branch to adapt video quality assessment to new scenarios with under 2% trainable parameters and 20% of typical MOS labels.
Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models cs.CV · 2026-04-13 · unverdicted · none · ref 2
VLMs display semantic fixation: across 14 models, accuracy is higher on standard rule mappings than on inverse ones, a gap that neutral prompts narrow, loaded prompts widen, and post-training alignment affects.
DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions cs.CV · 2026-04-07 · unverdicted · none · ref 18 · 2 links
DetailVerifyBench supplies 1,000 images and densely annotated long captions to evaluate precise hallucination localization in multimodal large language models.
From Plausibility to Verifiability: Risk-Controlled Generative OCR with Vision-Language Models cs.CV · 2026-03-20 · unverdicted · none · ref 36
A model-agnostic Geometric Risk Controller reduces extreme errors in VLM-based OCR by requiring cross-view consensus before accepting outputs.
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers cs.CV · 2025-10-02 · unverdicted · none · ref 11
InPose formulates pose estimation as an inverse problem solved by guiding a rotation-conditioned diffusion prior with a location-based likelihood term for zero-shot generalization across users.
CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion cs.CV · 2025-09-24 · unverdicted · none · ref 2
CamPVG is the first diffusion-based framework for generating geometrically consistent panoramic videos from camera pose inputs using a panoramic Plücker embedding and spherical epipolar attention module.
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing cs.CV · 2025-06-26 · unverdicted · none · ref 6
FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations cs.CV · 2025-06-03 · unverdicted · none · ref 35
BEVCALIB performs LiDAR-camera calibration from raw data by fusing camera and LiDAR bird's-eye view features with a novel feature selector and reports state-of-the-art accuracy on KITTI and NuScenes.
Fleet: Few Shots Lead Effective AI-generated Image Detection cs.CV · 2026-06-30 · unverdicted · none · ref 14
Fleet achieves dynamic few-shot adaptation for AIGI detection via avoidance routing in decoupled subspaces, raising accuracy from 20.4% to 73.1% on new generators like Doubao Seedream 4.0 with 10 shots.
Anomaly Factory 3D: A Modular Framework for Diverse Pseudo-Anomaly Synthesis in Unsupervised 3D Anomaly Detection cs.CV · 2026-06-28 · unverdicted · none · ref 15
AF3AD is a modular synthesis framework using center-conditioned parametric deformations in local PCA frames to create diverse pseudo-anomalies, improving unsupervised 3D anomaly detection on AnomalyShapeNet and Real3D-AD.
OSOR: One-Step Diffusion Inpainting for Effect-Aware Object Removal cs.CV · 2026-06-26 · unverdicted · none · ref 14 · 2 links
OSOR is a one-step diffusion inpainting method using an occupancy-guided discriminator, alpha head, and semantic-anchored verification pipeline to achieve effect-aware object removal, outperforming multi-step baselines in quality at 4-30x speed.
HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models cs.CV · 2026-06-25 · unverdicted · none · ref 11
HarmVideoBench is a multi-layered benchmark for harmful video understanding in LVLMs with three hierarchical dimensions, and BCR is a method that raises average model performance from 61.7% to 84.4%.
HANCLIP: A Family of Hyperbolic Angular Negation Vision Language Models cs.CV · 2026-06-22 · unverdicted · none · ref 35 · 2 links
HANCLIP restructures VLM embeddings with hyperbolic space and angular negation objectives to raise negation sensitivity on NegBench while keeping standard retrieval and classification performance.
Interpretable Uncertainty Routing Separating Emotion Ambiguity from Distribution Shift in Facial Expression Recognition cs.CV · 2026-06-21 · unverdicted · none · ref 33
Uncertainty decomposition via deep ensembles separates annotator disagreement from distribution shift in FER, enabling a routing mechanism that retains 1.8x more ambiguous faces at matched OOD rejection compared to single-uncertainty baselines.
Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation cs.CV · 2026-06-18 · unverdicted · none · ref 5
QK Product Steering suppresses dominant singular modes in the per-head QK product of selected middle layers via a closed-form query-only update, yielding 4.0% average relative CHAIR_s reduction on three GQA VLMs.
FATE: Pillar Encoding and Frequency-Aware Training for Event-Based Object Detection cs.CV · 2026-06-15 · unverdicted · none · ref 55
FATE combines pillar encoding via orthogonal polynomial basis with frequency-aware training to enable event-based object detection at up to 200 Hz without internal temporal sub-binning.
