Emogen: Emotional image content generation with text-to-image diffusion models

Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou · 2024 · arXiv 2733.2024

Canonical reference. 91% of citing Pith papers cite this work as background.

273 Pith papers citing it

Background 91% of classified citations

citation-role summary

background: 83 · dataset: 6 · baseline: 2 · method: 2

citation-polarity summary

background: 85 · use dataset: 4 · baseline: 2 · use method: 2

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0

WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0 · 3 refs

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception

cs.CV · 2026-06-30 · accept · novelty 7.0 · 2 refs

RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.

Intrinsic decomposition and editing of 3D Gaussian splats

cs.GR · 2026-06-30 · unverdicted · novelty 7.0

A method to decompose 3D Gaussian splats into independent albedo and shading components for consistent texture editing in radiance fields.

Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.

Learning to Deny: Action Denial in Multimodal Large Language Models

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.

Diffusion-Based Material Regularization for Physics-Based Inverse Rendering

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

A regularization technique that treats diffusion model outputs as a similarity kernel during material optimization in inverse rendering, enabling joint reconstruction of geometry, materials, and illumination that satisfies the rendering equation and generalizes to new lighting.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.

ScaLe-INR: Scale and Learn Implicit Neural Representations

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0 · 4 refs

GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.

Leveraging target dynamics for imaging in complex media

physics.optics · 2026-06-21 · unverdicted · novelty 7.0

Target dynamics provide an intrinsic source of variation equivalent to controlled illumination changes, enabling scattering-compensated reconstruction of dynamic scenes with one acquisition per frame in holographic and fluorescence imaging.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

FLM-Occ: Feed-forward Likelihood Maximization for Efficient Indoor Occupancy Prediction

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.

HERO: Hypothesis-Driven Evidence Retrieval from Omics for Multi-Task Breast Cancer Analysis

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

cs.CL · 2026-06-18 · unverdicted · novelty 7.0

StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.

Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset

cs.CV · 2026-06-16 · conditional · novelty 7.0

CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0 · 2 refs

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0 · 2 refs

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

Bridging CAD and Data-Driven Design: Attributed Feature Graphs for Engineering Design

cs.CE · 2026-06-04 · unverdicted · novelty 7.0 · 3 refs

Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

cs.CV · 2026-06-04 · conditional · novelty 7.0

Empirical study of five LVR variants finds cosine alignment negatively correlates with accuracy (r=-0.94), supervised latents are bypassed under corruption (max 4-point shift), and answers are decodable downstream but not at the latent.

Multimarginal flow matching with optimal transport potentials

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.

citing papers explorer

Showing 29 of 29 citing papers after filters.

Multimarginal flow matching with optimal transport potentials · cs.LG · 2026-06-03 · unverdicted · none · ref 40
OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging · cs.LG · 2026-06-01 · unverdicted · none · ref 10
MERIT enables decentralized instruction tuning via conflict-aware PCA splitting and parameter-space merging, raising average benchmark scores above joint training on multimodal and text mixtures.
Navigating Potholes with Geometry-Aware Sharpness Minimization · cs.LG · 2026-05-15 · unverdicted · none · ref 10
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion · cs.LG · 2026-05-13 · unverdicted · none · ref 16
MuteBench evaluates multimodal fusion robustness to modality missing and within-modality missing on 125000 samples from 9 clinical datasets, finding architecture family predicts tolerance better than parameter count.
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image · cs.LG · 2026-05-11 · unverdicted · none · ref 28
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization · cs.LG · 2026-04-14 · unverdicted · none · ref 44
STOMP extends direct preference optimization to the multi-objective setting via smooth Tchebysheff scalarization and standardization of observed rewards, achieving highest hypervolume in eight of nine protein engineering evaluations.
The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics · cs.LG · 2026-06-11 · unverdicted · none · ref 25
The paper links phase-transition behavior in continuous generative samplers to projection caustics in the data geometry and introduces the Critical Boundary Detector as a diagnostic tool.
Silent Failures in Federated Personalization of Foundation Models · cs.LG · 2026-05-31 · unverdicted · none · ref 70
Federated personalization of foundation models creates hard-to-detect trustworthiness failures due to privacy constraints, and existing benchmarks cannot adequately evaluate them.
Contribution Weights: A Geometrical Analysis of Self-Attention Transformers · cs.LG · 2026-05-29 · unverdicted · none · ref 88 · 2 links
Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.
Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models · cs.LG · 2026-05-29 · unverdicted · none · ref 13
Entropy-based test-time compute (ETTC) in VLM ensembles outperforms majority voting by prioritizing high-confidence predictions from stronger models.
Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models · cs.LG · 2026-05-27 · unverdicted · none · ref 2
MAP-RPS and LMAP-RPS enable stage-wise D-P traversal in diffusion-based zero-shot inverse problems via MAP initialization followed by re-noised posterior sampling, supported by theoretical analysis.
HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation · cs.LG · 2026-05-18 · unverdicted · none · ref 5 · 2 links
HypergraphFormer trains LLMs via supervised fine-tuning to generate hypergraph textual representations for floor plans, claiming better performance than raster or vector methods on RPLAN and a new out-of-distribution dataset while enabling arbitrary boundaries and high editability.
Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity · cs.LG · 2026-05-13 · unverdicted · none · ref 92
Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.
Scaling Laws for Mixture Pretraining Under Data Constraints · cs.LG · 2026-05-12 · unverdicted · none · ref 13
Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.
Generalized Category Discovery in Federated Graph Learning · cs.LG · 2026-05-05 · unverdicted · none · ref 7
GCD-FGL mitigates neighborhood absorption and global semantic inconsistency in federated generalized category discovery, delivering +4.86 average HRScore gain over baselines on five graph datasets.
PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework · cs.LG · 2026-05-01 · unverdicted · none · ref 46
PrismAgent deploys four specialized LLM agents in sequence to analyze meme intent, gather context, make preliminary judgments, and deliver a final harm verdict, outperforming prior zero-shot methods on three public datasets.
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping · cs.LG · 2026-04-13 · unverdicted · none · ref 27
MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.
3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy · cs.LG · 2026-06-22 · unverdicted · none · ref 1
3D masked autoencoders with multimodal alignment to ESM2 outperform 2D variants on single-cell microscopy tasks, reaching ROC-AUC 0.865 on protein-protein interaction and state-of-the-art AUC_micro 0.952 on localization.
Fitting Unknown Number of Hyperplanes with Manifold Optimization · cs.LG · 2026-05-27 · unverdicted · none · ref 5
A two-stage manifold optimization method on the sphere uses Riemannian EM with a heavy-tailed kernel and projected density initialization to fit an unknown number of hyperplanes, claiming better geometric accuracy than prior baselines.
Pruning Deep Neural Networks via the Marchenko--Pastur Distribution · cs.LG · 2026-05-23 · unverdicted · none · ref 8 · 2 links
Marchenko-Pastur random-matrix pruning of DNNs yields theoretical certificates for accuracy preservation under small fine-tuning and empirical ImageNet results with 50-60% MAC reduction and sub-2pp accuracy drops on ViT and CNN models.
Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective · cs.LG · 2026-05-20 · unverdicted · none · ref 15
Derives an asymptotic equivalent for the Representation Gap in equivariant diffusion models, showing it depends primarily on the intrinsic dimension of the task.
Position: State-of-the-Art Claims Require State-of-the-Art Evidence · cs.LG · 2026-05-17 · unverdicted · none · ref 1
SOTA claims based on aggregate benchmark scores frequently lack evidence for true model superiority beyond marginal mean improvements.
Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning · cs.LG · 2026-05-16 · unverdicted · none · ref 39
RAC adds ranking-aware group loss and clean-corrupted pairwise loss to RL post-training to boost both accuracy and calibration in multimodal reasoning without extra annotations.
SE-GA: Memory-Augmented Self-Evolution for GUI Agents · cs.LG · 2026-05-16 · unverdicted · none · ref 6
SE-GA combines Test-Time Memory Extension for dynamic context retrieval with Memory-Augmented Self-Evolution training to reach 89.0% on ScreenSpot and 75.8% on AndroidControl-High.
TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning · cs.LG · 2026-06-04 · unverdicted · none · ref 5
TailLoR applies low-rank updates to the singular value matrix of pre-trained weights while using a soft spectral penalty to protect dominant singular directions during continual learning.
When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet · cs.LG · 2026-06-04 · unverdicted · none · ref 49
A multiplication-only truncated Neumann approximation for matrix inversion in quantized Gated DeltaNet linear attention delivers up to 5x kernel speedup and 20% decode overhead reduction while preserving accuracy on Qwen3.5 models.
INAR-VL: Input-Aware Routing for Edge-Cloud Vision-Language Inference · cs.LG · 2026-05-13 · unverdicted · none · ref 17
INAR-VL routes 36% of visual question answering requests to the edge using lightweight complexity signals, cutting latency 24% and energy 26% while retaining 97% of cloud accuracy.
Gated-SwinRMT: Unifying Swin Windowed Attention with Retentive Manhattan Decay via Input-Dependent Gating · cs.LG · 2026-04-07 · unverdicted · none · ref 2
Gated-SwinRMT unifies Swin windowed attention with retentive Manhattan decay via gating, reaching 80.22% top-1 accuracy on Mini-ImageNet versus 73.74% for the RMT baseline.
MidSteer: Optimal Affine Framework for Steering Generative Models · cs.LG · 2026-04-17 · unreviewed · ref 5
