archive
Every paper Pith has read. Search by title, abstract, or pith.
9568 papers in cs.CV · page 13
-
Optimal transport merges 3DGS primitives down to 10 percent
MMGS: 10× Compressed 3DGS through Optimal Transport Aggregation based on Multi-view Ranking
-
Shared subspaces cut parameters 87 percent in continual VLM learning
iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models
-
Dense synthetic images boost segmentation accuracy
What Makes Synthetic Data Effective in Image Segmentation
-
Brain network experts enable competitive fMRI semantic decoding
FPED: A Functional-Network Prior-Guided Mixture-of-Experts Framework for Interpretable Brain Decoding
-
Quadtrees cut GUI agent visual tokens by 30 percent
AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
-
Flow-map endpoint velocity replaces fake-score network
Distribution Matching Distillation without Fake Score Network
-
LLM templates expand NAS to discover better architectures
Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search
-
Post-training lifts video models' physical consistency
PhyWorld: Physics-Faithful World Model for Video Generation
-
Decorrelating sample difficulty reduces age bias in medical image classification
Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation
-
HAVEN benchmark aligns video and text across three levels
HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding
-
PCA rotation aligns key channels for accurate VLM pruning
Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference
-
Regularizer cuts demographic gaps in medical image AI
Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification
-
Smartphone video measures forest trees with ~2 cm accuracy
Smartphone-based Circular Plot Sampling for Forest Inventory
-
Quasi-concavity enforces convex shapes in segmentation networks
D-Convexity: A Unified Differentiable Convex Shape Prior via Quasi-Concavity for Data-driven Image Segmentation
-
Quantized model cuts brain tumor AI size by 6x with same accuracy
Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings
-
Layer-wise compression on image stats yields human-like visual features
Efficient coding along the visual hierarchy
-
Freezing image models yields competitive video performance
Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models
-
SSL pretraining helps models know when to skip DR predictions
Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening
-
VLMs need tight data alignment and miss weak signals in egocentric video
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
-
Diffusion model turns uniform organ maps into realistic PET scans
Generation of Heterogeneous PET Images from Uniform Organ Activity Maps Using a Pretrained Domain-Adapted Diffusion Model
-
FAGER metric leads in factual checks for AI image generators
FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models
-
CRAFT pipeline leads MAGMaR video QA at 0.739 average
CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
-
Multi-horizon training captures longer solar forecast dependencies
Learning Long-Term Temporal Dependencies in Photovoltaic Power Output Prediction Through Multi-Horizon Forecasting
-
LiFT lifts 2D generators to coherent 3D medical volumes
LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators
-
RL fine-tuning aligns traffic simulations with real data
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning
-
One photo produces a mask that defeats facial recognition on any image
Personalized Face Privacy Protection From a Single Image
-
New benchmark tests medical AI models on real-world image shifts
MedFM-Robust: Benchmarking Robustness of Medical Foundation Models
-
Foundation models fail to spot unseen iris attacks and spectral changes
A Systematic Failure Analysis of Vision Foundation Models for Open Set Iris Presentation Attack Detection
-
75 real urban walks released with head poses and gaze for trajectory models
EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction
-
MLLMs often miss artifacts in AI videos
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
-
Self-supervised backbones boost artwork classification
Harnessing Self-Supervised Features for Art Classification
-
LLM gains part-level and time-step control over human motion
MotionMERGE: A Multi-granular Framework for Human Motion Editing, Reasoning, Generation, and Explanation
-
COLMAP metrics match humans 4x better on 3D view consistency
Can These Views Be One Scene? Evaluating Multiview 3D Consistency when 3D Foundation Models Hallucinate
-
Direct waveform audio matches latent methods on benchmarks
WavFlow: Audio Generation in Waveform Space
-
VLM agent turns vague requests into video edit plans
Aurora: Unified Video Editing with a Tool-Using Agent
-
Active exploration outperforms passive in spatial intelligence tasks
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
-
Self-distillation from crops boosts MLLM detail recognition
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
-
NVFP4 and balanced SP enable 2x faster long video training
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
-
Diffusion models generate faster by growing resolution during denoising
Spectral Progressive Diffusion for Efficient Image and Video Generation
-
Single photo gains full PBR lighting control via shared intrinsic maps
PIXLRelight: Controllable Relighting via Intrinsic Conditioning
-
Dual-view selection lifts ego-exo memory accuracy to 58.2 percent
EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos
-
Entity ID tracking stops character drift in AI videos
Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory
-
Robots evolve navigation rules from their own successes and failures
Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction
-
Online steering halves unsafe content in diffusion models
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training
-
Segmentation proxy aligns multimodal understanding and generation
Semantic Generative Tuning for Unified Multimodal Models
-
Training augmentations alone match FGIR accuracy without crops
A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition
-
3D concept scaffold fixes prompt ambiguity in avatar retrieval
CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation
-
Lance beats prior open models at image and video generation
Lance: Unified Multimodal Modeling by Multi-Task Synergy