hub Mixed citations

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al · 2021

Mixed citation behavior. Most common role is background (61%).

94 Pith papers citing it

Background 61% of classified citations

browse 94 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 20 method 10 baseline 1 dataset 1 other 1

citation-polarity summary

background 20 use method 10 baseline 1 unclear 1 use dataset 1

co-cited works

representative citing papers

Disentanglement Beyond Generative Models with Riemannian ICA

cs.LG · 2026-05-21 · unverdicted · novelty 8.0

RICA replaces ICA's global generative model with local Riemannian geometry, introducing a disentanglement tensor based on the Hessian of the log-likelihood and Ricci curvature to measure pointwise disentanglement, which recovers sources across manifolds in controlled tests.

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

cs.CV · 2026-05-19 · accept · novelty 8.0

NeuroQA is a large-scale 3D brain MRI visual question answering benchmark with verified image-grounded QA pairs, multi-domain coverage, and baseline evaluations showing current models lag behind text-only performance.

Will It Go Viral? Grounding Micro-Video Popularity Prediction on the Open Web

cs.MM · 2026-05-18 · unverdicted · novelty 7.0

WEBSHORTS dataset and SHORTS-CAST framework ground micro-video popularity prediction in structured open-web context collected at upload time and enable selective online adaptation using delayed labels.

UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

UniTriGen uses unified diffusion in a shared latent space plus lightweight adapters and scene-balanced sampling to produce high-quality aligned VIS-IR-Label triplets from limited paired data, improving few-shot RGB-T semantic segmentation.

CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL

cs.CV · 2026-05-14 · conditional · novelty 7.0

CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight bimanual manipulation tasks.

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

cs.CR · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

SubPopMark embeds verifiable subpopulation biases into distilled datasets via CVM and USTM optimization stages, allowing provenance inference through comparison of model output signatures against a reference behavior bank.

UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

VLMs show a resolution illusion on UHR Earth observation imagery where higher resolution does not improve micro-target perception; UHR-Micro benchmark and MAP-Agent address this via evidence-centered active inspection.

Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.

PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

PoseBridge recovers semantic information lost during skeletonization by extracting pose-anchored cues from human pose estimation and transferring them via skeleton-conditioned bridging and semantic prototype adaptation, yielding 13.3-17.4 point gains on the Kinetics PURLS benchmark.

PhyGround: Benchmarking Physical Reasoning in Generative World Models

cs.CV · 2026-05-11 · accept · novelty 7.0

PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.

GridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMs

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

GridProbe uses posterior probing on a KxK frame grid to adaptively select question-relevant frames, delivering up to 3.36x TFLOPs reduction with accuracy within 1.6 pp of the full-frame baseline on Video-MME-v2.

LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

LoopVLA adds recurrent refinement and learned sufficiency estimation to VLA models, cutting parameters 45% and raising throughput 1.7x while matching baseline task success on LIBERO and VLA-Arena.

IPAD-CLIP: Teaching CLIP to Detect Image Local Perceptual Artifacts

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

IPAD-CLIP adapts CLIP via artifact-aware text embeddings to detect multi-class local perceptual artifacts, backed by a new dataset of 3520 images with pixel-level masks.

SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

SphereVAD performs training-free video anomaly detection by recasting anomaly discrimination as von Mises-Fisher likelihood-ratio geodesic inference on the unit hypersphere using intermediate MLLM features, with Frechet mean centering, holistic scene attention, and spherical geodesic pulling.

Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new SOTA on DomainBed.

Toward Privileged Foundation Models:LUPI for Accelerated and Improved Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

PIQL integrates privileged information to accelerate convergence, lower loss, and improve generalization in tabular foundation models.

What Cohort INRs Encode and Where to Freeze Them

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.

Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

S2M extracts structured text quadruples from change masks to provide noise-free multimodal supervision, achieving 17.80% Sek and 66.14% F_scd on the new Gaza-Change-v2 dataset and outperforming LLM-based multimodal methods.

TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial MSSL methods on urban mobility and road tasks.

FreeSpec: Training-Free Long Video Generation via Singular-Spectrum Reconstruction

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

FreeSpec uses SVD-based spectral reconstruction to fuse global low-rank and local high-rank features, reducing content drift and preserving temporal dynamics in long video generation.

Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.

TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

TrajShield is a training-free defense that reduces jailbreak success rates by 52.44% on average in text-to-video models by localizing and neutralizing risks through trajectory simulation and causal intervention.

RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

RSRCC is a new 126k-question benchmark for fine-grained remote sensing change question-answering, constructed via a hierarchical semi-supervised pipeline with retrieval-augmented Best-of-N ranking.

citing papers explorer

Showing 15 of 15 citing papers after filters.

Disentanglement Beyond Generative Models with Riemannian ICA cs.LG · 2026-05-21 · unverdicted · none · ref 70
RICA replaces ICA's global generative model with local Riemannian geometry, introducing a disentanglement tensor based on the Hessian of the log-likelihood and Ricci curvature to measure pointwise disentanglement, which recovers sources across manifolds in controlled tests.
Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning cs.LG · 2026-05-12 · unverdicted · none · ref 30
SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.
Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning cs.LG · 2026-05-08 · unverdicted · none · ref 53
Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new SOTA on DomainBed.
Toward Privileged Foundation Models:LUPI for Accelerated and Improved Learning cs.LG · 2026-05-08 · unverdicted · none · ref 27 · 2 links
PIQL integrates privileged information to accelerate convergence, lower loss, and improve generalization in tabular foundation models.
What Cohort INRs Encode and Where to Freeze Them cs.LG · 2026-05-08 · unverdicted · none · ref 50
Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation cs.LG · 2026-04-11 · unverdicted · none · ref 184
The first survey on Attention Sink in Transformers structures the literature around fundamental utilization, mechanistic interpretation, and strategic mitigation.
S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs cs.LG · 2026-05-18 · unverdicted · none · ref 24 · 2 links
S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.
ModelLens: Finding the Best for Your Task from Myriads of Models cs.LG · 2026-05-08 · unverdicted · none · ref 40
ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time cs.LG · 2026-05-03 · unverdicted · none · ref 27
LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.
ASAP: Attention Sink Anchored Pruning cs.LG · 2026-05-21 · unverdicted · none · ref 24
ASAP prunes tokens in ViTs by anchoring on attention sinks modeled as lazy random walks, using cumulative transition matrices and radial diffusion clustering to compress redundancy while preserving accuracy.
A Composite Activation Function for Learning Stable Binary Representations cs.LG · 2026-05-12 · unverdicted · none · ref 54
HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.
Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models cs.LG · 2026-05-11 · unverdicted · none · ref 66
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
Diverse Sampling in Diffusion Models with Marginal Preserving Particle Guidance cs.LG · 2026-05-07 · unverdicted · none · ref 42
EDDY adds diversity to diffusion-model samples by using kernel-based anti-symmetric pairwise drifts that preserve marginal distributions via Fokker-Planck symmetries, with practical approximations for expensive cases.
Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion cs.LG · 2026-04-27 · unverdicted · none · ref 30
Diffusion Templates is a unified plugin framework that allows injecting various controllable capabilities into diffusion models through a standardized interface.
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026-05-09 · unreviewed · ref 78

Learning transferable visual models from natural language supervision

hub tools

citation-role summary

citation-polarity summary

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer