ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei · 2009

Tool reference. 75% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

27 Pith papers citing it

Method reference: 75% of classified citations

citation-role summary

dataset: 6 · background: 1 · baseline: 1
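
The role counts above appear to account for the 75% figure quoted at the top of the page: 6 of the 8 classified citations carry the dataset role. The page does not state how the percentage is computed, so the sketch below is an assumption about that arithmetic, and the names `role_counts` and `role_share` are hypothetical.

```python
# Role counts as listed in the citation-role summary.
# Assumption: the 75% "method reference" share is the dataset role's
# share of all classified citations; the page does not confirm this.
role_counts = {"dataset": 6, "background": 1, "baseline": 1}


def role_share(counts: dict[str, int], role: str) -> float:
    """Return the fraction of classified citations that carry `role`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified citations")
    return counts[role] / total


dataset_share = role_share(role_counts, "dataset")
print(f"{dataset_share:.0%}")  # prints "75%"
```

Only 8 of the 27 citing papers are classified, so the share describes the classified subset and not the full citing set.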

citation-polarity summary

use dataset: 6 · background: 1 · baseline: 1

representative citing papers

PERL: Parameter Efficient Reasoning in CLIP Latent Space

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

PERL augments frozen CLIP with a shared recurrent reasoning module of roughly 6K parameters that iteratively refines representations via latent token injection, delivering strong base-to-novel and transfer performance across 15 benchmarks.

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.

Adaptive Selection of LoRA Components in Privacy-Preserving Federated Learning

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

AS-LoRA adaptively chooses which LoRA factor to update per layer and round using a curvature-aware second-order score, eliminating reconstruction error floors and improving performance in DP federated learning.

RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.

LAION-5B: An open large-scale dataset for training next generation image-text models

cs.CV · 2022-10-16 · accept · novelty 7.0

LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

The TIME Machine: On The Power of Motion for Efficient Perception

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

TIME is a motion-based embedding from point tracks, trained only on synthetic data via masked autoencoding, that matches state-of-the-art video model performance with up to 10,000x less training data.

Hierarchical Variational Policies for Reward-Guided Diffusion

cs.LG · 2026-05-20 · conditional · novelty 6.0

A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.

SparseSAM: Structured Sparsification of Activations in Segment Anything Models

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

SparseSAM achieves 2x faster inference and 2.8x memory reduction in SAM with only 0.004 mIoU loss at 0.4 density via Stripe-Sort Attention and Residual-Consistency MLP.

Registers Matter for Pixel-Space Diffusion Transformers

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · novelty 6.0

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation with no minority examples in training.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

UniISP: A Unified ISP Framework for Both Human and Machine Vision

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.

Conservative Flows: A New Paradigm of Generative Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Conservative flows generate by running probability-preserving stochastic dynamics initialized at data points rather than noise, using corrected Langevin or predictor-corrector mechanisms on top of any pretrained flow model and showing gains on Swiss-roll, ImageNet-256 and Oxford Flowers-102.

SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

SoftSAE replaces fixed-K sparsity in autoencoders with a learned, input-dependent number of active features via a soft top-k operator.

Linear-Time Global Visual Modeling without Explicit Attention

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.

StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.

MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse

cs.LG · 2026-05-14 · conditional · novelty 5.0

MahaVar augments the Mahalanobis OOD score with class-wise distance variance, which is theoretically higher for in-distribution samples under relaxed Neural Collapse geometry.

TINS: Test-time ID-prototype-separated Negative Semantics Learning for OOD Detection

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

TINS improves OOD detection by learning negative semantics at test time with ID-prototype separation, cutting average FPR95 from 14.04% to 6.72% on the Four-OOD benchmark with ImageNet-1K.

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

cs.CV · 2026-05-10 · unverdicted · novelty 5.0

MDMF detects AI-generated images by learning patch-level forensic signatures and quantifying their distributional discrepancies with MMD, yielding larger separation than global methods when micro-defects are present.

Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approximate-gradient methods.

citing papers explorer

Showing 27 of 27 citing papers.

PERL: Parameter Efficient Reasoning in CLIP Latent Space

cs.CV · 2026-05-18 · unverdicted · none · ref 5

PERL augments frozen CLIP with a shared recurrent reasoning module of roughly 6K parameters that iteratively refines representations via latent token injection, delivering strong base-to-novel and transfer performance across 15 benchmarks.

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles

cs.CV · 2026-05-12 · unverdicted · none · ref 9

TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.

Adaptive Selection of LoRA Components in Privacy-Preserving Federated Learning

cs.LG · 2026-05-07 · unverdicted · none · ref 44

AS-LoRA adaptively chooses which LoRA factor to update per layer and round using a curvature-aware second-order score, eliminating reconstruction error floors and improving performance in DP federated learning.

RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy

cs.LG · 2026-05-03 · unverdicted · none · ref 31

RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.

LAION-5B: An open large-scale dataset for training next generation image-text models

cs.CV · 2022-10-16 · accept · none · ref 13

LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

The TIME Machine: On The Power of Motion for Efficient Perception

cs.CV · 2026-05-21 · unverdicted · none · ref 13

TIME is a motion-based embedding from point tracks, trained only on synthetic data via masked autoencoding, that matches state-of-the-art video model performance with up to 10,000x less training data.

Hierarchical Variational Policies for Reward-Guided Diffusion

cs.LG · 2026-05-20 · conditional · none · ref 10

A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.

SparseSAM: Structured Sparsification of Activations in Segment Anything Models

cs.CV · 2026-05-17 · unverdicted · none · ref 6

SparseSAM achieves 2x faster inference and 2.8x memory reduction in SAM with only 0.004 mIoU loss at 0.4 density via Stripe-Sort Attention and Residual-Consistency MLP.

Registers Matter for Pixel-Space Diffusion Transformers

cs.CV · 2026-05-15 · unverdicted · none · ref 38

Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization

cs.CV · 2026-05-14 · unverdicted · none · ref 32

Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · none · ref 18

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · none · ref 146

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

cs.CV · 2026-05-11 · unverdicted · none · ref 6

Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation with no minority examples in training.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

cs.CV · 2026-05-08 · unverdicted · none · ref 16

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

UniISP: A Unified ISP Framework for Both Human and Machine Vision

cs.CV · 2026-05-08 · unverdicted · none · ref 9

UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.

Conservative Flows: A New Paradigm of Generative Models

cs.LG · 2026-05-07 · unverdicted · none · ref 30

Conservative flows generate by running probability-preserving stochastic dynamics initialized at data points rather than noise, using corrected Langevin or predictor-corrector mechanisms on top of any pretrained flow model and showing gains on Swiss-roll, ImageNet-256 and Oxford Flowers-102.

SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

cs.LG · 2026-05-07 · unverdicted · none · ref 25 · 2 links

SoftSAE replaces fixed-K sparsity in autoencoders with a learned, input-dependent number of active features via a soft top-k operator.

Linear-Time Global Visual Modeling without Explicit Attention

cs.CV · 2026-05-03 · unverdicted · none · ref 9

Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

cs.CV · 2026-04-28 · unverdicted · none · ref 8

A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.

StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods

cs.CV · 2026-04-06 · unverdicted · none · ref 3

StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.

MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse

cs.LG · 2026-05-14 · conditional · none · ref 4

MahaVar augments the Mahalanobis OOD score with class-wise distance variance, which is theoretically higher for in-distribution samples under relaxed Neural Collapse geometry.

TINS: Test-time ID-prototype-separated Negative Semantics Learning for OOD Detection

cs.CV · 2026-05-11 · unverdicted · none · ref 8

TINS improves OOD detection by learning negative semantics at test time with ID-prototype separation, cutting average FPR95 from 14.04% to 6.72% on the Four-OOD benchmark with ImageNet-1K.

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

cs.CV · 2026-05-10 · unverdicted · none · ref 9

MDMF detects AI-generated images by learning patch-level forensic signatures and quantifying their distributional discrepancies with MMD, yielding larger separation than global methods when micro-defects are present.

Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

cs.LG · 2026-05-07 · unverdicted · none · ref 15

MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approximate-gradient methods.

Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay

cs.CV · 2026-05-02 · unverdicted · none · ref 25

Colinearity-Decay regularizer trains ViTs that maintain or improve full-precision accuracy while delivering higher accuracy after low-bit quantization on ImageNet and COCO tasks.

Decision-Aware Attention Propagation for Vision Transformer Explainability

cs.CV · 2026-04-20 · unverdicted · none · ref 22

DAP improves ViT attribution maps by injecting decision-relevant gradients into attention propagation, producing more class-sensitive and faithful explanations than standard attention rollout.

CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search

cs.AI · 2025-09-30 · unverdicted · none · ref 9

CoLLM-NAS introduces a collaborative two-LLM framework with Navigator, Generator, and Coordinator modules to perform knowledge-guided neural architecture search, reporting state-of-the-art results on ImageNet and NAS-Bench-201 with 4-10x lower search cost.
