Canonical reference

Title resolution pending

Deep learning · 2016

71% of citing Pith papers cite this work as background.

144 Pith papers citing it

Background: 71% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 24 · method: 5 · baseline: 2

citation-polarity summary

background: 22 · use method: 5 · baseline: 2 · unclear: 2

representative citing papers

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

A Majorization-Minimization with Monte Carlo Approach for Hyperparameter Estimation

math.NA · 2026-05-13 · unverdicted · novelty 7.0

M³C replaces the hard hyperparameter optimization with a sequence of simpler problems using a majorant for the log-determinant approximated via Monte Carlo, with proven high-probability convergence to a critical point under assumptions.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Pareto-Guided Optimal Transport for Multi-Reward Alignment

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection

cs.LG · 2026-05-13 · conditional · novelty 7.0

ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inference at matched accuracy.

Identifying the nonlinear string dynamics with port-Hamiltonian neural networks

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Port-Hamiltonian neural networks extended to PDEs recover the Hamiltonian and dissipation of nonlinear string dynamics from data and outperform non-physics-informed baselines.

CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

CAWI replaces standard random initialization of input-to-hidden weights in randomized neural networks with samples drawn from a data-fitted copula that preserves observed feature dependencies, yielding consistent accuracy gains on 83 classification benchmarks.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.

LAPRAS : Learning-Augmented PRivate Answering for linear query Streams

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

LAPRAS uses predictions to answer likely queries with the offline Matrix Mechanism and paces residual budget for unpredicted queries via unbiased stopping-time estimation from the first few unexpected arrivals, achieving near-offline utility when overlap is high.

FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting

cs.GR · 2026-04-30 · unverdicted · novelty 7.0

FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.

FaceParts: Segmentation and Editing of Gaussian Splatting

cs.GR · 2026-03-25 · unverdicted · novelty 7.0

FaceParts performs unsupervised segmentation of facial features in Gaussian Splatting avatars and supports precise editing and cross-avatar part transfer using feature disentanglement, density clustering, and FLAME anchoring.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

cs.CL · 2024-10-14 · unverdicted · novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

citing papers explorer

Showing 50 of 144 citing papers.

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds cs.LG · 2026-05-12 · unverdicted · none · ref 45
Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.
WildChat: 1M ChatGPT Interaction Logs in the Wild cs.CL · 2024-05-02 · accept · none · ref 27
WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small cs.LG · 2022-11-01 · conditional · none · ref 56
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging cs.LG · 2026-05-20 · unverdicted · none · ref 102
LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.
Pointwise Generalization in Deep Neural Networks cs.LG · 2026-05-18 · unverdicted · none · ref 25
Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method cs.LG · 2026-05-18 · unverdicted · none · ref 100
Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.
What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions cs.LG · 2026-05-14 · unverdicted · none · ref 3
Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.
A Majorization-Minimization with Monte Carlo Approach for Hyperparameter Estimation math.NA · 2026-05-13 · unverdicted · none · ref 147
M³C replaces the hard hyperparameter optimization with a sequence of simpler problems using a majorant for the log-determinant approximated via Monte Carlo, with proven high-probability convergence to a critical point under assumptions.
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment cs.CL · 2026-05-13 · unverdicted · none · ref 5
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
Pareto-Guided Optimal Transport for Multi-Reward Alignment cs.CV · 2026-05-13 · unverdicted · none · ref 3
PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.
ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection cs.LG · 2026-05-13 · conditional · none · ref 3
ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inference at matched accuracy.
Identifying the nonlinear string dynamics with port-Hamiltonian neural networks cs.LG · 2026-05-12 · unverdicted · none · ref 77
Port-Hamiltonian neural networks extended to PDEs recover the Hamiltonian and dissipation of nonlinear string dynamics from data and outperform non-physics-informed baselines.
CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks cs.LG · 2026-05-12 · unverdicted · none · ref 39
CAWI replaces standard random initialization of input-to-hidden weights in randomized neural networks with samples drawn from a data-fitted copula that preserves observed feature dependencies, yielding consistent accuracy gains on 83 classification benchmarks.
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data cs.LG · 2026-05-08 · unverdicted · none · ref 3
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits math.OC · 2026-05-08 · unverdicted · none · ref 22
Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models stat.ML · 2026-05-08 · unverdicted · none · ref 39
TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions stat.ML · 2026-05-07 · unverdicted · none · ref 150
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection cs.LG · 2026-05-07 · unverdicted · none · ref 14
K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.
LAPRAS : Learning-Augmented PRivate Answering for linear query Streams cs.CR · 2026-05-03 · unverdicted · none · ref 50
LAPRAS uses predictions to answer likely queries with the offline Matrix Mechanism and paces residual budget for unpredicted queries via unbiased stopping-time estimation from the first few unexpected arrivals, achieving near-offline utility when overlap is high.
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting cs.GR · 2026-04-30 · unverdicted · none · ref 3
FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.
Benign Overfitting in Adversarial Training for Vision Transformers cs.LG · 2026-04-21 · unverdicted · none · ref 13
Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.
Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback cs.LG · 2026-04-21 · unverdicted · none · ref 10
Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.
FaceParts: Segmentation and Editing of Gaussian Splatting cs.GR · 2026-03-25 · unverdicted · none · ref 8
FaceParts performs unsupervised segmentation of facial features in Gaussian Splatting avatars and supports precise editing and cross-avatar part transfer using feature disentanglement, density clustering, and FLAME anchoring.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory cs.CL · 2024-10-14 · unverdicted · none · ref 3
LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts cs.LG · 2024-08-28 · conditional · none · ref 13
Loss-Free Balancing keeps expert loads balanced in MoE models by dynamically adjusting routing-score biases based on recent usage, avoiding auxiliary-loss interference and yielding better performance.
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data cs.LG · 2024-06-06 · conditional · none · ref 3
Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.
Improving Dictionary Learning with Gated Sparse Autoencoders cs.LG · 2024-04-24 · unverdicted · none · ref 65
Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 81
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation cs.CV · 2023-10-09 · unverdicted · none · ref 248
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers cs.CL · 2023-09-15 · unverdicted · none · ref 76
EvoPrompt uses LLMs to run evolutionary operators on populations of prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard tasks across 31 datasets.
DreamFusion: Text-to-3D using 2D Diffusion cs.CV · 2022-09-29 · accept · none · ref 3
Optimizes a Neural Radiance Field via probability density distillation from a 2D diffusion model to produce text-conditioned 3D scenes viewable from any angle.
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics cs.LG · 2026-05-21 · unverdicted · none · ref 2
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards cs.LG · 2026-05-20 · unverdicted · none · ref 3
DelTA estimates token coefficients to amplify discriminative directions in token-gradient vectors, reweighting the RLVR surrogate to produce more contrastive side-wise centroids and yielding 3.26 and 2.62 point gains on math benchmarks for 8B and 14B Qwen3 models.
Divide and Contrast: Learning Robust Temporal Features without Augmentation cs.LG · 2026-05-20 · unverdicted · none · ref 9
Di-COT is an unsupervised contrastive method that stochastically partitions time-series windows into overlapping sub-blocks to learn representations without augmentation, reporting SOTA results on classification and transfer tasks across multiple benchmarks while cutting training time.
Behavior-Consistent Deep Reinforcement Learning cs.LG · 2026-05-20 · unverdicted · none · ref 205 · 2 links
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
HORST: Composing Optimizer Geometries for Sparse Transformer Training cs.LG · 2026-05-20 · unverdicted · none · ref 7
HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.
Towards Understanding Self-Pretraining for Sequence Classification cs.LG · 2026-05-20 · unverdicted · none · ref 4
Self-pretraining improves Transformer sequence classification by enabling learning of proximity-biased attention from positional encodings that label supervision alone cannot easily acquire from random starts.
Neural Collapse by Design: Learning Class Prototypes on the Hypersphere cs.LG · 2026-05-19 · unverdicted · none · ref 60 · 2 links
Supervised classification reaches neural collapse by design via normalized prototype losses on the hypersphere, outperforming CE and SCL on ImageNet-1K and other benchmarks with faster convergence and better transfer.
Learning the dynamics of nonlinear systems with regional stability guarantees through linear matrix inequality constraints eess.SY · 2026-05-18 · unverdicted · none · ref 91
Learns regionally stable RNN models from input-output data by deriving LMI constraints from generalized sector conditions on deadzone activations and a barrier function to certify forward invariance on a compact set.
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents cs.AI · 2026-05-18 · unverdicted · none · ref 3
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
TabKDE: Simple and Scalable Tabular Data Generation with Kernel Density Estimates cs.LG · 2026-05-17 · conditional · none · ref 3
TabKDE generates synthetic tabular data using copula transformations followed by kernel density estimation, matching prior accuracy with negligible training time and reduced storage via coresets.
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making cs.LG · 2026-05-15 · unverdicted · none · ref 262
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices cs.CV · 2026-05-15 · unverdicted · none · ref 7
ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems cs.CL · 2026-05-15 · unverdicted · none · ref 43
Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image cs.CV · 2026-05-14 · unverdicted · none · ref 3
Sat3DGen improves geometric RMSE from 6.76m to 5.20m and FID from ~40 to 19 for street-level 3D generation from satellite images via geometry-centric constraints and perspective training.
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper cs.CV · 2026-05-14 · unverdicted · none · ref 3
Sparse interleaved multi-view inputs with DenseWarper outperform traditional dense simultaneous multi-view methods for 3D human pose estimation on Human3.6M and MPI-INF-3DHP datasets.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces cs.LG · 2026-05-12 · unverdicted · none · ref 8
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Composition of Memory Experts for Diffusion World Models cs.LG · 2026-05-12 · unverdicted · none · ref 3
A compositional diffusion world model integrates three specialized memory experts via contrastive product-of-experts to improve temporal consistency, past recall, and navigation while scaling to long contexts without quadratic costs.
Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm cs.CL · 2026-05-11 · unverdicted · none · ref 3
Theoretical analysis of continual factual knowledge acquisition shows data replay stabilizes pretrained knowledge by shifting convergence dynamics while regularization only slows forgetting, leading to the STOC method for attention-based replay selection.
Modeling Implicit Conflict Monitoring Mechanisms against Stereotypes in LLMs cs.SI · 2026-05-10 · unverdicted · none · ref 80
LLMs contain identifiable COCO neurons that enable implicit self-correction against stereotypes; targeted editing of these neurons improves fairness and robustness to jailbreaks while preserving generation quality.
