Canonical reference

Title resolution pending

Deep learning · 2016

71% of citing Pith papers cite this work as background.

144 Pith papers citing it

Background 71% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 24 · method: 5 · baseline: 2

citation-polarity summary

background: 22 · use method: 5 · baseline: 2 · unclear: 2
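
The 71% background share quoted for this work appears to follow from the polarity counts rather than the role counts. The sketch below checks that arithmetic. It assumes that "use method" is a single polarity label and that the percentage is the background count divided by all classified citations; neither assumption is stated on the page.

```python
# Counts copied from the citation-polarity summary above.
# Assumption: "use method" is one label, not two separate ones.
polarity_counts = {
    "background": 22,
    "use method": 5,
    "baseline": 2,
    "unclear": 2,
}

# Assumption: the share is background / all classified citations.
total = sum(polarity_counts.values())
background_share = polarity_counts["background"] / total

print(f"{total} classified citations, background share {background_share:.0%}")
```

Under these assumptions the total is 31 classified citations and the background share rounds to 71%. The role summary also sums to 31, but its background count of 24 would give roughly 77%.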

representative citing papers

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

A Majorization-Minimization with Monte Carlo Approach for Hyperparameter Estimation

math.NA · 2026-05-13 · unverdicted · novelty 7.0

M³C replaces the hard hyperparameter optimization with a sequence of simpler problems using a majorant for the log-determinant approximated via Monte Carlo, with proven high-probability convergence to a critical point under assumptions.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Pareto-Guided Optimal Transport for Multi-Reward Alignment

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection

cs.LG · 2026-05-13 · conditional · novelty 7.0

ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inference at matched accuracy.

Identifying the nonlinear string dynamics with port-Hamiltonian neural networks

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Port-Hamiltonian neural networks extended to PDEs recover the Hamiltonian and dissipation of nonlinear string dynamics from data and outperform non-physics-informed baselines.

CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

CAWI replaces standard random initialization of input-to-hidden weights in randomized neural networks with samples drawn from a data-fitted copula that preserves observed feature dependencies, yielding consistent accuracy gains on 83 classification benchmarks.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.

LAPRAS : Learning-Augmented PRivate Answering for linear query Streams

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

LAPRAS uses predictions to answer likely queries with the offline Matrix Mechanism and paces residual budget for unpredicted queries via unbiased stopping-time estimation from the first few unexpected arrivals, achieving near-offline utility when overlap is high.

FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting

cs.GR · 2026-04-30 · unverdicted · novelty 7.0

FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.

FaceParts: Segmentation and Editing of Gaussian Splatting

cs.GR · 2026-03-25 · unverdicted · novelty 7.0

FaceParts performs unsupervised segmentation of facial features in Gaussian Splatting avatars and supports precise editing and cross-avatar part transfer using feature disentanglement, density clustering, and FLAME anchoring.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

cs.CL · 2024-10-14 · unverdicted · novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

citing papers explorer

Showing 50 of 144 citing papers.

Learning the Preferences of a Learning Agent cs.AI · 2026-05-09 · unverdicted · none · ref 3
Formalizes preference learning from a no-regret or Boltzmann-converging learner with theoretical guarantees or impossibility results for IRL algorithms.
Kinematics-Driven Gaussian Shape Deformation for Blurry Monocular Dynamic Scenes cs.CV · 2026-05-09 · unverdicted · none · ref 8
Kinematics-GS reparameterizes Gaussian shapes along motion trajectories with a kinematic prior to reconstruct dynamic 3D scenes from blurry monocular videos by separating dynamic and static components and using coarse-to-fine optimization.
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation stat.ML · 2026-05-08 · unverdicted · none · ref 3
CONTRA generates sharp multi-dimensional conformal prediction regions by defining nonconformity scores as distances from the center in the latent space of a normalizing flow.
On the Blessing of Pre-training in Weak-to-Strong Generalization cs.LG · 2026-05-07 · unverdicted · none · ref 108
Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.
Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models cs.CV · 2026-05-06 · unverdicted · none · ref 3
Hallucinations in diffusion models are driven by local intrinsic dimension instabilities on the manifold, which Intrinsic Quenching corrects by deflating it.
Parallel Prefix Verification for Speculative Generation cs.AI · 2026-05-05 · unverdicted · none · ref 3
PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.
InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition cs.CL · 2026-05-04 · unverdicted · none · ref 3
InfoLaw models pretraining as information accumulation where quality sets information density and repetition causes scale-dependent diminishing returns, predicting loss with low error on unseen mixtures and larger scales up to 7B models and 425B tokens.
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL cs.LG · 2026-05-03 · unverdicted · none · ref 66
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
SBCA: Cross-Modal BERT-driven Actor-Critic for Multi-Asset Portfolio Optimization q-fin.CP · 2026-05-02 · unverdicted · none · ref 34
SBCA is a reinforcement learning framework using BERT cross-modal fusion and Actor-Critic to integrate price data with sentiment text for multi-asset portfolio optimization with practical trading constraints.
A unified perspective on fine-tuning and sampling with diffusion and flow models stat.ML · 2026-04-30 · unverdicted · none · ref 7
A unified framework for exponential tilting in diffusion and flow models that includes bias-variance decompositions showing finite gradient variance for some methods, norm bounds on adjoint ODEs, and adapted losses with new Crooks and Jarzynski identities.
Autoformalizing Memory Specifications with Agents cs.AR · 2026-04-30 · unverdicted · none · ref 3
An agent system autoformalizes industry DRAM specifications into DRAMPyML for verification tasks like assertion generation, with DRAMBench dataset released for benchmarking.
TTCD: Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data cs.LG · 2026-04-27 · unverdicted · none · ref 3
TTCD uses a non-stationary feature learner and reconstruction-guided distillation inside a transformer to infer contemporaneous and lagged causal graphs from non-stationary time series without strong noise assumptions.
When Quotes Crumble: Detecting Transient Mechanical Liquidity Erosion in Limit Order Books cs.LG · 2026-04-23 · unverdicted · none · ref 3
A simulation-grounded neural detection framework identifies transient mechanical liquidity erosion in limit order books with 36% AUC gain over rule-based baselines.
Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability cs.LG · 2026-04-23 · conditional · none · ref 3
Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.
Alignment has a Fantasia Problem cs.AI · 2026-04-23 · unverdicted · none · ref 3
AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.
The Last Harness You'll Ever Build cs.AI · 2026-04-22 · unverdicted · none · ref 3
A two-level evolution framework automates the design of task-specific harnesses for AI agents by optimizing both per-task performance and a reusable meta-blueprint that enables adaptation to new domains without human engineering.
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling cs.LG · 2026-04-22 · unverdicted · none · ref 93
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models cs.AI · 2026-04-21 · unverdicted · none · ref 3
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
S2H-DPO: Hardness-Aware Preference Optimization for Vision-Language Models cs.CV · 2026-04-20 · unverdicted · none · ref 7
S2H-DPO generates hierarchical prompt-driven preference pairs to improve multi-image reasoning in VLMs while keeping single-image performance intact.
Distributional Off-Policy Evaluation with Deep Quantile Process Regression stat.ML · 2026-04-20 · unverdicted · none · ref 10
DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents cs.CL · 2026-04-20 · unverdicted · none · ref 3
Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.
Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles cs.LG · 2026-04-20 · unverdicted · none · ref 14
E-value sequential tests enable early stopping of MCMC sampling in Bayesian deep ensembles, often needing only a fraction of the full budget while improving over standard deep ensembles.
Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs cs.AI · 2026-04-20 · unverdicted · none · ref 3
A parameter-free decomposition in MoE models separates routing control from content, showing that expert trajectories cluster tokens by semantic function across languages and forms, making paths rather than experts the natural unit of interpretability.
Characterizing Model-Native Skills cs.AI · 2026-04-19 · conditional · none · ref 39
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory cs.CL · 2025-11-25 · unverdicted · none · ref 52
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics cs.LG · 2025-11-11 · conditional · none · ref 81
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
LongLive: Real-time Interactive Long Video Generation cs.CV · 2025-09-26 · conditional · none · ref 54
LongLive is a causal autoregressive video generator that produces up to 240-second interactive videos at 20.7 FPS on one H100 GPU after 32 GPU-days of fine-tuning from a 1.3B short-clip model.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning cs.AI · 2025-07-01 · conditional · none · ref 3
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
MoBA: Mixture of Block Attention for Long-Context LLMs cs.LG · 2025-02-18 · unverdicted · none · ref 3
MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.
Process Reinforcement through Implicit Rewards cs.LG · 2025-02-03 · conditional · none · ref 3
PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies cs.RO · 2024-12-13 · conditional · none · ref 9
Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning cs.RO · 2024-11-07 · unverdicted · none · ref 67
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference cs.CV · 2024-10-06 · accept · none · ref 52
SparseVLM uses text-guided attention to prune and recycle visual tokens in VLMs, delivering 54% FLOPs reduction and 37% lower latency with 97% accuracy retention on LLaVA.
Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 202
SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval cs.LG · 2024-09-16 · conditional · none · ref 3
RetrievalAttention approximates full attention in long-context LLMs by retrieving relevant KV vectors from CPU-based ANNS indexes with an attention-aware algorithm, achieving near-full accuracy while accessing only 1-3% of the data.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer cs.CV · 2024-08-12 · unverdicted · none · ref 39
CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models cs.AI · 2024-08-01 · conditional · none · ref 4
Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.
Revisiting Feature Prediction for Learning Visual Representations from Video cs.CV · 2024-02-15 · conditional · none · ref 6
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
Zephyr: Direct Distillation of LM Alignment cs.LG · 2023-10-25 · accept · none · ref 37
Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection cs.CL · 2023-10-17 · unverdicted · none · ref 61
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting cs.CL · 2023-10-17 · conditional · none · ref 41
LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats; FormatSpread samples formats to report performance intervals without model weights.
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets cs.AI · 2023-10-10 · unverdicted · none · ref 3
At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment cs.CV · 2023-10-03 · unverdicted · none · ref 77
LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models cs.CL · 2023-09-07 · conditional · none · ref 47
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.
Is Conditional Generative Modeling all you need for Decision-Making? cs.LG · 2022-11-28 · unverdicted · none · ref 5
Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.
Unsupervised Dense Information Retrieval with Contrastive Learning cs.IR · 2021-12-16 · unverdicted · none · ref 114
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
Adaptive Federated Optimization cs.LG · 2020-02-29 · unverdicted · none · ref 193
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.
VisualBERT: A Simple and Performant Baseline for Vision and Language cs.CV · 2019-08-09 · conditional · none · ref 90
VisualBERT is a Transformer model that implicitly aligns text and image regions through self-attention and achieves competitive or superior results on VQA, VCR, NLVR2, and Flickr30K after pre-training on captions.
torchtune: PyTorch native post-training library cs.LG · 2026-05-20 · unverdicted · none · ref 3
torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.
Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting cs.LG · 2026-05-19 · unverdicted · none · ref 52
KUP-BI distills continuation-style knowledge from a train-only historical library to supply an approximate post-target proxy that is fused into forecasting backbones for improved performance on public datasets.
