Canonical reference

Title resolution pending

Deep learning · 2016

Canonical reference. 71% of citing Pith papers cite this work as background.

144 Pith papers citing it

Background: 71% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 24 · method 5 · baseline 2

citation-polarity summary

background 22 · use method 5 · baseline 2 · unclear 2
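
The 71% background figure quoted at the top of the page appears to follow from the polarity counts above rather than the role counts: 22 of the 31 classified citations are background. A minimal Python sketch of that arithmetic, assuming the share is a simple ratio rounded to a whole percent (the page does not state its formula, and the names `polarity_counts`, `role_counts`, and `share` are illustrative only):

```python
# Counts copied from the two summaries on this page.
polarity_counts = {"background": 22, "use method": 5, "baseline": 2, "unclear": 2}
role_counts = {"background": 24, "method": 5, "baseline": 2}


def share(counts, label):
    """Return label's share of all classified citations as a whole percent.

    Assumption: a plain ratio with rounding; the site's actual formula is not given.
    """
    total = sum(counts.values())
    return round(100 * counts[label] / total)


polarity_share = share(polarity_counts, "background")
role_share = share(role_counts, "background")
print(f"polarity: {polarity_share}% of {sum(polarity_counts.values())} classified citations")
print(f"role: {role_share}% of {sum(role_counts.values())} classified citations")
```

Under this assumption the polarity counts give 71% and the role counts give 77%, so the headline figure is consistent with the polarity summary only.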

representative citing papers

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

A Majorization-Minimization with Monte Carlo Approach for Hyperparameter Estimation

math.NA · 2026-05-13 · unverdicted · novelty 7.0

M³C replaces the hard hyperparameter optimization with a sequence of simpler problems using a majorant for the log-determinant approximated via Monte Carlo, with proven high-probability convergence to a critical point under assumptions.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Pareto-Guided Optimal Transport for Multi-Reward Alignment

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection

cs.LG · 2026-05-13 · conditional · novelty 7.0

ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inference at matched accuracy.

Identifying the nonlinear string dynamics with port-Hamiltonian neural networks

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Port-Hamiltonian neural networks extended to PDEs recover the Hamiltonian and dissipation of nonlinear string dynamics from data and outperform non-physics-informed baselines.

CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

CAWI replaces standard random initialization of input-to-hidden weights in randomized neural networks with samples drawn from a data-fitted copula that preserves observed feature dependencies, yielding consistent accuracy gains on 83 classification benchmarks.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.

Statistical Consistency and Generalization of Contrastive Representation Learning

cs.LG · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

The paper proves statistical consistency of contrastive loss to optimal ranking via an AUC criterion and derives generalization bounds O(1/m + 1/sqrt(n)) for supervised and O(1/sqrt(m) + 1/sqrt(n)) for self-supervised CRL that explain benefits of large negative sets.

LAPRAS: Learning-Augmented PRivate Answering for linear query Streams

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

LAPRAS uses predictions to answer likely queries with the offline Matrix Mechanism and paces residual budget for unpredicted queries via unbiased stopping-time estimation from the first few unexpected arrivals, achieving near-offline utility when overlap is high.

FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting

cs.GR · 2026-04-30 · unverdicted · novelty 7.0

FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.

FaceParts: Segmentation and Editing of Gaussian Splatting

cs.GR · 2026-03-25 · unverdicted · novelty 7.0

FaceParts performs unsupervised segmentation of facial features in Gaussian Splatting avatars and supports precise editing and cross-avatar part transfer using feature disentanglement, density clustering, and FLAME anchoring.

citing papers explorer

Showing 44 of 144 citing papers.

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
cs.CL · 2023-09-07 · conditional · none · ref 47
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.

Is Conditional Generative Modeling all you need for Decision-Making?
cs.LG · 2022-11-28 · unverdicted · none · ref 5
Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

Unsupervised Dense Information Retrieval with Contrastive Learning
cs.IR · 2021-12-16 · unverdicted · none · ref 114
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.

Adaptive Federated Optimization
cs.LG · 2020-02-29 · unverdicted · none · ref 193
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

VisualBERT: A Simple and Performant Baseline for Vision and Language
cs.CV · 2019-08-09 · conditional · none · ref 90
VisualBERT is a Transformer model that implicitly aligns text and image regions through self-attention and achieves competitive or superior results on VQA, VCR, NLVR2, and Flickr30K after pre-training on captions.

torchtune: PyTorch native post-training library
cs.LG · 2026-05-20 · unverdicted · none · ref 3
torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting
cs.LG · 2026-05-19 · unverdicted · none · ref 52
KUP-BI distills continuation-style knowledge from a train-only historical library to supply an approximate post-target proxy that is fused into forecasting backbones for improved performance on public datasets.

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering
cs.CL · 2026-05-19 · unverdicted · none · ref 4
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.

UNR-Explainer: Counterfactual Explanations for Unsupervised Node Representation Learning Models
cs.LG · 2026-05-17 · unverdicted · none · ref 68
UNR-Explainer applies MCTS to find subgraphs that change k-NN relations in unsupervised node embeddings, claiming superior performance on GraphSAGE and DGI across datasets.

A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning
stat.CO · 2026-05-15 · unverdicted · none · ref 87
A recursive cubing framework identifies stable hyperparameter regions for MC dropout uncertainty quantification in spatial deep learning and produces competitive or superior predictive intervals versus a statistical baseline on simulations and land-surface temperature data.

Monetary Policy in the Media Spotlight: Sentiments, Signals, and Economic Impact
econ.EM · 2026-05-14 · unverdicted · none · ref 122
Media sentiment indicators from Canadian news, when added to a New Keynesian model with endogenous central-bank response, improve out-of-sample forecasts and account for part of monetary-policy propagation to output and prices.

Information theoretic underpinning of self-supervised learning by clustering
cs.LG · 2026-05-12 · unverdicted · none · ref 23
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification
cs.LG · 2026-05-12 · unverdicted · none · ref 3
MaskTab is a masked pretraining method for industrial tabular data that delivers measurable gains in classification AUC and KS metrics while enabling effective distillation to smaller models.

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
cs.LG · 2026-05-12 · unverdicted · none · ref 76
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
math.OC · 2026-05-09 · unverdicted · none · ref 98
Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari
cs.LG · 2026-05-09 · unverdicted · none · ref 3
Transformer world models on Atari exhibit game-specific scaling regimes, but joint training on 26 environments produces consistent monotonic gains that improve downstream control policies to a median normalized score of 0.770.

Finer is Better (with the Right Scaling)
cs.LG · 2026-05-08 · unverdicted · none · ref 5
The block-size paradox in LLM microscaling is caused by underflow in subnormal E4M3 scaling factors; preventing underflow and using 4-over-6 selection resolves it, with brute-force confirming MSE strictly improves as blocks get finer.

NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning
cs.LG · 2026-05-06 · unverdicted · none · ref 27
Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.

MIRA: A Score for Conditional Distribution Accuracy and Model Comparison
stat.ML · 2026-05-03 · unverdicted · none · ref 24
MIRA is a new analytic score for conditional distribution accuracy derived from equal probability mass assignment, enabling Bayesian model comparison via direct posterior validation.

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments
cs.LG · 2026-04-21 · unverdicted · none · ref 3
RaBitQ outperforms TurboQuant in most tested settings for inner-product estimation, nearest-neighbor search, and KV cache quantization, while several TurboQuant runtime and recall results could not be reproduced from the released code.

Curvature-Aware PCA with Geodesic Tangent Space Aggregation for Semi-Supervised Learning
cs.LG · 2026-04-20 · unverdicted · none · ref 63
GTSA-PCA replaces global PCA covariance with curvature-weighted local operators and a geodesic alignment step to produce geometry-aware embeddings that improve on standard PCA and UMAP in small-sample high-curvature settings.

Understanding the Prompt Sensitivity
cs.CL · 2026-04-20 · unverdicted · none · ref 3
LLMs disperse meaning-preserving prompts internally instead of clustering them, which produces an excessively high upper bound on output log-probability differences via Taylor expansion and Cauchy-Schwarz.

Calibrating Model-Based Evaluation Metrics for Summarization
cs.CL · 2026-04-19 · unverdicted · none · ref 6
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
cs.CV · 2024-10-04 · unverdicted · none · ref 3
By fine-tuning DUST3R to output per-timestep pointmaps on scarce dynamic video datasets, MonST3R achieves stronger video depth and pose estimation without explicit motion modeling.

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
cs.CV · 2024-08-09 · unverdicted · none · ref 3
mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video benchmarks.

InternLM2 Technical Report
cs.CL · 2024-03-26 · unverdicted · none · ref 128
InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

Claw AI Lab: An Autonomous Multi-Agent Research Team
cs.AI · 2026-05-21 · unverdicted · none · ref 3
Claw AI Lab presents an interactive multi-agent platform for autonomous AI research that supports customizable teams, real-time control, and a code harness for experiment integration and result integrity.

Accurate, Efficient, and Explainable Deep Learning Approaches for Environmental Science Problems
cs.LG · 2026-05-19 · unverdicted · none · ref 91
The work introduces WaLeF/FIDLAr for flood forecasting, CoDiCast for probabilistic weather, and Hypercube-RAG for explainable environmental QA, claiming superior accuracy, efficiency, and interpretability over baselines.

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead
cs.LG · 2026-05-15 · unverdicted · none · ref 3
CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.

GradShield: Alignment Preserving Finetuning
cs.CL · 2026-05-13 · unverdicted · none · ref 3
GradShield is a data filtering technique using FIHS scores and adaptive thresholding to prevent safety misalignment in finetuned LLMs while preserving utility.

Evaluating Federated Learning approaches for mammography under breast density heterogeneity
cs.LG · 2026-05-09 · unverdicted · none · ref 3
FedAvg matches centralized training accuracy on mammography data split by breast density heterogeneity, showing standard FL can handle this clinical variation without special fixes.

Beyond Toy Benchmarks: A Systematic Evaluation of OOD Detection Methods For Plant Pathology Classification
cs.CV · 2026-05-09 · unverdicted · none · ref 3
Energy-based fine-tuning outperforms other OOD detection methods on the real-world Plant Pathology 2021 dataset, improving detection over softmax while maintaining in-distribution accuracy.

UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
cs.CV · 2026-05-05 · unverdicted · none · ref 9
UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.

Fine-Grained Graph Generation through Latent Mixture Scheduling
cs.AI · 2026-05-04 · unverdicted · none · ref 5
A novel CVAE with mixture scheduling achieves fine-grained structural control in graph generation, showing high quality and controllability on five datasets.

Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection
cs.CL · 2026-04-22 · unverdicted · none · ref 4
Multilingual pooling for quality classifiers outperforms monolingual baselines in rank stability and accuracy for LLM pretraining data selection across high- and low-resource languages.

Effects of Cross-lingual Evidence in Multilingual Medical Question Answering
cs.CL · 2026-04-22 · unverdicted · none · ref 5
Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.

Evidence of a Cognitive Shift in AI Education: How Students Are Rethinking Human Intelligence?
cs.CY · 2026-04-14 · unverdicted · none · ref 24
Longitudinal poll data from 471 students in AI courses shows a shift toward preferring human intelligence, reaching 65% in technical courses and 90% in design courses by 2026.

Helping Customers in Distress: An LLM-powered Agent that Converses, Probes, and Routes
cs.HC · 2026-03-31 · unverdicted · none · ref 3
An LLM-powered triaging agent for banking fraud reports uses multi-turn conversations and synthetic customer simulations to achieve a 30.6% increase in classification accuracy over prior methods.

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
cs.CV · 2025-02-14 · unverdicted · none · ref 165
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.

Agent AI: Surveying the Horizons of Multimodal Interaction
cs.AI · 2024-01-07 · unverdicted · none · ref 196
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

Stochastic Optimization and Data Science
math.OC · 2026-05-16 · unverdicted · none · ref 76
The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.

To Use AI as Dice of Possibilities with Timing Computation
cs.AI · 2026-05-01 · unreviewed · ref 76

Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking
cs.CR · 2026-05-01 · unreviewed · ref 39

Consistent Diffusion Language Models
cs.LG · 2026-04-30 · unreviewed · ref 3
