super hub Mixed citations

Advances in neural information processing systems , volume=

author=, Language models are few-shot learners

Mixed citation behavior. Most common role is background (64%).

130 Pith papers citing it

Background 64% of classified citations

browse 130 citing papers more from author=

hub tools

JSON dossier citing papers JSON

citation-role summary

background 11 dataset 1 method 1 other 1

citation-polarity summary

background 9 unclear 2 support 1 use dataset 1 use method 1

claims ledger

background phenomenon might further improve the performance of the co-scientist as a tool for scientific discovery. Improved multimodal reasoning and tool-use capabilities.Some of the most interesting data in scientific publications is not written in text but may be encoded visually in figures and charts. However, even state-of-the-art frontier models may not comprehensively utilize such data with optimal reasoning [89] and the AI co-scientist system is unlikely to be an exception. Stronger benchmarks and
other Bernoulli random variables with α(q) success probability. According to the Hoeffding's inequal- ity Hoeffding (1963) we have P(|m−α(q)N| ≥t)≤2 exp −2t2 N ,(8) wheret≥0. It implies P(|m−α(q)N| ≤t)≥1−2 exp −2t2 N ⇔P(−t≤m−α(q)N≤t)≥1−2 exp −2t2 N ⇒P(−t≤m−α(q)N)≥1−2 exp −2t2 N . By settingt= q N 2 log 2 ϵ from anyϵ >0, one may check, P m≥α(q)N− r N 2 log 1 ϵ ! ≥1−ϵ.(9) In order to ensurem≥N/2, we need to have α(q)N− r N 2 log 1 ϵ ≥ 1 2 N ⇒α(q)≥ 1 2 + r 1 2N log 1 ϵ (10) B Discussion T
background performance would not be significantly degraded if projBx were replaced with a nonzero constant value more representative of the training distribution. To correct for this, in our causality experiments we ablate directions by replacing them with theirmeanvalues computed across a dataset, instead of zeroing them out. Specifically, to ablate a directionu, we use the formula: x′ =x+P u(x−x)(21) whereP u is the projection matrix foruand xis the mean representation. E. Static interpretability analysi
background an All-Reduce operation so that they can use the identical gradient to update the model parameters. The All-Reduce operation accumulates distributed gradients (say Xi at i worker) from all workers (say P workers) using a reduction operation (typically sum or mean in training), which can be formally represented X=AllReduce(X 1, X2, ..., XP ) = PX i=1 Xi.(5) The gradients have the same dimensionality as the model weights, which means additional memory is required to store them for communication an

authors

author= Language models are few-shot learners

co-cited works

representative citing papers

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.

Progress measures for grokking via mechanistic interpretability

cs.LG · 2023-01-12 · accept · novelty 8.0

Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.

When Does Model Collapse Occur in Structured Interactive Learning?

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic results for M-estimators.

BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Convergence of difference inclusions via a diameter criterion

math.OC · 2026-05-14 · unverdicted · novelty 7.0

A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.

Language-Induced Priors for Domain Adaptation

cs.LG · 2026-05-14 · conditional · novelty 7.0

Language-Induced Priors from LLMs guide source selection in cold-start domain adaptation through an EM algorithm, matching oracle MSE under a correct prior and remaining asymptotically consistent.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

cs.CL · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces TBPO, which derives a Bregman-divergence density-ratio matching objective for token-level preference optimization that generalizes DPO while preserving the induced optimal policy.

Variance-aware Reward Modeling with Anchor Guidance

stat.ML · 2026-05-12 · unverdicted · novelty 7.0

Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via temporal-difference loss on ReLU networks, achieving linear sample complexity in d.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

MIST is a new synthetic speech-based tool-calling dataset for IoT devices that exposes performance gaps between open- and closed-weight multimodal LLMs.

Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.

Logic-Regularized Verifier Elicits Reasoning from LLMs

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.

AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification

astro-ph.IM · 2026-05-07 · unverdicted · novelty 7.0

AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.

TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation

q-bio.CB · 2026-05-06 · unverdicted · novelty 7.0

TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.

A foundation model of vision, audition, and language for in-silico neuroscience

q-bio.NC · 2026-05-05 · unverdicted · novelty 7.0

TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing cs.CR · 2026-05-11 · unverdicted · none · ref 9
DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.

Advances in neural information processing systems , volume=

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer