Advances in Neural Information Processing Systems

Language models are few-shot learners

Mixed citation behavior. Most common role is background (67%).

130 Pith papers citing it

Background: 67% of classified citations

citation-role summary

background 10 · method 1 · other 1

citation-polarity summary

background 8 · unclear 2 · support 1 · use method 1

claims ledger

background phenomenon might further improve the performance of the co-scientist as a tool for scientific discovery. Improved multimodal reasoning and tool-use capabilities. Some of the most interesting data in scientific publications is not written in text but may be encoded visually in figures and charts. However, even state-of-the-art frontier models may not comprehensively utilize such data with optimal reasoning [89] and the AI co-scientist system is unlikely to be an exception. Stronger benchmarks and
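other Bernoulli random variables with $\alpha(q)$ success probability. According to Hoeffding's inequality (Hoeffding, 1963) we have $P(|m - \alpha(q)N| \ge t) \le 2\exp(-2t^2/N)$, (8) where $t \ge 0$. It implies $P(|m - \alpha(q)N| \le t) \ge 1 - 2\exp(-2t^2/N) \Leftrightarrow P(-t \le m - \alpha(q)N \le t) \ge 1 - 2\exp(-2t^2/N) \Rightarrow P(-t \le m - \alpha(q)N) \ge 1 - 2\exp(-2t^2/N)$. By setting $t = \sqrt{\frac{N}{2}\log\frac{2}{\epsilon}}$ from any $\epsilon > 0$, one may check, $P\left(m \ge \alpha(q)N - \sqrt{\frac{N}{2}\log\frac{1}{\epsilon}}\right) \ge 1 - \epsilon$. (9) In order to ensure $m \ge N/2$, we need to have $\alpha(q)N - \sqrt{\frac{N}{2}\log\frac{1}{\epsilon}} \ge \frac{1}{2}N \Rightarrow \alpha(q) \ge \frac{1}{2} + \sqrt{\frac{1}{2N}\log\frac{1}{\epsilon}}$ (10) B Discussion T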
background performance would not be significantly degraded if $\mathrm{proj}_B x$ were replaced with a nonzero constant value more representative of the training distribution. To correct for this, in our causality experiments we ablate directions by replacing them with their mean values computed across a dataset, instead of zeroing them out. Specifically, to ablate a direction $u$, we use the formula: $x' = x + P_u(\bar{x} - x)$ (21) where $P_u$ is the projection matrix for $u$ and $\bar{x}$ is the mean representation. E. Static interpretability analysi
background an All-Reduce operation so that they can use the identical gradient to update the model parameters. The All-Reduce operation accumulates distributed gradients (say $X_i$ at worker $i$) from all workers (say $P$ workers) using a reduction operation (typically sum or mean in training), which can be formally represented $X = \mathrm{AllReduce}(X_1, X_2, \ldots, X_P) = \sum_{i=1}^{P} X_i$. (5) The gradients have the same dimensionality as the model weights, which means additional memory is required to store them for communication an
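
The sum reduction in the All-Reduce excerpt above can be illustrated with a small single-process sketch. This is not code from the citing paper or from any communication library: the function name `all_reduce_sum` and the list-of-lists layout for per-worker gradients are assumptions made for illustration only, and a real system would exchange data between processes or devices.

```python
from typing import List, Sequence


def all_reduce_sum(worker_grads: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulate a sum All-Reduce in one process (illustrative only).

    worker_grads[i] is the gradient X_i held by worker i (hypothetical layout).
    Returns one list per worker; every list holds the same element-wise sum.
    """
    if not worker_grads:
        return []
    dim = len(worker_grads[0])
    if any(len(g) != dim for g in worker_grads):
        raise ValueError("all workers must hold gradients of the same dimensionality")
    # Reduction step: X = sum_{i=1}^{P} X_i, computed element-wise.
    reduced = [sum(g[k] for g in worker_grads) for k in range(dim)]
    # Distribution step: each worker receives its own copy of the identical result.
    return [list(reduced) for _ in worker_grads]


grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # P = 3 workers, toy values
synced = all_reduce_sum(grads)
print(synced[0])  # [9.0, 12.0]
```

Each worker ends up with its own full-size copy of the reduced gradient, which matches the excerpt's remark that the gradients have the same dimensionality as the model weights and therefore need additional memory.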

representative citing papers

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.

Progress measures for grokking via mechanistic interpretability

cs.LG · 2023-01-12 · accept · novelty 8.0

Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.

When Does Model Collapse Occur in Structured Interactive Learning?

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic results for M-estimators.

BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

stat.ML · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Convergence of difference inclusions via a diameter criterion

math.OC · 2026-05-14 · unverdicted · novelty 7.0

A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.

Language-Induced Priors for Domain Adaptation

cs.LG · 2026-05-14 · conditional · novelty 7.0

Language-Induced Priors from LLMs guide source selection in cold-start domain adaptation through an EM algorithm, matching oracle MSE under a correct prior and remaining asymptotically consistent.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

cs.CL · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.

Variance-aware Reward Modeling with Anchor Guidance

stat.ML · 2026-05-12 · unverdicted · novelty 7.0

Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via temporal-difference loss on ReLU networks, achieving linear sample complexity in d.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

MIST is a new synthetic speech-based tool-calling dataset for IoT devices that exposes performance gaps between open- and closed-weight multimodal LLMs.

Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.

Logic-Regularized Verifier Elicits Reasoning from LLMs

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.

AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification

astro-ph.IM · 2026-05-07 · unverdicted · novelty 7.0

AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.

TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation

q-bio.CB · 2026-05-06 · unverdicted · novelty 7.0

TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.

A foundation model of vision, audition, and language for in-silico neuroscience

q-bio.NC · 2026-05-05 · unverdicted · novelty 7.0

TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.

citing papers explorer

Showing 50 of 130 citing papers.

Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds cs.LG · 2026-05-12 · unverdicted · none · ref 44
Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.
Progress measures for grokking via mechanistic interpretability cs.LG · 2023-01-12 · accept · none · ref 4
Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.
When Does Model Collapse Occur in Structured Interactive Learning? cs.LG · 2026-05-19 · unverdicted · none · ref 50
Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic results for M-estimators.
BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation cs.GR · 2026-05-19 · unverdicted · none · ref 138
BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.
Pointwise Generalization in Deep Neural Networks cs.LG · 2026-05-18 · unverdicted · none · ref 48
Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent stat.ML · 2026-05-18 · unverdicted · none · ref 214 · 2 links
Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling cs.LG · 2026-05-14 · unverdicted · none · ref 212
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Convergence of difference inclusions via a diameter criterion math.OC · 2026-05-14 · unverdicted · none · ref 128
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
Language-Induced Priors for Domain Adaptation cs.LG · 2026-05-14 · conditional · none · ref 8
Language-Induced Priors from LLMs guide source selection in cold-start domain adaptation through an EM algorithm, matching oracle MSE under a correct prior and remaining asymptotically consistent.
BOOKMARKS: Efficient Active Storyline Memory for Role-playing cs.CL · 2026-05-13 · unverdicted · none · ref 15
BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching cs.CL · 2026-05-12 · unverdicted · none · ref 58 · 2 links
TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
Variance-aware Reward Modeling with Anchor Guidance stat.ML · 2026-05-12 · unverdicted · none · ref 32
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition cs.LG · 2026-05-11 · unverdicted · none · ref 2
Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.
The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently cs.LG · 2026-05-11 · unverdicted · none · ref 85
Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via temporal-difference loss on ReLU networks, achieving linear sample complexity in d.
Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization cs.LG · 2026-05-11 · unverdicted · none · ref 33
LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.
Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain cs.CL · 2026-05-09 · unverdicted · none · ref 1
LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data cs.LG · 2026-05-08 · unverdicted · none · ref 288
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification cs.AI · 2026-05-08 · conditional · none · ref 29
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes cs.CL · 2026-05-07 · unverdicted · none · ref 21
MIST is a new synthetic speech-based tool-calling dataset for IoT devices that exposes performance gaps between open- and closed-weight multimodal LLMs.
Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models cs.AI · 2026-05-07 · unverdicted · none · ref 49
Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.
Logic-Regularized Verifier Elicits Reasoning from LLMs cs.CL · 2026-05-07 · unverdicted · none · ref 76
LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification astro-ph.IM · 2026-05-07 · unverdicted · none · ref 38
AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.
TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation q-bio.CB · 2026-05-06 · unverdicted · none · ref 7
TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.
A foundation model of vision, audition, and language for in-silico neuroscience q-bio.NC · 2026-05-05 · unverdicted · none · ref 115
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise cs.CL · 2026-05-05 · unverdicted · none · ref 5
NoisyCausal benchmark tests LLMs on causal reasoning with structured noise, and a modular LLM-plus-causal-graph framework outperforms baselines while generalizing to Cladder.
InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees cs.LG · 2026-05-01 · unverdicted · none · ref 65 · 2 links
InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.
MUCOCO: Automated Consistency Testing of Code LLMs cs.SE · 2026-04-21 · unverdicted · none · ref 26
MUCOCO applies semantic-preserving mutation analysis to automatically expose inconsistent behaviors in code LLMs, detecting inconsistencies in about 15% of cases across 7 models and 4 tasks while outperforming the TURBULENCE baseline.
Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys cs.AI · 2026-04-19 · unverdicted · none · ref 14
A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data cs.LG · 2024-06-06 · conditional · none · ref 64
Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.
Detecting Pretraining Data from Large Language Models cs.CL · 2023-10-25 · conditional · none · ref 112
Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 181
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers cs.CL · 2023-09-15 · unverdicted · none · ref 165
EvoPrompt uses LLMs to run evolutionary operators on populations of prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard tasks across 31 datasets.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 150
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Eliciting Latent Predictions from Transformers with the Tuned Lens cs.LG · 2023-03-14 · accept · none · ref 21
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
Convex Optimization for Alignment and Preference Learning on a Single GPU cs.LG · 2026-05-22 · unverdicted · none · ref 2
COALA applies convex optimization reformulations of neural networks to direct preference optimization, claiming single-GPU training with ~18% of DPO's TFLOPs and competitive performance on multiple datasets and models up to 8B parameters.
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics cs.LG · 2026-05-21 · unverdicted · none · ref 189
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
STELLAR: Scaling 3D Perception Large Models for Autonomous Driving cs.CV · 2026-05-19 · unverdicted · none · ref 14
STELLAR trains up to 500M-parameter multi-modal models on 50M driving scenes and reports empirical scaling trends plus new state-of-the-art results on the Waymo Open Dataset.
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation cs.CL · 2026-05-19 · unverdicted · none · ref 27
BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.
ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability cs.LG · 2026-05-19 · unverdicted · none · ref 110
ST-TGExplainer disentangles stability and transition patterns in temporal graphs via a self-explainable TGNN guided by a disentangled information bottleneck objective to produce more faithful explanations.
A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation cs.CL · 2026-05-19 · unverdicted · none · ref 88
MAFIG is a multi-agent framework that uses LLM agents and evaluators to generate reading comprehension items with significantly higher adherence to specified feature constraints than single-agent baselines.
Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models cs.AI · 2026-05-17 · conditional · none · ref 7
A compact 25M chess move predictor exceeds larger fine-tuned models on puzzles, indicating memorization in earlier claims, while LLM-Modulo raises general LLM move accuracy from 1.2% to 21.2% and validity to 95.3%.
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization cs.CL · 2026-05-17 · unverdicted · none · ref 6
Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.
OProver: A Unified Framework for Agentic Formal Theorem Proving cs.CL · 2026-05-17 · unverdicted · none · ref 1
OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making cs.CL · 2026-05-17 · unverdicted · none · ref 4
Frontier LLMs exhibit bias from stigmatizing language in clinical vignettes across four conditions, skewing decisions toward less aggressive management, with limited mitigation from Chain-of-Thought or self-debiasing prompts.
Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning cs.LG · 2026-05-15 · unverdicted · none · ref 25
A 149M-parameter distributional energy-based verifier with low-rank adapter ensemble reduces constraint violations in structured LLM reasoning and outperforms or matches much larger models on five benchmarks.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 164
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Probing Privacy Leaks in LLM-based Code Generation via Test Generation cs.SE · 2026-05-14 · unverdicted · none · ref 8
A test-driven pipeline with an auto-constructed privacy feature library detects 2.56 times more confirmed privacy leaks in LLM-based code generation than existing baselines.
Know When To Fold 'Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection cs.AI · 2026-05-13 · unverdicted · none · ref 48
MSIFR stops faulty LLM generations early via staged rule-based checks, reducing token consumption 11-78% with no accuracy loss.
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents cs.CL · 2026-05-13 · unverdicted · none · ref 120 · 2 links
A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data cs.RO · 2026-05-13 · unverdicted · none · ref 61
A co-evolutionary VLM-VGM loop on 500 unlabeled images raises planner success by 30 points and simulator success by 48 percent while beating fully supervised baselines.
