super hub Canonical reference

Emergent Abilities of Large Language Models

Barret Zoph, Colin Raffel, Jason Wei, Rishi Bommasani, Sebastian Borgeaud, Yi Tay · 2022 · cs.CL · arXiv 2206.07682

Canonical reference. 86% of citing Pith papers cite this work as background.

150 Pith papers citing it

Background 86% of classified citations

open full Pith review browse 150 citing papers more from Barret Zoph arXiv PDF

abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 34 baseline 2

citation-polarity summary

background 31 support 3 baseline 2

claims ledger

abstract Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language model

authors

Barret Zoph Colin Raffel Jason Wei Rishi Bommasani Sebastian Borgeaud Yi Tay

co-cited works

representative citing papers

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

cs.CL · 2023-04-03 · accept · novelty 8.0

Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.

Progress measures for grokking via mechanistic interpretability

cs.LG · 2023-01-12 · accept · novelty 8.0

Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

Smooth Scaling Laws Hide Stepwise Token Learning

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Token loss trajectories follow localized sigmoids whose learning-time spectrum quantitatively reconstructs scaling-law derivatives on T, D, and M axes and enables faster training via distribution reshaping.

DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

DICE formalizes multi-agent LLM coordination as discounted incomplete-information Markov games and introduces Heterogeneous Quantal Response Equilibrium (HQRE) to achieve unique stable equilibria with bounded regret, demonstrated via prompt-control and fine-tuning algorithms on eleven benchmarks.

EvoBrain: Continual Learning of EEG Foundation Models Across Heterogeneous BCI Tasks

cs.AI · 2026-06-01 · unverdicted · novelty 7.0

EvoBrain introduces a continual learning method with Neuro-Spectral Task Normalization and Response-Affinity Distillation to enable unified EEG decoding across heterogeneous BCI tasks.

Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

Attentive-CoT is an attention-guided fine-tuning objective that improves chain-of-thought performance in multimodal LLMs by delaying answer commitment and increasing sustained visual-token access during rationale generation.

Does Capability Transfer to Subjective Behavior -- and Would Our Instruments Tell Us? A Self-Evolving, Trust-by-Construction Evaluation Paradigm

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.

TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization

cs.AI · 2026-05-20 · unverdicted · novelty 7.0

A multi-agent pipeline iteratively refines topology optimization outputs to match natural language preferences for branched structures, achieving 60% success rate across replicates in cantilever and phone-stand tasks.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

On the Emergence of Syntax by Means of Local Interaction

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.

PERCEIVE: A Benchmark for Personalized Emotion and Communication Behavior Understanding on Social Media

cs.SI · 2026-04-10 · unverdicted · novelty 7.0

PERCEIVE is the first bilingual benchmark integrating author content, reader emotions from comments, communication behavior, user attributes, and social graphs for personalized social media emotion understanding.

A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

cs.AR · 2026-04-09 · conditional · novelty 7.0

ATLAS is the first silicon-validated simulation framework for 3D-DRAM LLM accelerators, achieving under 8.57% error and over 97% correlation with real hardware while supporting design exploration.

The Shrinking Lifespan of LLMs in Science

cs.DL · 2026-04-08 · unverdicted · novelty 7.0

LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.

Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

Social dynamics in LLM collectives cause representative agents to make less accurate decisions as peer pressure increases through larger adversarial groups, more capable peers, longer arguments, and persuasive styles.

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.

Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering

cs.SE · 2026-03-27 · unverdicted · novelty 7.0

StackRepoQA shows LLMs reach only moderate accuracy on multi-file Java QA tasks, with gains from graph-based retrieval but frequent reliance on verbatim answer reproduction.

FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment

cs.AI · 2026-03-17 · unverdicted · novelty 7.0

FactorEngine mines alpha factors as Turing-complete code via LLM-guided directional search, parameter separation, and a multi-agent pipeline that converts financial reports into executable programs, delivering higher IC/ICIR and Sharpe ratios than baselines in backtests.

Retrieval-Augmented Large Language Models for Evidence-Informed Guidance on Cannabidiol Use in Older Adults

cs.IR · 2026-01-16 · unverdicted · novelty 7.0

Retrieval-augmented LLMs produce more cautious and guideline-aligned recommendations on cannabidiol for older adults than standalone models, demonstrated via automated evaluation on 64 diverse scenarios.

A ghost mechanism: An analytical model of abrupt learning in recurrent networks

cs.LG · 2025-01-04 · unverdicted · novelty 7.0

The ghost mechanism derives a 1D canonical model of abrupt learning in RNNs from ghost points of saddle-node bifurcations, predicting an inverse-power-law critical learning rate and gradient-based failure modes.

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

cs.AI · 2024-10-06 · unverdicted · novelty 7.0

PolyMATH is a new 5,000-image benchmark where top MLLMs reach at most 41 percent accuracy on multi-modal mathematical reasoning, with ablation showing minimal gain from text over images.

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

cs.CV · 2024-06-10 · conditional · novelty 7.0

Scaled vanilla autoregressive models based on Llama achieve 2.18 FID on ImageNet 256x256 image generation, beating popular diffusion models without visual inductive biases.

citing papers explorer

Showing 50 of 93 citing papers after filters.

Smooth Scaling Laws Hide Stepwise Token Learning cs.CL · 2026-06-29 · unverdicted · none · ref 30 · internal anchor
Token loss trajectories follow localized sigmoids whose learning-time spectrum quantitatively reconstructs scaling-law derivatives on T, D, and M axes and enables faster training via distribution reshaping.
DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination cs.LG · 2026-06-06 · unverdicted · none · ref 75 · internal anchor
DICE formalizes multi-agent LLM coordination as discounted incomplete-information Markov games and introduces Heterogeneous Quantal Response Equilibrium (HQRE) to achieve unique stable equilibria with bounded regret, demonstrated via prompt-control and fine-tuning algorithms on eleven benchmarks.
EvoBrain: Continual Learning of EEG Foundation Models Across Heterogeneous BCI Tasks cs.AI · 2026-06-01 · unverdicted · none · ref 23 · internal anchor
EvoBrain introduces a continual learning method with Neuro-Spectral Task Normalization and Response-Affinity Distillation to enable unified EEG decoding across heterogeneous BCI tasks.
Attention-guided Fine-tuning of Multimodal Large Language Models Improves Chain-of-Thought Reasoning cs.CV · 2026-06-01 · unverdicted · none · ref 50 · internal anchor
Attentive-CoT is an attention-guided fine-tuning objective that improves chain-of-thought performance in multimodal LLMs by delaying answer commitment and increasing sustained visual-token access during rationale generation.
Does Capability Transfer to Subjective Behavior -- and Would Our Instruments Tell Us? A Self-Evolving, Trust-by-Construction Evaluation Paradigm cs.CL · 2026-05-27 · unverdicted · none · ref 124 · internal anchor
Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.
TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization cs.AI · 2026-05-20 · unverdicted · none · ref 23 · internal anchor
A multi-agent pipeline iteratively refines topology optimization outputs to match natural language preferences for branched structures, achieving 60% success rate across replicates in cantilever and phone-stand tasks.
Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain cs.CL · 2026-05-09 · unverdicted · none · ref 41 · internal anchor
LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.
Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models cs.AI · 2026-05-07 · unverdicted · none · ref 47 · internal anchor
Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 10 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
On the Emergence of Syntax by Means of Local Interaction cs.CL · 2026-04-20 · unverdicted · none · ref 44 · internal anchor
A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.
PERCEIVE: A Benchmark for Personalized Emotion and Communication Behavior Understanding on Social Media cs.SI · 2026-04-10 · unverdicted · none · ref 54 · internal anchor
PERCEIVE is the first bilingual benchmark integrating author content, reader emotions from comments, communication behavior, user attributes, and social graphs for personalized social media emotion understanding.
A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators cs.AR · 2026-04-09 · conditional · none · ref 72 · internal anchor
ATLAS is the first silicon-validated simulation framework for 3D-DRAM LLM accelerators, achieving under 8.57% error and over 97% correlation with real hardware while supporting design exploration.
The Shrinking Lifespan of LLMs in Science cs.DL · 2026-04-08 · unverdicted · none · ref 17 · internal anchor
LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives cs.CL · 2026-04-07 · unverdicted · none · ref 10 · internal anchor
Social dynamics in LLM collectives cause representative agents to make less accurate decisions as peer pressure increases through larger adversarial groups, more capable peers, longer arguments, and persuasive styles.
BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration cs.CL · 2026-04-03 · unverdicted · none · ref 32 · internal anchor
BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.
Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering cs.SE · 2026-03-27 · unverdicted · none · ref 57 · internal anchor
StackRepoQA shows LLMs reach only moderate accuracy on multi-file Java QA tasks, with gains from graph-based retrieval but frequent reliance on verbatim answer reproduction.
FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment cs.AI · 2026-03-17 · unverdicted · none · ref 18 · internal anchor
FactorEngine mines alpha factors as Turing-complete code via LLM-guided directional search, parameter separation, and a multi-agent pipeline that converts financial reports into executable programs, delivering higher IC/ICIR and Sharpe ratios than baselines in backtests.
Retrieval-Augmented Large Language Models for Evidence-Informed Guidance on Cannabidiol Use in Older Adults cs.IR · 2026-01-16 · unverdicted · none · ref 17 · internal anchor
Retrieval-augmented LLMs produce more cautious and guideline-aligned recommendations on cannabidiol for older adults than standalone models, demonstrated via automated evaluation on 64 diverse scenarios.
When transformers learn "impossible" languages, what do they learn? cs.CL · 2026-06-29 · unverdicted · none · ref 53 · internal anchor
Transformers on impossible-language variants show gradual grammatical sensitivity loss but sharp long-sentence generation failures, supporting generative deficiency as a link to non-attestation.
Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software cs.SE · 2026-06-26 · unverdicted · none · ref 31 · internal anchor
Study of 930k+ agent PRs shows repository explains ~50% of integration friction variance, with agents concentrating it twice as much as humans (ICC 0.30 vs 0.16) after controls.
When AI Reviews Its Own Code: Recursive Self-Training Collapse in Code LLMs cs.SE · 2026-06-26 · unverdicted · none · ref 96 · internal anchor
Experiments across code LLMs show no-review collapses fastest, human-gated filters slow collapse, and AI self-gates lose effect over time, degenerating to ungated self-training under self-confirming acceptance as proven via gated distributional reweighting and spectral analysis.
One Generator, Any Process: LLM-Conditioning for the LHC hep-ph · 2026-06-22 · unverdicted · none · ref 287 · internal anchor
LLM embeddings condition a generative transformer to enable faster convergence, better performance, and generalization to unseen LHC processes using a single model.
BCL: Bayesian In-Context Learning Framework for Information Extraction cs.CL · 2026-06-17 · unverdicted · none · ref 62 · internal anchor
BCL introduces a particle-filtering Bayesian update framework to systematically refine label representations in in-context learning for information extraction, claiming consistent gains over prior methods.
From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs cs.AI · 2026-06-08 · unverdicted · none · ref 41 · internal anchor
EntropyInfer adaptively allocates inference compute using per-head attention entropy for rigid/dynamic classification during prefilling and compresses KV cache with generated tokens, achieving up to 2.39x speedup on long contexts.
Arithmetic Pedagogy for Language Models cs.CL · 2026-06-03 · unverdicted · none · ref 26 · internal anchor
A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.
Validity Threats for Foundation Model Research cs.LG · 2026-06-03 · accept · none · ref 103 · internal anchor
Maps common low-compute research strategies for foundation models onto statistical, internal, external, and construct validity threats via a causal-inference lens.
Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning cs.CL · 2026-05-20 · unverdicted · none · ref 47 · internal anchor
A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 22 · internal anchor
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces cs.LG · 2026-05-12 · unverdicted · none · ref 238 · internal anchor
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
LLM Jaggedness Unlocks Scientific Creativity cs.AI · 2026-05-11 · unverdicted · none · ref 23 · 2 links · internal anchor
Jagged capabilities in LLMs for scientific idea generation can be leveraged through inference-time ensembles to outperform individual models.
Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization cs.LG · 2026-05-11 · unverdicted · none · ref 3 · internal anchor
Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces cs.AI · 2026-05-09 · unverdicted · none · ref 90 · internal anchor
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
The Propagation Field: A Geometric Substrate Theory of Deep Learning cs.LG · 2026-05-08 · unverdicted · none · ref 7 · internal anchor
Neural networks possess a propagation field of trajectories and Jacobians whose quality can be measured and optimized independently of endpoint loss, yielding better unseen-path generalization and reduced forgetting in continual learning.
Self-Consolidating Language Models: Continual Knowledge Incorporation from Context cs.CL · 2026-05-08 · unverdicted · none · ref 60 · 2 links · internal anchor
SCoL trains LLMs via meta-reinforcement learning to generate layer-specific update instructions that improve knowledge acquisition and retention from context streams over standard baselines.
A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints cs.LG · 2026-05-06 · unverdicted · none · ref 55 · internal anchor
A queueing model derives stability conditions for LLM inference services under combined compute and KV cache memory limits, with experimental validation showing typical deviations under 10%.
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA cs.AI · 2026-05-05 · unverdicted · none · ref 88 · internal anchor
Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks when representations are correct.
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management cs.LG · 2026-05-04 · unverdicted · none · ref 272 · internal anchor
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models cs.CL · 2026-05-01 · unverdicted · none · ref 19 · internal anchor
A new benchmark shows LLM first-answer accuracy on procedural arithmetic drops from 63% (5 steps) to 20% (95 steps) due to execution failures like skipped steps and premature answers.
Mixture of Heterogeneous Grouped Experts for Language Modeling cs.CL · 2026-04-25 · unverdicted · none · ref 26 · internal anchor
MoHGE achieves standard MoE performance with 20% fewer parameters and balanced GPU utilization via grouped heterogeneous experts, two-level routing, and specialized auxiliary losses.
How Far Are Video Models from True Multimodal Reasoning? cs.CV · 2026-04-21 · unverdicted · none · ref 74 · internal anchor
Current video models succeed on basic understanding but achieve under 25% success on logically grounded generation and near 0% on interactive generation, exposing gaps in multimodal reasoning.
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens q-bio.NC · 2026-04-20 · unverdicted · none · ref 113 · internal anchor
OmniMouse demonstrates data-driven scaling in multi-task brain models on a 150B-token neural dataset, achieving SOTA across prediction, decoding, and forecasting while model size gains saturate.
LLM-AUG: Robust Wireless Data Augmentation with In-Context Learning in Large Language Models cs.LG · 2026-04-20 · unverdicted · none · ref 13 · internal anchor
LLM-AUG applies LLM in-context learning for embedding-space data augmentation in wireless ML, outperforming baselines and reaching near-oracle accuracy with only 15% labeled data on RadioML and IC datasets.
The role of System 1 and System 2 semantic memory structure in human and LLM biases cs.CL · 2026-04-14 · unverdicted · none · ref 77 · internal anchor
Human semantic memory networks for System 1 and System 2 are structurally distinct and consistently relate to implicit gender bias levels, but LLM networks do not exhibit these properties.
Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task cs.LG · 2026-04-14 · unverdicted · none · ref 24 · internal anchor
Transformers show limited adaptive depth use on relational reasoning, with clearer evidence after finetuning on the task.
MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control cs.CV · 2026-04-07 · unverdicted · none · ref 5 · internal anchor
MMEmb-R1 adaptively applies chain-of-thought reasoning to multimodal embeddings via pair-aware counterfactual selection and RL, reaching 71.2 on MMEB-V2 with a 4B model and lower latency.
Identification of quantum generative circuits with parallel quantum neural network quant-ph · 2026-03-03 · unverdicted · none · ref 25 · internal anchor
ParaQuanNet distinguishes eight quantum generative circuits via 99.5% accurate classification of their output data using parallel quantum embeddings and mutually unbiased measurements.
Multi-Agent Home Energy Management Assistant cs.HC · 2026-02-16 · unverdicted · none · ref 20 · internal anchor
HEMA is a multi-agent LLM system with analysis, knowledge, and control agents plus a self-consistency router that enables conversational home energy tasks, evaluated via LLM-simulated users on 23 metrics.
"The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework cs.CL · 2026-01-20 · unverdicted · none · ref 12 · internal anchor
COMPACT adaptively fuses multi-teacher CoT supervisions using graph-based consensus, mutual-information adaptability, and loss-based difficulty metrics to improve small language model reasoning performance while mitigating catastrophic forgetting.
Large Language Model Agent for User-friendly Chemical Process Simulations physics.chem-ph · 2026-01-15 · unverdicted · none · ref 14 · internal anchor
An LLM agent integrated with AVEVA Process Simulation via MCP enables natural language driven flowsheet analysis, optimization, and construction for chemical separation processes.
Zeus: Towards Tuning-Free Foundation Model for Time Series Analysis cs.LG · 2026-07-02 · unverdicted · none · ref 34 · internal anchor
Zeus proposes a multi-scale Transformer with point-wise tokenization and Multi-Objective Temporal Masking to enable tuning-free performance on forecasting, interpolation, and other time series tasks.

Emergent Abilities of Large Language Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer