hub
Continual learning for large language models: A survey
19 Pith papers cite this work.
representative citing papers
EMERGE is a benchmark dataset of 233K Wikipedia passages paired with 1.45 million Wikidata edit operations across seven yearly snapshots from 2019 to 2025 for evaluating knowledge graph updates from emerging text.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
-
MasFACT: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer
MasFACT transfers historical topology priors across tasks via Fused Gromov-Wasserstein optimal transport and PAC-Bayes conservative adaptation to reduce topology forgetting in continual multi-agent settings.
-
Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning
DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.
-
MedEvoEval: Evaluating Continual Evolution of Doctor Agents through Simulated Clinical Episodes
MedEvoEval is an executable longitudinal evaluation framework that converts medical cases into action-gated simulated episodes to track how doctor agents evolve decision-making, resource use, and experience across multiple encounters.
-
RECAP: Regression Evaluation for Continual Adaptation of Prompts
RECAP benchmark finds that six prompt optimization methods show no significant performance gains under proactive continual adaptation to evolving constraints across four LLMs.
-
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory
SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
-
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
-
Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning
BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.
-
Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation
CAP-TTA triggers context-aware preconditioned LoRA updates on high bias-risk OOD prompts to reduce toxicity in LLM narrative generation while preserving fluency and avoiding catastrophic forgetting.
-
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning
SETA decomposes parameters into task-specific and shared sparse experts with adaptive anchoring and routing regularization to improve retention and backward transfer in LLM continual learning.
-
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
Existing methods for turning LLM interaction experience into parametric skills collapse over multiple iterations; principle-level experience, step-wise injection, and off-policy teacher distillation yield more stable continual learning.
-
MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning
MADS selects a 15% core set from the 52K Alpaca-GPT4 dataset via activations in Llama-3.2-3B-Instruct, yielding 2.5% average gains on 7B-13B models across six benchmarks versus full-data training.
-
Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay
Self-generated replay from language models nearly eliminates catastrophic forgetting during finetuning except when models are pretrained close to saturation.
-
MeMo: Memory as a Model
MeMo encodes new knowledge into a separate memory model that integrates with frozen LLMs, showing strong performance on QA benchmarks while avoiding catastrophic forgetting and working without access to model weights.
-
LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
LifeAlign uses focalized preference optimization and short-to-long memory consolidation via dimensionality reduction to let LLMs align with new preferences while retaining prior knowledge.
-
Plasticity Loss in Deep Reinforcement Learning: A Survey
Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.
-
Phoenix-VL 1.5 Medium Technical Report
Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.
-
The Agentification of Scientific Research: A Physicist's Perspective
AI will evolve from a research tool into a collaborator, fundamentally reshaping scientific collaboration, discovery, publishing, and evaluation while requiring continuous learning and idea diversity for original contributions.