Gonzalez and Ion Stoica , booktitle=
5 Pith papers cite this work.
citing papers explorer
- Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
  MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
  o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
- Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
  The power distribution is the target of power sampling, the closed-form solution to self-reward KL-regularized RL, and the basis for power self-distillation that matches sampling performance at lower cost.
- DoRA: Weight-Decomposed Low-Rank Adaptation
  DoRA improves LoRA by decomposing weights into magnitude and direction and updating only direction with low-rank matrices, closing much of the gap to full fine-tuning.
- Multimodal Hidden Markov Models for Persistent Emotional State Tracking
  Sticky factorial HDP-HMMs applied to multimodal valence-arousal trajectories identify interpretable persistent emotional regimes in conversations, outperforming Gaussian HMM baselines in consistency metrics and enabling context-augmented LLM responses.