LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
Qwen2.5: A party of foundation models, September 2024
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
citation-role summary
baseline 1
method 1
citation-polarity summary
representative citing papers
RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.
Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.