Qwen2.5: A party of foundation models, September 2024
4 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models
  HumorRank ranks nine LLMs on textual humor using GTVH-grounded pairwise tournaments and Adaptive Swiss aggregation on the SemEval-2026 MWAHAHA dataset, finding that comedic mechanism mastery matters more than scale.
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
  LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
- Rotation-Preserving Supervised Fine-Tuning
  RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.
- SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
  Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.