{"total":13,"items":[{"citing_arxiv_id":"2606.28057","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MultiHashFormer: Hash-based Generative Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-26T13:03:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18033","ref_index":34,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning","primary_cat":"cs.CL","submitted_at":"2026-06-16T15:09:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Broad empirical evaluation finds that fine-tuning heuristics for source-language choice in cross-lingual transfer do not hold reliably under in-context learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08810","ref_index":50,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Continuous Language Diffusion as a Decoder-Interface Problem","primary_cat":"cs.CL","submitted_at":"2026-06-07T20:00:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder 
systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31494","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Consolidating Rewarded Perturbations for LLM Post-Training","primary_cat":"cs.CL","submitted_at":"2026-05-29T16:16:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoRP consolidates reward-weighted perturbations into a single model via low-rank structure, improving base LLMs by 8.1 points on average while using one-tenth the budget of prior ensembles and one forward pass.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22567","ref_index":64,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance","primary_cat":"cs.CL","submitted_at":"2026-05-21T14:47:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20128","ref_index":40,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-19T17:15:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked 
relations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09781","ref_index":48,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution","primary_cat":"cs.NE","submitted_at":"2026-05-10T22:00:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"space into cells, retaining the highest-quality solution per cell. QD has succeeded in robotics [9], game design [23], and policy optimization [46]. Recent Uncertain QD (UQD) work [16, 17] extended these methods to stochastic domains through adaptive sampling and extraction mechanisms, demonstrating that uncertainty-aware archive maintenance significantly improves solution quality. Multi-objective extensions [48] and automated descriptor discovery [8, 24] have further expanded QD's applicability. Recent work has begun exploring QD for text generation. Quality-Diversity through AI Feedback (QDAIF) [4] demonstrated QD using LLMs as both generators and behavior evaluators, achieving diverse story generation. 
Quality Diversity through Human Feedback (QDHF) [12] showed that diversity metrics can"},{"citing_arxiv_id":"2507.13841","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Challenge and Reward of Fair Play in Narrative: A Computational Approach","primary_cat":"cs.CL","submitted_at":"2025-07-18T11:55:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Develops an information-theoretic framework showing surprise and coherence trade off in single reader models but coexist via pre- and post-revelation modes, operationalized as reference-less LLM metrics for fair play and validated on generated stories plus classic detective fiction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.17891","ref_index":158,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Scaling Diffusion Language Models via Adaptation from Autoregressive Models","primary_cat":"cs.CL","submitted_at":"2024-10-23T14:04:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2311.12983","ref_index":123,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"GAIA: a benchmark for General AI Assistants","primary_cat":"cs.CL","submitted_at":"2023-11-21T20:34:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GAIA benchmark shows humans at 92% accuracy on simple 
real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.10403","ref_index":97,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PaLM 2 Technical Report","primary_cat":"cs.CL","submitted_at":"2023-05-17T17:46:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2303.17564","ref_index":78,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"BloombergGPT: A Large Language Model for Finance","primary_cat":"cs.LG","submitted_at":"2023-03-30T17:30:36+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2207.14255","ref_index":125,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Efficient Training of Language Models to Fill in the Middle","primary_cat":"cs.CL","submitted_at":"2022-07-28T17:40:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling 
quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}