pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 9

  1. cs.CL 2026-05-18 reviewed
    Decoupling tool use from execution boosts LLM math reasoning

    Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

    Li Wang +7

  2. cs.CL 2026-05-18 reviewed
    Wiki beats RAG on cross-paper links but costs more tokens

    Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

    Theodore O. Cochran

  3. cs.CR 2026-05-18 reviewed
    Generator turns text prompts into LLM fingerprints in one pass

    Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

    Sixu Chen +7

  4. cs.CL 2026-05-18 reviewed
    BERT and T5 differ in NER performance by tag scheme

    From BERT to T5: A Study of Named Entity Recognition

    Mei Jia

  5. cs.CV 2026-05-18 reviewed
    Accuracy unchanged when latent visual tokens replaced by dummies

    What's Holding Back Latent Visual Reasoning?

    André G. Viveiros +3

  6. cs.CL 2026-05-18 reviewed
    No memory method works consistently for LLM agents

    EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

    Yuyao Wang +9

  7. cs.CL 2026-05-18 reviewed
    Governed skill libraries boost frozen agents on benchmarks

    SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

    Hongyi Liu +5

  8. cs.CL 2026-05-18 reviewed
    LLMs match human conditional ratings without pragmatic reasoning

    Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs

    Tara Azin +3

  9. cs.CL 2026-05-18 reviewed
    Index lets researchers search 1.35 billion news articles in under a second

    Infini-News: Efficiently Queryable Access to 1.3 Billion Processed Common Crawl News Articles

    Ruggero Marino Lazzaroni +2

  10. cs.AI 2026-05-18 reviewed
    Self-distillation supplies step-level search signals from its own rollouts

    SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

    Yufei Ma +8

  11. cs.CL 2026-05-18 reviewed
    Preference focus cuts on-device RAG memory 2,400-fold

    From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

    Changmin Lee +2

  12. cs.CL 2026-05-18 reviewed
    K2V extends RLVR to knowledge domains via process verification

    Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

    Zhonghang Yuan +9

  13. cs.CV 2026-05-18 reviewed
    Shared codebook bridges modalities without full data pairs

    CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook

    Zeyu Chen +2

  14. cs.CL 2026-05-18 reviewed
    MDU unlearns data in masked diffusion models by KL reversal

    Machine Unlearning for Masked Diffusion Language Models

    Georu Lee +4

  15. cs.CL 2026-05-18 reviewed
    Multi-turn chats in low-resource languages jailbreak LLMs

    Multilingual jailbreaking of LLMs using low-resource languages

    Dylan Marx +1

  16. cs.CL 2026-05-18 reviewed
    SomaliWeb v1 delivers 303M tokens of cleaned Somali text

    SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

    Khalid Yusuf Dahir

  17. cs.CL 2026-05-18 reviewed
    Memory of precomputed states cuts LLM prefix attention costs

    Context Memorization for Efficient Long Context Generation

    Yasuyuki Okoshi +5

  18. cs.SD 2026-05-18 reviewed
    Speech audio accelerates MRI reconstruction of vocal tracts

    SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

    Md Hasan +8

  19. cs.CL 2026-05-18 reviewed
    GA-S2S adds k-hop graph structure to boost link prediction by 19%

    Leveraging Graph Structure in Seq2Seq Models for Knowledge Graph Link Prediction

    Luu Huu Phuc +5

  20. cs.AI 2026-05-18 reviewed
    Varying environment rules builds agents that generalize

    Scalable Environments Drive Generalizable Agents

    Jiayi Zhang +9

  21. cs.AI 2026-05-18 reviewed
    One universal fix reduces hallucinations in 15 models

    TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction

    Tej Sanibh Ranade

  22. cs.CL 2026-05-18 reviewed
    Hybrid system generates natural sentences from nested logic

    FOL2NS: Generating Natural Sentences from First-Order Logic

    Mei Jia

  23. cs.CL 2026-05-18 reviewed
    Explanation guidelines lift LLM prompt accuracy by 35 percent

    iPOE: Interpretable Prompt Optimization via Explanations

    Jiahui Li +3

  24. cs.CL 2026-05-18 reviewed
    Bangla medical questions trip up top AI models

    How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking

    Rafid Ahmed +5

  25. cs.CL 2026-05-18 reviewed
    German news overreports European landslides vs risk data

    How Loud Rumbles Hit Newsstands: A Data Analysis of Coverage and Spatial Bias in German News about Landslides Around the World

    Brielen Madureira +3

  26. cs.CL 2026-05-18 reviewed
    Grafting MoE-expanded deltas adds languages to LLMs efficiently

    A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE

    Hao Zhou +8

  27. cs.LG 2026-05-18 reviewed
    Low-precision softmax transformers simulate Turing machines via CoT

    The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

    Moritz Brösamle +1

  28. cs.CL 2026-05-18 reviewed
    KVDrive lifts long-context LLM speed 1.74x with SSD tier

    KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

    Jian Lin +7

  29. cs.CL 2026-05-18 reviewed
    P2P edge agents boost LLM task accuracy 8% and reduce latency 16%

    PPAI: Enabling Personalized LLM Agent Interoperability for Collaborative Edge Intelligence

    Zile Wang +6

  30. cs.LG 2026-05-18 reviewed
    Boundary protection recovers 69-90% quality at 13% KV retention

    Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

    Gabriel Garcia

  31. cs.CL 2026-05-18 reviewed
    Tool localizes node errors in multi-agent LLM workflows

    PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

    Kazuki Kawamura +2

  32. cs.CL 2026-05-18 reviewed
    Reranking by label semantics lifts hard-case F1 by over 9 points

    Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

    Anas Belfathi +4

  33. cs.CL 2026-05-18 reviewed
    Neural tweaks make read speech sound like real conversation

    Bridging the Gap: Converting Read Text to Conversational Dialogue

    Parshav Singla +7

  34. cs.CL 2026-05-18 reviewed
    Predictive prefetching cuts RAG latency up to 43.5%

    Predictive Prefetching for Retrieval-Augmented Generation

    Wuyang Zhang +1

  35. cs.CL 2026-05-18 reviewed
    LLM generates explicitly vectorized code that beats compiler -O3

    AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

    Shangzhan Li +10

  36. cs.CL 2026-05-18 reviewed
    BacktestBench tests LLMs on 18k backtesting QA pairs from real markets

    BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

    Zhensheng Wang +5

  37. cs.CL 2026-05-18 reviewed
    Natural triggers drop sentiment accuracy to 0.04

    Universal Adversarial Triggers

    Benedict Florance Arockiaraj +3

  38. cs.CL 2026-05-18 reviewed
    Prompt compression fails to transfer to diffusion LLMs

    Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

    Sterling Huang +6

  39. cs.LG 2026-05-18 reviewed
    Transient expert steers MoE updates to cut forgetting

    CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

    Yang Liu +2

  40. cs.CL 2026-05-18 reviewed
    Benchmark turns NASA mission text into logic formulas

    A Pilot Benchmark for NL-to-FOL Translation in Planetary Exploration

    Hayden Moore +2

  41. cs.AI 2026-05-18 reviewed
    Agentic chunking builds fuzzy cognitive maps that predict war in a Thucydides Trap model

    Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap

    Akash Kumar Panda +2

  42. cs.CL 2026-05-18 reviewed
    AI agent teams beat human teams at generating creative ideas

    Multi-agent AI systems outperform human teams in creativity

    Tiancheng Hu +7

  43. cs.LG 2026-05-18 reviewed
    Hindsight targets fix actions to cut agent training time 2.26x

    HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

    Woongyeng Yeo +3

  44. cs.CL 2026-05-18 reviewed
    New multi-accent dataset lowers ASR errors on technical talks

    PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions

    Sicheng Jin +2

  45. cs.CL 2026-05-18 reviewed
    SynPro yields 3.7-5.2x more effective tokens from organic data

    Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

    Zichun Yu +1

  46. cs.LO 2026-05-18 reviewed
    Retrieval system compresses Lean proofs by over 70 percent

    Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

    Jialin Lu +6

  47. cs.AI 2026-05-18 reviewed
    Memory-equipped agents show rising safety risks over time

    Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

    Ahmad Al-Tawaha +4

  48. cs.CL 2026-05-18 reviewed
    Memory systems score 0.12-0.18 on social group benchmark

    SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?

    Olukunle Owolabi

  49. cs.CL 2026-05-18 reviewed
    LLM-rephrased notes keep broad utility but lose ICD details

    Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

    Jinghui Liu +2

  50. cs.CL 2026-05-18 reviewed
    Fine-tuned small models plan with tools without any catalog in the prompt

    Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning

    Yuval Shemla +4