pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 13

  1. cs.CL 2026-05-15 reviewed
    Block attention nears full performance via semantic blocks

    Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

    Shuaiyi Li +7

  2. cs.CL 2026-05-15 reviewed
    Block attention matches full results via segmentation and distillation

    Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

    Shuaiyi Li +7

  3. cs.CL 2026-05-15 reviewed
    Dataset links Russian speeches to images and translations

    Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches

    Daria Blinova +6

  4. cs.CV 2026-05-15 reviewed
    VLMs miss image swaps when claiming to recheck visuals

    Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination

    Chufan Shi +6

    1 Pith

  5. cs.CV 2026-05-15 reviewed
    Brain voxels respond to specific image features identified by interpretability tools

    Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

    Idan Daniel Grosbard +2

  6. cs.HC 2026-05-15 reviewed
    Canvas turns linear LLM chats into branching trees

    Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas

    Rifat Mehreen Amin +4

  7. cs.SE 2026-05-15 reviewed
    BootstrapAgent distills repo setup into reusable contracts

    BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

    Sihan Fu +4

  8. cs.CL 2026-05-15 reviewed
    Dataset shows MT systems lose PDF layout during translation

    ForMaT: Dataset for Visually-Grounded Multilingual PDF Translation

    Michał Ciesiółka +3

  9. cs.CL 2026-05-15 reviewed
    Small open LLMs match big models in translation quality estimation

    CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs

    Kamil Guttmann +3

  10. cs.CL 2026-05-15 reviewed
    DimMem hits 81% accuracy with 24% lower token cost

    DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory

    Wentao Qiu +4

  11. cs.AI 2026-05-15 reviewed
    Strategy nudging lifts RLVR performance beyond larger rollouts

    Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

    Chanuk Lee +3

  12. cs.CL 2026-05-15 reviewed
    Collaborative filtering assigns optimal contexts per LLM input

    Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

    Jiachen Zhu +11

  13. cs.CL 2026-05-15 reviewed
    Benchmark shows agents fail at composing scattered multimodal evidence

    SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

    Huacan Chai +9

  14. cs.CL 2026-05-15 reviewed
    Hybrid tree-graph evolves agent memory into summaries

    H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure

    Jiawei Yu +3

  15. cs.LG 2026-05-15 reviewed
    Reshaping anchors lets LLMs sample more reasoning modes

    SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs

    Chanuk Lee +2

  16. cs.CL 2026-05-15 reviewed
    Activation steering plus rewards improves unlearning and quality in MLLMs

    ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

    Jiahui Guang +6

  17. cs.CL 2026-05-15 reviewed
    Few-shot LLMs beat BioBERT on patient inquiry triage

    Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries

    Liqi Zhou +1

  18. cs.CL 2026-05-15 reviewed
    Benchmark shows VLMs lag on code-based diagram tasks

    VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing

    Xiaoyan Su +10

  19. cs.CL 2026-05-15 reviewed
    Dynamic chunking lifts diffusion LMs over positional blocks

    Dynamic Chunking for Diffusion Language Models

    Yichen Zhu +5

  20. cs.CL 2026-05-15 reviewed
    LLMs miss ambiguity in Chinese sentences

    Evaluating Chinese Ambiguity Understanding in Large Language Models

    Junwen Mo +4

  21. cs.CL 2026-05-15 reviewed
    LLMs heavily favor English with no cost savings from continual pre-training

    Toward LLMs Beyond English-Centric Development

    Sho Takase +1

  22. cs.CL 2026-05-15 reviewed
    Diffusion LLMs reach 5.5x tokens per forward pass

    PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

    Shengyin Sun +9

  23. cs.CL 2026-05-15 reviewed
    LLMs master new code syntax but cannot apply it to solve problems

    Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language

    Vinayshekhar Bannihatti Kumar +3

  24. cs.LG 2026-05-15 reviewed
    Steering vectors accelerate optimization for rare behaviors

    VSPO: Vector-Steered Policy Optimization for Behavioral Control

    Xuechen Zhang +5

  25. cs.CL 2026-05-15 reviewed
    LLMs spot mental health entities but miss relations and reasoning

    MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models

    Weixin Liu +6

  26. cs.CL 2026-05-15 reviewed
    Semantic rewards improve LLM uncertainty calibration

    Calibrating LLMs with Semantic-level Reward

    Fengfei Yu +4

  27. cs.CL 2026-05-15 reviewed
    Semantic reward cuts LLM calibration error by up to 40%

    Calibrating LLMs with Semantic-level Reward

    Fengfei Yu +4

  28. cs.CL 2026-05-15 reviewed
    Learned policy decides when to add one sequential step after parallel agents

    Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

    Nurbek Tastan +6

  29. cs.CL 2026-05-15 reviewed
    LLM activation peaks vary by 10,000x across model families

    Measuring Maximum Activations in Open Large Language Models

    Luxuan Chen +11

  30. cs.CL 2026-05-15 reviewed
    Dependency graphs lift Transformer syntactic generalization

    GiLT: Augmenting Transformer Language Models with Dependency Graphs

    Tianyu Huang +3

  31. cs.CL 2026-05-15 reviewed
    Latent geometry fails to ensure good token recovery

    When Latent Geometry Is Not Enough: Draft-Conditioned Latent Refinement for Non-Autoregressive Text Generation

    De Shuai Zhang

  32. cs.LG 2026-05-15 reviewed
    High-divergence prompts improve distillation by up to 15%

    DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

    Jaehun Jung +6

  33. cs.LG 2026-05-15 reviewed
    Divergence-guided prompts deliver 15% gains in VLM distillation

    DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

    Jaehun Jung +6

  34. cs.CL 2026-05-15 reviewed
    Reliability signal cuts token use by a third in reasoning

    Process Rewards with Learned Reliability

    Jinyuan Li +7

  35. cs.CL 2026-05-15 reviewed
    Benchmark tests LLM detectors across 8 languages and real edits

    DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection

    Junchao Wu +10

  36. cs.CL 2026-05-15 reviewed
    DetectRL-X benchmark tests detectors across 8 languages and real AI writing

    DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection

    Junchao Wu +10

  37. cs.CL 2026-05-15 reviewed
    RoPE loses position and token distinction in long contexts

    RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

    Yufeng Du +7

  38. cs.LG 2026-05-15 reviewed
    Draft model prunes 90% of attention in large LLMs

    STS: Efficient Sparse Attention with Speculative Token Sparsity

    Ceyu Xu +3

  39. cs.CL 2026-05-14 reviewed
    New benchmark suite tests LLMs on finance difficulty levels

    FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

    Dmitry Stanishevskii +6

  40. cs.CL 2026-05-14 reviewed
    Benchmark suite tests LLMs across eight financial expertise levels

    FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

    Dmitry Stanishevskii +6

  41. cs.CL 2026-05-14 reviewed
    RAG pipeline reaches 80% F1 on clinical transcript extraction

    Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction

    A H M Rezaul Karim +1

  42. cs.LG 2026-05-14 reviewed
    Open-ended RL boosts LLM reasoning with 46x less data

    GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

    Shangjian Yin +3

  43. cs.CL 2026-05-14 reviewed
    Reasoning models take different paths

    Reasoning Models Don't Just Think Longer, They Move Differently

    Anders Gjølbye +2

  44. cs.CL 2026-05-14 reviewed
    Fewer parses increase model surprise in garden paths but not enough

    Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis

    William Timkey +2

  45. cs.CL 2026-05-14 reviewed
    Math tasks produce highest attention entropy in LLMs

    Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

    Mahdi Naser-Moghadasi +1

  46. cs.CE 2026-05-14 reviewed
    Reinforcement updates replace feedback loops in LLM alpha discovery

    From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

    Lingzhe Zhang +7

  47. cs.CL 2026-05-14 reviewed
    LLM help adapts to user expertise domains to limit over-reliance

    Capability Conditioned Scaffolding for Professional Human LLM Collaboration

    Sen Yang +1

  48. cs.CL 2026-05-14 reviewed
    Ghana AI legal tool handles 32,000 student queries in 30 months

    Eskwai for Students: Generative AI Assistant for Legal Education in Ghana

    George Boateng +8

  49. cs.CL 2026-05-14 reviewed
    WhatsApp AI bot offers science help to West African students

    Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa

    George Boateng +6

  50. cs.CL 2026-05-14 reviewed
    Humans choose words step by step under tight vocabulary limits

    Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models

    Thomas Hikaru Clark +2