archive

Every paper Pith has read.

7661 papers in cs.CL · page 6

  1. cs.LG 2026-05-20 reviewed
    Early entropy drop signals when CoT reasoning helps LLMs

    When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

    Wei Xia +3

  2. cs.LG 2026-05-20 reviewed
    Self-distillation balances consensus across views to cut noise from privileged signals

    AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

    Duy Nguyen +9

  3. cs.CL 2026-05-20 reviewed
    Divide-prompt-refine produces more novel biomedical abstracts without training

    Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

    Sylvey Lin +5

  4. cs.CL 2026-05-20 reviewed
    Pipeline triples accuracy for Indigenous image captions

    Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

    Aashish Dhawan +4

  5. cs.CL 2026-05-20 reviewed
    Offline consolidator cuts agent memory 12x while raising success

    Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

    Chongrui Ye +7

  6. cs.CL 2026-05-20 reviewed
    1B model scores 60.7% on MMLU after 40B instruction tokens

    HRM-Text: Efficient Pretraining Beyond Scaling

    Guan Wang +8

  7. cs.CL 2026-05-20 reviewed
    Self-training amplifies surface markers while deep syntax dies

    Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

    Ming Liu

  8. cs.CL 2026-05-20 reviewed
    25-30% of web medical AIs give inaccurate clinical advice

    Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

    Sunday Oyinlola Ogundoyin +2

  9. cs.CL 2026-05-20 reviewed
    Direct sign-to-sign model beats text cascade on accuracy and speed

    Direct Translation between Sign Languages

    Zetian Wu +5

  10. cs.LG 2026-05-20 reviewed
    Small models copy last CoT number for 89-92% of arithmetic accuracy

    The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

    Ming Liu

  11. cs.MA 2026-05-19 reviewed
    State management beats workspace isolation in multi-agent tasks

    Multi-agent Collaboration with State Management

    Mengyang Liu +4

  12. cs.CL 2026-05-19 reviewed
    Gemination subclass drives errors in Japanese neural morphology

    When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

    Wen Zhang

  13. cs.CL 2026-05-19 reviewed
    One rare verb subtype drives most neural morphology errors

    When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

    Wen Zhang

  14. cs.CL 2026-05-19 reviewed
    Nine biomedical corpora differ in ways size and type stats miss

    What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework

    Robert Leaman +2

  15. cs.AI 2026-05-19 reviewed
    LLM agent accuracy drops to 0.54-0.62 without labels

    AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

    Parsa Mazaheri +1

  16. cs.CL 2026-05-19 reviewed
    Co-occurrence patterns support subject-verb agreement learning

    Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

    Claire Hobbs +1

  17. cs.CV 2026-05-19 reviewed
    AI models lag behind text-only on 3D brain MRI benchmark

    NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

    Mohammad H. Abbasi +14

    5 Piths
  18. cs.LG 2026-05-19 reviewed
    Verbal feedback in RL makes LLM simulations more human-like

    Reinforcing Human Behavior Simulation via Verbal Feedback

    Weiwei Sun +15

  19. cs.CL 2026-05-19 reviewed
    Audit split lifts source precision in LLM wiki tables from 36 to 51 percent

    Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

    Chen Shen

  20. cs.LG 2026-05-19 reviewed
    Trained reflectors improve language agents on new tasks

    Training Language Agents to Learn from Experience

    Yuval Shalev +2

  21. cs.SI 2026-05-19 reviewed
    Reddit dataset tracks 12 MAHA health themes over six years

    Hiding in Plain Sight: Finding MAHA on Reddit

    Sabit Ahmed +2

  22. cs.CL 2026-05-19 reviewed
    CoT prompting leaves gender bias inside LLMs

    Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs

    Edie Pearman +5

  23. cs.CL 2026-05-19 reviewed
    Jigsaw puzzle explains ChatGPT through comic panels

    Puzzled By ChatGPT? No more! A Jigsaw Puzzle to Promote AI Literacy and Awareness

    Francesca Padovani +1

  24. cs.CL 2026-05-19 reviewed
    LLMs switch from instructions to patterns when history conflicts

    Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

    Carolina Camassa +1

  25. cs.CL 2026-05-19 reviewed
    DEL raises LLM number prediction accuracy on math benchmarks

    DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

    Zhaohui Zheng +5

  26. cs.CL 2026-05-19 reviewed
    Non-reasoning fine-tuning beats reasoning for TTCW literary reviews

    When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

    Jinlong Liu +2

  27. cs.CL 2026-05-19 reviewed
    AI dialogue models sync states and predict turns ahead

    Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

    Pablo Riera +4

  28. cs.CL 2026-05-19 reviewed
    TIDE boosts MoE diffusion LLM inference up to 1.5x

    TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

    Zhiben Chen +4

  29. cs.CL 2026-05-19 reviewed
    Staged perception training boosts VLM accuracy with shorter reasoning

    From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

    Juncheng Wu +8

  30. cs.CL 2026-05-19 reviewed
    ClinSeekAgent boosts clinical AI by actively seeking raw evidence

    ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

    Juncheng Wu +7

  31. cs.CL 2026-05-19 reviewed
    Compact tokens from knowledge graphs ground LLMs with 10x fewer tokens

    KoRe: Compact Knowledge Representations for Large Language Models

    Davide Cavicchini +2

  32. cs.CL 2026-05-19 reviewed
    Selective FP4 on prefilling yields 3x speedup for agentic LLMs

    Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

    Haiquan Lu +4

  33. cs.CV 2026-05-19 reviewed
    Counterfactual tests expose failures in LVLM attribution for chest X-rays

    Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

    Guangzhi Xiong +4

  34. cs.CL 2026-05-19 reviewed
    Checklist prompts score 7.5 out of 8 on LLM quality rubric

    Less Back-and-Forth: A Comparative Study of Structured Prompting

    Saurav Ghosh +2

  35. cs.CL 2026-05-19 reviewed
    LLMs miss implicit cues despite explicit instructions

    MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

    Yuanqing Cai +5

  36. cs.CL 2026-05-19 reviewed
    Dataset pairs LLM chats with users' reported thoughts

    ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

    Chuanyang Jin +8

    5 Piths
  37. cs.CL 2026-05-19 reviewed
    Thoughts collected with LLM chats improve behavior forecasts

    ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

    Chuanyang Jin +8

    5 Piths
  38. cs.CL 2026-05-19 reviewed
    Joint lattice testing calibrates cascaded RAG thresholds at target risk

    BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation

    Zijun Jia +8

  39. cs.CL 2026-05-19 reviewed
    Draft answer first then reflect to gain 23% accuracy with 57% fewer tokens

    CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

    Dachuan Shi +6

  40. cs.CL 2026-05-19 reviewed
    The paper applies Group-Relative Policy Optimization reinforcement learning to a 1.7B…

    Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP

    Jann Pfeifer +2

  41. cs.CL 2026-05-19 reviewed
    Belief consistency raises LLM agent success by 20 points

    Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

    Wenjie Tang +4

  42. cs.CL 2026-05-19 reviewed
    Prompt tuning labels radiology reports with 32 examples

    PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling

    Ying-Jia Lin +5

  43. cs.CL 2026-05-19 reviewed
    Prompt tuning with UMLS synonyms labels reports from 32 examples

    PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling

    Ying-Jia Lin +5

  44. cs.CL 2026-05-19 reviewed
    Language mutations extend conspiracy theory lifespans on X

    Language Mutations Sustain the Persistences of Conspiracy Theories on Social Media

    Calvin Yixiang Cheng +2

  45. cs.CL 2026-05-19 reviewed
    Gemination errors dominate Japanese verb model failures

    Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation

    Wen Zhang

  46. cs.CL 2026-05-19 reviewed
    Gemination drives 75-80% of errors in Japanese past-tense models

    Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation

    Wen Zhang

  47. cs.CL 2026-05-19 reviewed
    Speculative decoding now works across all batch sizes without quality loss

    FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration

    Yaojie Zhang +7

  48. cs.CL 2026-05-19 reviewed
    Three memory layers improve long-term LLM agent recall

    Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory

    Jingwei Sun +4

  49. cs.DC 2026-05-19 reviewed
    GPU-aware expert mapping cuts MoE latency by 7.9 percent on average

    GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems

    Sourish Wawdhane +2

  50. cs.LG 2026-05-19 reviewed
    Position-dependent attention fixes constant risk on shifted reasoning

    A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

    Yuyang Zhang +3