archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 4

  1. cs.CL 2026-05-21 reviewed
    Fixing the main failure point can hurt LLM agents

    Diagnosis Is Not Prescription: Linguistic Co-Adaptation Explains Patching Hazards in LLM Pipelines

    Yoon Jeonghun +1

  2. cs.CL 2026-05-21 reviewed
    Medical RAG certifies claims with zero unsupported risk

    Claim-Selective Certification for High-Risk Medical Retrieval-Augmented Generation

    Shao Kan

  3. cs.AI 2026-05-21 reviewed
    LLMs now build planners instead of one-off plans

    Planning in the LLM Era: Building for Reliability and Efficiency

    Michael Katz +3

  4. cs.AI 2026-05-21 reviewed
    7B model beats larger ones at Lean proof optimization

    ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

    Riyaz Ahuja +3

  5. cs.CL 2026-05-21 reviewed
    LLM attention weights tokens to improve DPO

    Token-weighted Direct Preference Optimization with Attention

    Chengyu Huang +3

  6. cs.CL 2026-05-21 reviewed
    Hyper-Align turns hypergraphs into LLM tokens

    Hypergraph as Language

    Mengqi Lei +6

  7. cs.CL 2026-05-21 reviewed
    Agent trajectories compiled into QA pairs improve long-context performance

    ACC: Compiling Agent Trajectories for Long-Context Training

    Qisheng Su +10

  8. cs.LG 2026-05-21 reviewed
    Dictionary realignment keeps OOD explanations faithful

    Geometry-Adaptive Explainer for Faithful Dictionary-Based Interpretability under Distribution Shift

    Sungjun Lim +3

  9. cs.CL 2026-05-21 reviewed
    LLMs beat fine-tuned models on rare suicide circumstances

    Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity

    Geoffrey Martin +2

  10. cs.LG 2026-05-21 reviewed
    Energy gating lifts transformer loss by 0.1 with tiny overhead

    Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention

    Athanasios Zeris

  11. cs.CL 2026-05-20 reviewed
    LLMs reduce ten intensity words to five numeric values

    Does Slightly Mean Somewhat? Measuring Vague Intensity Words in LLM Numeric Actions

    Daniel Tabach (Georgia Institute of Technology)

  12. cs.CL 2026-05-20 reviewed
    Retrieval lifts LLM accuracy on rare medical cases from 56% to 82%

    When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

    Doeun Lee +13

  13. cs.LG 2026-05-20 reviewed
    Geometry-aware calibration closes entropy gaps for LLM optimization

    Why Semantic Entropy Fails: Geometry-Aware and Calibrated Uncertainty for Policy Optimization

    Zheyuan Zhang +5

  14. cs.CV 2026-05-20 reviewed
    Context rewrite lifts 3D grounding accuracy by up to 22 points

    MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue

    Anna Deichler +6

    3 Piths
  15. cs.CL 2026-05-20 reviewed
    DivSkill-SQL lifts Text-to-SQL accuracy by up to 11 points

    Residual Skill Optimization for Text-to-SQL Ensembles

    Jiongli Zhu +10

  16. cs.CL 2026-05-20 reviewed
    LLM optimizer diagnoses full-set errors to tune prompts

    Reflective Prompt Tuning through Language Model Function-Calling

    Farima Fatahi Bayat +3

  17. cs.CL 2026-05-20 reviewed
    Contrastive prompts with 'other' turn LLMs into probability estimators

    PromptNCE: Pointwise Mutual Information Predictions Using Only LLMs and Contrastive Estimation Prompts

    Juliette Woodrow +1

  18. cs.CL 2026-05-20 reviewed
    Single-flaw pairs create clear tests for multi-turn LLM judges

    RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator

    Zhenwei Tang +5

  19. cs.CV 2026-05-20 reviewed
    Lightweight cross-encoder matches LLM judges for caption evaluation

    BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model

    Gonçalo Gomes +2

  20. cs.CL 2026-05-20 reviewed
    Bayes rule gives LLMs token-by-token attribution scores

    Probabilistic Attribution For Large Language Models

    Shilpika Shilpika +4

  21. cs.CL 2026-05-20 reviewed
    Semantic comparison catches AI peer reviews at low false positives

    Sem-Detect: Semantic Level Detection of AI Generated Peer-Reviews

    André V. Duarte +5

  22. cs.CL 2026-05-20 reviewed
    Natural language queries reach safety data with schema validation

    Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

    Mahdi Azhdari +1

  23. cs.LG 2026-05-20 reviewed
    Projection matrix aligns tokenizers for better distillation

    X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation

    Sharath Turuvekere Sreenivas +6

  24. cs.CL 2026-05-20 reviewed
    Open-source LLMs lean left on politics

    How Far Will They Go? Red-Teaming Online Influence with Large Language Models

    Daniel C. Ruiz +4

  25. cs.LG 2026-05-20 reviewed
    Actor updates match value gradients under differentiable rollouts

    Value-Gradient Hypothesis of RL for LLMs

    Arip Asadulaev +3

  26. cs.LG 2026-05-20 reviewed
    Fine-tuned detectors amplify a pretrained typicality axis

    Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction

    Alexander Smirnov

  27. cs.LG 2026-05-20 reviewed
    Entmax turns KV cache truncation into exact support recovery

    EntmaxKV: Support-Aware Decoding for Entmax Attention

    Gonçalo Duarte +2

    4 Piths
  28. cs.CV 2026-05-20 reviewed
    New benchmark shows LVLMs falter on furniture assembly videos

    Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

    Aditya Chetan +7

  29. cs.CL 2026-05-20 reviewed
    Rewriting cuts unsafe LLM outputs for teen users

    CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

    Heajun An +3

  30. cs.AI 2026-05-20 reviewed
    Platform lets humans and AIs co-author and iterate on papers

    AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists

    Junshu Pan +7

  31. cs.LG 2026-05-20 reviewed
    Rank-1 line from first 50 steps matches full RLVR at 15% cost

    You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

    Zhepei Wei +5

  32. cs.LG 2026-05-20 reviewed
    DelTA raises math scores by over 3 points on 8B models

    DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

    Kaiyi Zhang +2

  33. cs.CL 2026-05-20 reviewed
    LLMs reach 100% consistency adapting grammars to metamodel changes

    Leveraging LLMs for Grammar Adaptation: A Study on Metamodel-Grammar Co-Evolution

    Weixing Zhang +4

  34. cs.CL 2026-05-20 reviewed
    Separate model learns when to generate agent guidance

    Mem-π: Adaptive Memory through Learning When and What to Generate

    Xiaoqiang Wang +7

  35. cs.CL 2026-05-20 reviewed
    LLM measures track syncretism effects on agreement attraction

    Quantifying the cross-linguistic effects of syncretism on agreement attraction

    Utku Turk +1

  36. cs.CL 2026-05-20 reviewed
    Metaphors widen spectral breadth in transformer layers

    Post-Hoc Understanding of Metaphor Processing in Decoder-Only Language Models via Conditional Scale Entropy

    Lawhori Chakrabarti +5

  37. cs.SE 2026-05-20 reviewed
    Agents pass visible tests but fail held-out usage tests as tasks lengthen

    SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

    Bingchen Zhao +3

  38. cs.CL 2026-05-20 reviewed
    Traditional systems still lead in multilingual coreference task

    Findings of the Fifth Shared Task on Multilingual Coreference Resolution: Expanding Datasets for Long-Range Entities

    Michal Novák +8

  39. cs.CL 2026-05-20 reviewed
    AI shapes 11-26% of goals in human collaborations

    "I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

    Eunsu Kim +3

  40. cs.CL 2026-05-20 reviewed
    Hybrid jailbreak method reaches 84% success with 30 queries

    LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models

    Abdullah Al Nomaan Nafi +3

  41. cs.CL 2026-05-20 reviewed
    LLMs degrade on numerical tasks beyond 500 social media posts

    Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media

    Yuefeng Shi +2

  42. cs.AI 2026-05-20 reviewed
    43M-paper graph gives AI agents deterministic cross-field links

    SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

    Shuofei Qiao +10

  43. cs.CL 2026-05-20 reviewed
    Spike-gated model reaches 89% sparsity at 8.9 perplexity

    SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

    Ting Liu

  44. cs.CL 2026-05-20 reviewed
    Regularization curbs prompt overfitting for better LLM generalization

    TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

    Lucheng Fu +6

  45. cs.CL 2026-05-20 reviewed
    LLMs follow logical rules for conditionals but miss human implications

    Tracing the ongoing emergence of human-like reasoning in Large Language Models

    Paolo Morosi +4

  46. cs.CL 2026-05-20 reviewed
    Dual safeguards create reliable HIV triage domain in Spanish notes

    Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification

    Rodrigo Morales-Sánchez +2

  47. cs.CL 2026-05-20 reviewed
    Pairwise rewards stabilize RL for reasoning models

    LamPO: A Lambda Style Policy Optimization for Reasoning Language Models

    Redacted by arXiv +6

  48. cs.LG 2026-05-20 reviewed
    10% heads on 10% data deliver 8.3 pp gain with 7x speedup in LLM alignment

    From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

    Hao Chen +9

  49. cs.CL 2026-05-20 reviewed
    Knowledge graphs lift LLM borrowing detection in Luxembourgish to 81%

    Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models

    Nina Hosseini-Kivanani

  50. cs.CL 2026-05-20 reviewed
    Manga109 revised to correct 29,000 dialogue annotations

    Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding

    Jeonghun Baek +4