pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 11

  1. cs.CL 2026-05-17 reviewed
    Stigmatizing language skews LLMs toward less aggressive medical advice

    Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making

    Jen-tse Huang +7

  2. cs.AI 2026-05-17 reviewed
    ChemVA lifts LLMs on chemical diagrams by 20 points

    ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

    Mingyang Rao +6

  3. cs.CL 2026-05-17 reviewed
    LLMs annotate Mandarin narratives nearly as well as humans

    LLMs for automatic annotation of Mandarin narrative transcripts

    Qingwen Zhao +5

  4. cs.CL 2026-05-16 reviewed
    AI models barely beat baseline on pluralistic community moderation

    PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

    Zoher Kachwala +5

  5. cs.CL 2026-05-16 reviewed
    Many models show weaker safety in English than low-resource languages

    Why Do Safety Guardrails Degrade Across Languages?

    Max Zhang +3

  6. cs.LG 2026-05-16 reviewed
    On-device specs match cloud accuracy on 4 of 8 benchmarks

    OpenJarvis: Personal AI, On Personal Devices

    Jon Saad-Falcon +12

  7. cs.AI 2026-05-16 reviewed
    Explicit provenance required to compute AI responsibility

    Responsible Agentic AI Requires Explicit Provenance

    Jinwei Hu +5

  8. cs.CL 2026-05-16 reviewed
    Low-cost adapters enable multimodal LLMs for low-resource languages

    Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

    Firoj Alam +2

  9. cs.CV 2026-05-16 reviewed
    Models collapse on multi-sequence brain MRI questions

    UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation

    Shiv Ghosh +5

  10. cs.CV 2026-05-16 reviewed
    VLMs collapse on multi-sequence brain tumor MRI scans

    UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation

    Shiv Ghosh +5

  11. cs.CL 2026-05-16 reviewed
    Small attention-head sets suppress deceptive commitment across environments

    The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning

    Scott Merrill +1

  12. cs.CL 2026-05-16 reviewed
    Router matches top LLM quality at half the cost

    HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools

    Aashna Garg +4

  13. cs.CL 2026-05-16 reviewed
    Three agents boost medical QA accuracy by 6.46 points

    SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning

    Yongfeng Huang +2

  14. cs.CV 2026-05-16 reviewed
    Density weighting recovers 8.7 OCR points in hybrid VLM distillation

    HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation

    Yihao Liang +1

  15. cs.CL 2026-05-16 reviewed
    Auto-generated reasoning chains lift ICL accuracy on multi-step tasks

    ACIL: Auto Chain of Thoughts for In-Context Learning

    Rui Chu

  16. cs.LG 2026-05-16 reviewed
    Scale decides if language model geometry stays organized for prediction

    Scale Determines Whether Language Models Organize Representation Geometry for Prediction

    Weilun Xu

  17. cs.CL 2026-05-16 reviewed
    Top LLMs cover only 47.8% of real consumer reactions

    Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

    Tianyu Wang +2

  18. cs.AI 2026-05-16 reviewed
    LLM agent builds traceable knowledge graphs autonomously

    RAGA: Reading-And-Graph-building-Agent for Autonomous Knowledge Graph Construction and Retrieval-Augmented Generation

    Chengrui Han +1

  19. cs.LG 2026-05-16 reviewed
    AI agents differ sharply in solo ML model training on one GPU

    1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?

    Robin-Nico Kampa +4

  20. cs.CL 2026-05-16 reviewed
    Agentic cycle makes translation serve communication goals first

    Agentic AI Translate: An Agentic Translator Prototype for Translation as Communication Design

    Masaru Yamada

  21. cs.LG 2026-05-16 reviewed
    Self-evolution trains math-reasoning LLMs with under 2K samples

    D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

    Ru Zhang +6

  22. cs.CL 2026-05-16 reviewed
    Prompt leaks let simple text match fake hallucination detection

    PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

    Khizar Hussain +1

  23. cs.SI 2026-05-16 reviewed
    Algorithmic feeds reshape how users write

    Algorithmic Cultivation: How Social Media Feeds Shape User Language

    Olivia Pal +3

  24. cs.PL 2026-05-16 reviewed
    Every string over its alphabet is a valid program

    The IsalProgram Programming Language

    Ezequiel López-Rubio

  25. cs.CL 2026-05-16 reviewed
    The paper presents HalluScore

    HalluScore: Large Language Model Hallucination Question Answering Benchmark

    Aisha Alansari +1

  26. cs.CL 2026-05-16 reviewed
    Fine-tuning stabilizes LLM personality scores but accuracy stays near chance

    Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

    Prateek Rajput +4

  27. cs.CL 2026-05-16 reviewed
    Transformers recover item difficulty signal from wording alone

    Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning

    Jan Netík +1

  28. cs.CL 2026-05-16 reviewed
    Test-time skill synthesis raises LLM agent success rates

    Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

    Jingxing Wang +6

  29. cs.CL 2026-05-16 reviewed
    Two-stage adapters put LLM first in coreference task

    Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

    Antoine Bourgois +2

  30. cs.CL 2026-05-16 reviewed
    Two-stage adapters lead LLM multilingual coreference resolution

    Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

    Antoine Bourgois +2

  31. cs.AI 2026-05-16 reviewed
    EEG shows why people miss some AI hallucinations

    How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

    Shuqi Zhu +6

  32. cs.CL 2026-05-16 reviewed
    Diffusion LLMs learn faster decoding by rolling back mistakes

    Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

    Fanqin Zeng +8

  33. cs.CL 2026-05-16 reviewed
    Reasoning effort fails to change LRM alignment with humans

    Effort as Ceiling, Not Dial: Reasoning Budget Does Not Modulate Cognitive Cost Alignment Between Humans and Large Reasoning Models

    Yueqing Hu +1

  34. cs.CL 2026-05-16 reviewed
    Full-attention LLMs sparsify in hundreds of steps

    Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

    Yanke Zhou +8

  35. cs.CL 2026-05-16 reviewed
    Pinyin and glyph features fix homophone errors in Chinese keyword filtering

    JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR

    Shilin Zhou +1

  36. cs.CE 2026-05-16 reviewed
    LLM trading alpha is not deployment evidence

    The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence

    Yuxuan Ye +9

  37. cs.CV 2026-05-16 reviewed
    DriveSafe uses scene captions to improve driving risk detection

    DriveSafe: A Framework for Risk Detection and Safety Suggestions in Driving Scenarios

    Sainithin Artham +3

  38. cs.CL 2026-05-16 reviewed
    Expert targets raise merged-model 4-bit accuracy from 35% to 77%

    E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

    Wenjun Wang +6

  39. cs.CL 2026-05-16 reviewed
    Multiple translations become one benchmark for Pali

    PaliBench: A Multi-Reference Blueprint for Classical Language Translation Benchmarks

    Máté Metzger +1

  40. cs.CL 2026-05-16 reviewed
    Mixing a model's own predictions lets it add facts without forgetting old skills

    MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

    Jiarui Liu +7

  41. cs.CL 2026-05-16 reviewed
    MixSD retains 100% of base skills while injecting new facts

    MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

    Jiarui Liu +7

  42. cs.CV 2026-05-16 reviewed
    Induced patterns let VLMs plan beyond single-step vision

    Thinking with Patterns: Breaking the Perceptual Bottleneck in Visual Planning via Pattern Induction

    Yichang Jian +4

  43. cs.CL 2026-05-16 reviewed
    First structured dataset released for Indian RTI decisions

    RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

    Joy Bose

  44. cs.CL 2026-05-16 reviewed
    Block-union tables cut chunked prefill attention time by 2.72x

    CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

    Jiwon Song +3

  45. cs.CL 2026-05-16 reviewed
    Diffusion code generation meets constraints through local edits

    Constrained Code Generation with Discrete Diffusion

    Lize Shao +4

  46. cs.LG 2026-05-16 reviewed
    Decoupling KL and prefixes creates four LLM distillation objectives

    Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation

    Anhao Zhao +5

  47. cs.LG 2026-05-16 reviewed
    LLM confidence trajectories separate correct reasoning without content

    Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning

    Shuo Liu +2

  48. cs.CL 2026-05-16 reviewed
    AI agents reach 6.89x GPU kernel speedups but drop on unseen shapes

    AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

    Sharareh Younesian +13

  49. cs.LG 2026-05-16 reviewed
    Eight calibration passes set LoRA ranks by layer

    FIM-LoRA: Task-Informative Rank Allocation for LoRA via Calibration-Time Gradient-Variance Estimation

    Ramakrishnan Sathyavageeswaran

  50. cs.LG 2026-05-16 reviewed
    Execution rewards keep tool accuracy above 90% at depth 6

    TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition

    Anay Kulkarni +5