pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 2

  1. cs.LG 2026-05-21 reviewed
    Controller routes LLM requests to best mode for 2x speedup

    ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU

    Aman Sunesh +2

  2. cs.LG 2026-05-21 reviewed
    Recognition of evaluations depends on model-benchmark pairs

    Decomposing and Measuring Evaluation Awareness

    Changling Li +5

  3. cs.CL 2026-05-21 reviewed
    Compositionality rises then falls in LLM self-training

    Model Collapse as Cultural Evolution

    Dongxin Guo +2

  4. cs.CL 2026-05-21 reviewed
    RAG method leads in mental health improvement detection

    DreamerNLplus: Interpretable Modeling of Mental Health Dynamics from Social Media Timelines using Hybrid Rule-Based and RAG Methods

    Maryia Zhyrko +3

  5. cs.CL 2026-05-21 reviewed
    Hawkes process lifts late alignment in news text simulations

    HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation

    Zewei Deng +2

  6. cs.CL 2026-05-21 reviewed
    LLMs learn what not to say via frequency competition

    Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs

    Dongxin Guo +2

  7. cs.CL 2026-05-21 reviewed
    Multilingual SAEs enable reliable language steering without layer search

    Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection

    Yusser Al Ghussin +5

  8. cs.CL 2026-05-21 reviewed
    SAE features from LLMs map onto brain semantic regions

    Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

    Dongxin Guo +2

  9. cs.CL 2026-05-21 reviewed
    Training data language, not English, drives brain-LLM alignment

    Brain-LLM Alignment Tracks Training Data, Not Typology

    Dongxin Guo +2

  10. cs.LG 2026-05-21 reviewed
    RADAR forecasts transfer by comparing representation trajectories

    RADAR: Relative Angular Divergence Across Representations

    Xavier Cadet +2

  11. cs.AI 2026-05-21 reviewed
    Transformers have fixed accuracy limits set by layers and width

    The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

    Dongxin Guo

  12. cs.CL 2026-05-21 reviewed
    Proactive AI questions uncover 82% of autism language traits

    A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism

    Chuanbo Hu +6

  13. cs.CL 2026-05-21 reviewed
    FIM pretraining yields linear verbatim memorization growth

    Memorization Dynamics of Fill-in-the-Middle Pretraining

    Tobias von Arx +1

  14. cs.CL 2026-05-21 reviewed
    Pipeline creates first UD treebank for Katharevousa Greek

    A Reproducible Universal Dependencies-Style Pipeline for Katharevousa Greek Parliamentary Text

    George Mikros +1

  15. cs.CL 2026-05-21 reviewed
    AI models favor some religions over others in conversion advice

    When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance

    Brett Israelsen +5

  16. cs.CL 2026-05-21 reviewed
    LLMs estimate expertise from Slack logs with 21% error

    Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs

    Ko Watanabe +1

  17. cs.CL 2026-05-21 reviewed
    Graph alignment detects LLM hallucinations better than GPT-4o

    Graph Alignment Topology as an Inductive Bias for Grounding Detection

    Paul Landes +3

  18. cs.CL 2026-05-21 reviewed
    LIFT gives diffusion models up to 3x reasoning gains on math tests

    Learnability-Informed Fine-Tuning of Diffusion Language Models

    Shubham Parashar +7

  19. cs.CL 2026-05-21 reviewed
    Error feedback in prompts halves Cypher query execution errors

    RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation

    Minseok Jung +2

  20. cs.IR 2026-05-21 reviewed
    LaTeX source yields better RAG chunks than PDF text

    AI-Friendly LaTeX: Using LaTeX Code as a Knowledge Source for Retrieval-Augmented Generation

    Tom Verhoeff

  21. cs.CL 2026-05-21 reviewed
    Linear program yields tokenizers within 1% of optimal

    Tokenisation via Convex Relaxations

    Jan Tempus +4

  22. cs.LG 2026-05-21 reviewed
    Vector rewards produce diverse LLM outputs that raise search scores

    Vector Policy Optimization: Training for Diversity Improves Test-Time Search

    Ryan Bahlous-Boldi +8

  23. cs.AI 2026-05-21 reviewed
    Evidence verifier scores spans by accuracy gain in self-evolving agents

    EVE-Agent: Evidence-Verifiable Self-Evolving Agents

    Yamato Arai +1

  24. cs.CL 2026-05-21 reviewed
    AI chatbots hit 90 percent on fresh news but drop in open answers

    Evaluating Commercial AI Chatbots as News Intermediaries

    Mirac Suzgun +7

  25. cs.CV 2026-05-21 reviewed
    VLMs keep high scores after most image tokens are deleted

    Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?

    Zixuan Lan +3

  26. cs.LG 2026-05-21 reviewed
    Transcoders trace VLM grounding and predict hallucinations at 0.68 AUC

    Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

    Dimitrios Damianos +4

  27. cs.CL 2026-05-21 reviewed
    Consistency training cuts covert political bias in LLMs

    Reducing Political Manipulation with Consistency Training

    Long Phan +5

  28. cs.CL 2026-05-21 reviewed
    Time-ordered training keeps LLM facts fresher than shuffling

    Understanding Data Temporality Impact on Large Language Models Pre-training

    Hippolyte Pilchen +4

  29. cs.CL 2026-05-21 reviewed
    Temporal biomedical graph rescues up to 65% of LLM errors on disease timelines

    ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

    Md Shamim Ahmed +4

  30. cs.AI 2026-05-21 reviewed
    LLM analysis outperforms acoustics for political pathos

    Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models

    Juergen Dietrich

  31. cs.CV 2026-05-21 reviewed
    Simulated dense placements train IMU model that ignores sensor setup

    AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

    Baiyu Chen +7

  32. cs.AI 2026-05-21 reviewed
    Conversation history pulls LLM judgments toward its tone

    AMEL: Accumulated Message Effects on LLM Judgments

    Sid-ali Temkit

  33. cs.CL 2026-05-21 reviewed
    ToaST cuts tokens over 11% vs BPE at large vocabularies

    Tokenization with Split Trees

    Craig W. Schmidt +6

  34. cs.CL 2026-05-21 reviewed
    Gradient subspace projection boosts LLM self-distillation

    Self-Policy Distillation via Capability-Selective Subspace Projection

    Guangya Hao +4

  35. cs.CL 2026-05-21 reviewed
    Moral cues survive machine translation to Polish

    Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora

    Maciej Skorski

  36. cs.CL 2026-05-21 reviewed
    Images boost LLM poetry detectors past RoBERTa

    Seeing the Poem: Image-Semantic Detection of AI-Generated Modern Chinese Poetry with MLLMs

    Shanshan Wang +8

  37. cs.CL 2026-05-21 reviewed
    AI Action Plan echoes private sector over public life concerns

    Whose Voice Counts? Mapping Stakeholder Perspectives on AI Through Public Submissions to the U.S. Government

    Alina Karakanta +6

  38. cs.CL 2026-05-21 reviewed
    AI office agents fail 44% of gradual attack tests

    Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

    Piercosma Bisconti +13

  39. cs.CL 2026-05-21 reviewed
    Benchmark shows AI agents accept gradual risks in 44 percent of cases

    Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

    Piercosma Bisconti +13

  40. cs.CL 2026-05-21 reviewed
    Moral knowledge beats extra context and model scaling for value detection

    More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

    Víctor Yeste +1

  41. cs.CL 2026-05-21 reviewed
    Moral knowledge retrieval beats extra context for political value detection

    More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

    Víctor Yeste +1

  42. cs.LG 2026-05-21 reviewed
    CAME-Grad fixes gradient double dilemma in report generation

    The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

    Erjian Zhang +3

  43. cs.LG 2026-05-21 reviewed
    CAME-Grad optimizer lifts radiology reports by 2 percent

    The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

    Erjian Zhang +3

  44. cs.LG 2026-05-21 reviewed
    Dual rewards stabilize unsupervised LLM reasoning

    Two is better than one: A Collapse-free Multi-Reward RLIF Training Framework

    Shourov Joarder +4

  45. cs.CL 2026-05-21 reviewed
    Sensorimotor ratings speed Chinese word recognition

    Chinese sensorimotor and embodiment norms for 3,000 lexicalized concepts

    Jing Chen +4

  46. cs.CL 2026-05-21 reviewed
    Agentic CLEAR automates multi-level LLM agent evaluation

    Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

    Asaf Yehudai +2

  47. cs.LG 2026-05-21 reviewed
    Noise prediction loss matches score matching up to constant

    A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models

    Jiayi Fu +1

  48. cs.CL 2026-05-21 reviewed
    Hyperfitting expands final LLM layer to promote rare tokens

    Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

    Meimingwei Li +3

  49. cs.CL 2026-05-21 reviewed
    Decaying hints lift non-English reasoning without drift

    LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

    Yuchun Fan +11

  50. cs.CL 2026-05-21 reviewed
    Multiple metrics required to judge synthetic data for tool-calling agents

    SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations

    Shuaiqi Wang +3