pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 19

  1. cs.CL 2026-05-12 reviewed
    Re-testing lowers most controlled text generation scores

    A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles

    Michela Lorandi +1

  2. cs.CL 2026-05-12 reviewed
    Small detector beats large models at spotting LLM hallucinations

    Scalable Token-Level Hallucination Detection in Large Language Models

    Rui Min +4

  3. cs.CL 2026-05-12 reviewed
    Pretraining exposure predicts LLM popularity better than Wikipedia

    Pretraining Exposure Explains Popularity Judgments in Large Language Models

    Jamshid Mozafari +2

  4. cs.CL 2026-05-12 reviewed
    High-convergence sentences lift LLM accuracy on inferential questions

    Context Convergence Improves Answering Inferential Questions

    Jamshid Mozafari +2

  5. cs.CL 2026-05-12 reviewed
    Benchmark forces models to combine facts from two articles

    MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering

    Rezarta Islamaj +15

  6. cs.CL 2026-05-12 reviewed
    Summing PEFT module outputs boosts multi-attribute text control

    Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation

    Michela Lorandi +1

  7. cs.CL 2026-05-12 reviewed
    Index ranks category pairs by confusion risk in data entry

    A categorical error sensitivity index (ISEC): A preventive ordinal decision-support measure for irrecoverable errors in manual data entry systems

    Ricardo Raúl Palma +2

  8. cs.CL 2026-05-12 reviewed
    Retrieval lifts two-hop medical QA to 89% conceptual accuracy

    Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering

    Rezarta Islamaj +15

  9. cs.CL 2026-05-12 reviewed
    Gender bias and facts share the same neurons in language models

    GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

    Leonor Veloso +1

  10. cs.CL 2026-05-12 reviewed
    Token-level ratio matching generalizes DPO for precise alignment

    TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

    Truong Nguyen +5

  11. cs.CL 2026-05-12 reviewed
    Token-level ratio matching aligns models at each generation step

    TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

    Truong Nguyen +5

  12. cs.CL 2026-05-12 reviewed
    Familiarity dominates English word difficulty across three L1 groups

    What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty

    Jonas Mayer Martins +3

  13. cs.CR 2026-05-12 reviewed
    New decoder recovers personal data from finetuned models

    Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

    Sae Furukawa +1

  14. cs.CL 2026-05-12 reviewed
    PRISM cuts context use by 10x while lifting accuracy on long agent tasks

    PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

    Jingyi Peng +3

  15. cs.CL 2026-05-12 reviewed
    PRISM hits higher accuracy with 10x less context in long-horizon agent memory

    PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

    Jingyi Peng +3

  16. cs.CL 2026-05-12 reviewed
    Benchmark finds LLMs miss how scams escalate turn by turn

    PreScam: A Benchmark for Predicting Scam Progression from Early Conversations

    Weixiang Sun +7

  17. cs.CL 2026-05-12 reviewed
    Token marks plus contrastive tuning clean disfluent speech transcripts

    Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs

    Deepak Kumar +2

  18. cs.CL 2026-05-12 reviewed
    Combined optimization and distillation boosts long-context LLM reasoning

    Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

    Miguel Moura Ramos +2

  19. cs.CL 2026-05-12 reviewed
    Sparse autoencoders expose features inside Whisper ASR

    Mechanistic Interpretability of ASR models using Sparse Autoencoders

    Dan Pluth +3

  20. cs.LG 2026-05-12 reviewed
    LoRA accuracy depends on which parameters are trained

    Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

    Arijit Sehanobish +1

  21. cs.CL 2026-05-12 reviewed
    LLM decoding routes around memory clashes via attention checks

    Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding

    Yigeng Zhou +8

  22. cs.AI 2026-05-12 reviewed
    Discovery agents beat learned models under enterprise shifts

    Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics

    Jishnu Sethumadhavan Nair +16

  23. cs.CL 2026-05-12 reviewed
    Bayesian priors fix up to 50-point errors in LLM user feedback

    Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach

    Andrea Morandi +1

  24. cs.CL 2026-05-12 reviewed
    Reconstructing missing facts boosts misinformation detection

    Latent Causal Void: Explicit Missing-Context Reconstruction for Misinformation Detection

    Hui Li +3

  25. cs.CV 2026-05-12 reviewed
    One autoregressive model makes personalized ad images and text

    Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models

    Yexing Xu +17

  26. cs.CL 2026-05-12 reviewed
    Poetic prompts create separate processing paths that evade LLM safety

    Metaphor Is Not All Attention Needs

    Olga Sorokoletova +8

  27. cs.CL 2026-05-12 reviewed
    Data focus and signer adaptation unlock low-resource sign language AI

    Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward

    Nigar Alishzade +1

  28. cs.RO 2026-05-12 reviewed
    World models merge with action generation for embodied AI

    World Action Models: The Next Frontier in Embodied AI

    Siyin Wang +13

  29. cs.CL 2026-05-12 reviewed
    LLMs show limited evidence of grammar violation detectors

    Do Language Models Encode Knowledge of Linguistic Constraint Violations?

    Hardy +1

  30. cs.CL 2026-05-12 reviewed
    LLMs show limited internal grammar violation detectors

    Do Language Models Encode Knowledge of Linguistic Constraint Violations?

    Hardy +1

  31. cs.CL 2026-05-12 reviewed
    Spoken input aids verb learning over child-directed speech

    Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition

    Francesca Padovani +3

  32. cs.CL 2026-05-12 reviewed
    Skill graphs boost agent RL on complex tasks

    SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

    Xiaoyuan Li +6

  33. cs.CL 2026-05-12 reviewed
    Three-stage retrieval pipeline ranks 8th in SemEval multi-turn task

    Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking

    David-Maximilian Caraman +1

  34. cs.CL 2026-05-12 reviewed
    SAGE proposes a framework that trains smaller models to automatically generate and verify…

    SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation

    Xiaoyuan Li +8

  35. cs.CR 2026-05-12 reviewed
    Benchmark finds skills expose agents to unsafe attacks

    SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

    Chang Jin +9

  36. cs.CL 2026-05-12 reviewed
    Human actions guide LLM agents past RL barriers

    Learning Agentic Policy from Action Guidance

    Yuxiang Ji +8

  37. cs.CL 2026-05-12 reviewed
    Selective visuals raise Indic subtitle translation scores

    Towards Visually-Guided Movie Subtitle Translation for Indic Languages

    Tarun Chintada +2

  38. cs.CL 2026-05-12 reviewed
    Rubric test predicts LLM post-training success at 90% accuracy

    On Predicting the Post-training Potential of Pre-trained LLMs

    Xiaoyuan Li +7

  39. cs.CL 2026-05-12 reviewed
    Scenario modeling plus intent bridging lifts target-guided dialogues

    Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging

    Maodong Li +2

  40. cs.CV 2026-05-12 reviewed
    Frozen CLIP features top ResNet for instructional video summaries

    Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models

    Maham Nazir +3

  41. cs.SE 2026-05-12 reviewed
    Print statements teach code models to reason step by step

    StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

    Hao Wang +3

  42. cs.CL 2026-05-12 reviewed
    Neuron activation margins augment preference optimization for math

    YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

    Yifan Le

  43. cs.CL 2026-05-12 reviewed
    Sparse autoencoders become steering and optimization tools for LLMs

    Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models

    Boyi Deng +17

  44. cs.CL 2026-05-12 reviewed
    Concordance tool assembles local grammars for better name extraction

    Concordance Comparison as a Means of Assembling Local Grammars

    Juliana Pirovani +2

  45. cs.CV 2026-05-12 reviewed
    Unified visual latents cut reasoning tokens in multimodal models

    UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs

    Houcheng Jiang +6

  46. cs.CL 2026-05-12 reviewed
    Boltzmann ranking on trajectories lifts diffusion language model performance

    Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

    Kecheng Chen +11

  47. cs.CL 2026-05-12 reviewed
    Boltzmann ranking of inference trajectories improves DLM post-training

    Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

    Kecheng Chen +11

  48. cs.LG 2026-05-12 reviewed
    Divergence signals adapt credit assignment for LLM agent RL

    GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

    Sijia Li +9

  49. cs.LG 2026-05-12 reviewed
    Divergence spikes adapt credit assignment for LLM agents

    GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

    Sijia Li +9

  50. cs.CL 2026-05-12 reviewed
    Fine-tuning teaches models to control randomness

    Probabilistic Calibration Is a Trainable Capability in Language Models

    Davide Baldelli +4