pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 1

  1. cs.AI 2026-05-22 reviewed
    Optimizer model improves agent skills only via validation-raising text edits

    SkillOpt: Executive Strategy for Self-Evolving Agent Skills

    Yifan Yang +14

  2. cs.CV 2026-05-22 reviewed
    Dedicated image editor lifts multimodal reasoning by 5 points

    ETCHR: Editing To Clarify and Harness Reasoning

    Beichen Zhang +5

  3. cs.CL 2026-05-22 reviewed
    Word swaps in English data speed multilingual training 2x

    Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

    Anastasiia Sedova +3

  4. cs.LG 2026-05-22 reviewed
    Weak teachers boost larger LLMs via loss mixing

    Strong Teacher Not Needed? On Distillation in LLM Pretraining

    Taiming Lu +1

  5. cs.CV 2026-05-22 reviewed
    LLM splits video queries into tool calls merged by boolean logic

    Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

    Michal Shlapentokh-Rothman +3

  6. cs.CL 2026-05-22 reviewed
    Word co-occurrence creates hierarchical geometry in embeddings

    Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

    Andres Nava +1

  7. cs.CL 2026-05-22 reviewed
    NLG evaluation moves from rare to essential

    NLG Evaluation: Past, Present, Future

    Ehud Reiter

  8. cs.CL 2026-05-22 reviewed
    Sense-enhanced embeddings organize semantic types better in graphs

    A graph-based analysis of semantic types and coercion in contextualized word embeddings

    Long Chen +1

  9. cs.CL 2026-05-22 reviewed
    Metadata checks alone miss evidence dependence in benchmarks

    Metadata Predictability Is Not Evidence Dependence: An Intervention-Based Audit for Weak-Label Benchmarks

    Kan Shao

  10. cs.CL 2026-05-22 reviewed
    Benchmark exposes weaknesses in MLLM chart descriptions

    ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models

    Fen Wang +7

  11. cs.CL 2026-05-22 reviewed
    Recursive memory predicts next queries with 22x fewer tokens

    OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations

    Jiangwang Chen +6

  12. cs.CL 2026-05-22 reviewed
    Popular skills often fail to improve LLM agent performance

    OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

    Jiahao Ying +4

  13. cs.CL 2026-05-22 reviewed
    Register, not size, picks the most human-like LLM

    How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework

    Björn Nieth +15

  14. cs.CL 2026-05-22 reviewed
    GE2 leads retrieval accuracy but trails in latency by 14x

    Benchmarking Google Embeddings 2 against Open-Source Models for Multilingual Dense Retrieval and RAG Systems

    Stefano Cirillo +3

  15. cs.LG 2026-05-22 reviewed
    Latent space lets diffusion language models sample faster with better quality

    DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling

    Jean-Marie Lemercier +5

  16. cs.CL 2026-05-22 reviewed
    Two-phase curriculum reaches 99.02% accuracy on name matching

    Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts

    Shivam Chourasia +2

  17. cs.CL 2026-05-22 reviewed
    Date-filtered retrieval fixes LLM errors on changed laws

    Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering

    Max Prior +2

  18. cs.LG 2026-05-22 reviewed
    Self-generated tests and code co-evolve to match RLVR results

    CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

    Zhangyi Hu +8

  19. cs.CL 2026-05-22 reviewed
    Automated rubrics let RL scale to open-ended LLM tasks

    ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

    Xiaoyuan Li +7

  20. cs.CL 2026-05-22 reviewed
    SSDAU cuts ambiguity F1 drop in joint extraction from 32% to 8%

    SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction

    Jiawei He +7

  21. cs.CL 2026-05-22 reviewed
    Solution matching measures model alignment with social norms

    Naturalistic measure of social norms alignment

    Yevhen Kostiuk +4

  22. cs.CL 2026-05-22 reviewed
    Tongue shape in /i/ predicts diphthong formant timing

    Articulatory strategy as a source of variation in acoustic vowel dynamics

    Patrycja Strycharczuk +2

  23. cs.CL 2026-05-22 reviewed
    EquiSumm models gender to create fairer tweet summaries

    EquiSumm: A Gender Bias-Aware Framework for Inclusive Tweet Summarization

    Chaitanya Wanjari +4

  24. cs.CL 2026-05-22 reviewed
    Metacognitive rewards lift LLM reasoning up to 11 percent

    Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals

    Sirui Chen +8

  25. cs.CL 2026-05-22 reviewed
    RL framework decouples user preferences from task rewards

    From Correctness to Preference: A Framework for Personalized Agentic Reinforcement Learning

    Ranxu Zhang +7

  26. cs.CL 2026-05-22 reviewed
    Cultural adaptation required before LLMs handle political discourse across cultures

    Cultural Adaptation in Large Language Models for Political Discourse

    Wajdi Zaghouani

  27. cs.CL 2026-05-22 reviewed
    Sign language ERC models reveal domain gap from generic approaches

    Emotion Recognition in Sign Language Conversation

    Yusong Wang +4

  28. cs.CL 2026-05-22 reviewed
    300K Facebook climate posts released as open dataset

    ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication

    Wajdi Zaghouani +4

  29. cs.CL 2026-05-22 reviewed
    Hope speech makes up over 64 percent of Arabic Gaza comments

    AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

    Esra'a Sharqawi +1

  30. cs.CL 2026-05-22 reviewed
    Models converge on representations but diverge on reasoning

    Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

    Muhammad Usama +1

  31. cs.CL 2026-05-22 reviewed
    Next-token prediction works only if text prefixes suffice for latent context

    When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming

    Francesco Corielli

  32. cs.LG 2026-05-22 reviewed
  33. cs.LG 2026-05-22 reviewed
    Kernel agents top out at 0.94x production baselines

    FastKernels: Benchmarking GPU Kernel Generation in Production

    Gabriele Oliaro +7

  34. cs.HC 2026-05-22 reviewed
    Multi-agent AI raises gardener confidence and trust scores

    CultivAgents: Cultivating Relationship-Centered Multi-Agent Systems for Personalized Gardening

    Yiyang Wang +5

  35. cs.CL 2026-05-22 reviewed
    Machine texts hide human-like spans that complicate detection

    Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement

    Chenwang Wu +3

  36. cs.CL 2026-05-22 reviewed
    Optimizing prompt embeddings boosts in-context learning

    Self-Improving In-Context Learning

    Baturay Saglam +1

  37. cs.CR 2026-05-22 reviewed
    Key-selected synonyms watermark LLM text at 98% detection

    Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

    Kieu Dang +4

  38. cs.CL 2026-05-22 reviewed
    LLMs drop up to 88 points when tasks move to context middle

    Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

    Chuyifei Zhang +3

  39. cs.RO 2026-05-22 reviewed
    VLM boosts robot map coverage by 24% in tests

    Autonomous Frontier-Based Exploration with VLM Guidance

    Aarush Aitha +1

  40. cs.CL 2026-05-22 reviewed
    Block-diffusion VLA reaches SOTA driving accuracy at 12x AR speed

    Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

    Kewei Zhang +11

  41. cs.CR 2026-05-22 reviewed
    ActInv recovers inputs from LLM split-inference activations

    What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

    Mingyuan Fan +3

  42. cs.CL 2026-05-22 reviewed
    Language flips which jailbreaks work on frontier MLLMs

    Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

    Casey Ford +3

    3 Piths
  43. cs.CL 2026-05-22 reviewed
    LLMs miss psychiatric symptoms when functioning looks intact

    When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

    Jianfeng Zhu +3

  44. cs.CL 2026-05-22 reviewed
    Role prompts split into additive persona and task vectors at one site

    As X, Do Y: How Persona and Task Combine in Instruction-Tuned LLMs

    Eric Xu

  45. cs.CL 2026-05-21 reviewed
    BERT classifier labels 55k Ming-Qing letters from title lists

    A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works

    Queenie Luo

  46. cs.CL 2026-05-21 reviewed
    BERTopic beats STM on coherence for short survey texts

    A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses

    Yan Jiang +2

  47. cs.LG 2026-05-21 reviewed
    Global LP ranks every MoE expert to cut memory at low bits

    GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs

    Jianing Deng +6

  48. cs.CL 2026-05-21 reviewed
    Optimization cuts LLM token use 25% at F1 0.78

    The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management

    Binqi Shen +4

  49. cs.CL 2026-05-21 reviewed
    Steering vectors modestly lift cultural reasoning in LLMs

    DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge

    Yusser Al Ghussin +5

  50. cs.CL 2026-05-21 reviewed
    Mixed curriculum trains memory agents with highest overall QA F1

    What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA

    Xinjie He +6