archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 17

  1. cs.CL 2026-05-13 reviewed
    Language-specific thresholds lift slur detection F1 by 2-5%

    KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model

    Barathi Ganesh HB +3

  2. cs.CL 2026-05-13 reviewed
    LLMs annotate asylum credibility with inconsistent errors

    LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

    Galadrielle Humblot-Renaux +9

  3. cs.CR 2026-05-13 reviewed
    External skill library keeps LLM attacks evolving after saturation

    Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

    Xiaozhe Zhang +6

  4. cs.CL 2026-05-13 reviewed
    Puzzles reveal all-or-nothing success for humans and LLMs

    From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks

    Neh Majmudar +3

  5. cs.LO 2026-05-13 reviewed
    Certificates verify LLM pipelines by auditing only deterministic parts

    Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture

    George Koomullil

  6. cs.CL 2026-05-13 reviewed
    Fine-tuned BART and T5 parsers beat prior seq2seq models on constituent parsing

    Exploiting Pre-trained Encoder-Decoder Transformers for Sequence-to-Sequence Constituent Parsing

    Daniel Fernández-González +1

  7. cs.LG 2026-05-13 reviewed
    Phase rotations on unit circle stabilize explicit memory

    Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory

    Sungwoo Goo +2

  8. cs.CL 2026-05-13 reviewed
    LLMs self-train on examples generated from the query alone

    Query-Conditioned Test-Time Self-Training for Large Language Models

    Chaehee Song +4

  9. cs.CL 2026-05-13 reviewed
    Query self-training adapts LLMs using only input-derived pairs

    Query-Conditioned Test-Time Self-Training for Large Language Models

    Chaehee Song +4

  10. cs.CL 2026-05-13 reviewed
    Document MT then segment refinement beats full-document fixes

    What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation

    Shaomu Tan +7

  11. cs.CL 2026-05-13 reviewed
    Shared preference vector controls LLM choices across personas

    Probing Persona-Dependent Preferences in Language Models

    Oscar Gilg +3

  12. cs.CL 2026-05-13 reviewed
    One vector steers LLM preferences across opposing personas

    Probing Persona-Dependent Preferences in Language Models

    Oscar Gilg +3

  13. cs.CL 2026-05-13 reviewed
    One LLM persuades another to ignore its own safety rules

    LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs

    Rodrigo Nogueira +9

  14. cs.CL 2026-05-13 reviewed
    18,900 questions test financial reasoning in six Indic languages

    FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages

    Sarmistha Das +4

  15. cs.CL 2026-05-13 reviewed
    Persona vectors form in first 0.22% of LLM pretraining

    Tracing Persona Vectors Through LLM Pretraining

    Viktor Moskvoretskii +4

  16. cs.RO 2026-05-13 reviewed
    Stereo vision and location priors boost real-world robot navigation

    What Limits Vision-and-Language Navigation?

    Yunheng Wang +11

  17. cs.CL 2026-05-13 reviewed
    Pooled preferences nearly match individual fine-tuning for personalization

    PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users

    Hannah Rose Kirk +6

  18. cs.AI 2026-05-13 reviewed
    Simple recipe scales reasoning model to olympiad gold

    Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

    Yafu Li +27

  19. cs.CL 2026-05-13 reviewed
    Contrastive rollouts assign credit to individual agents in LLM teams

    CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

    Tom Zehle

  20. cs.CL 2026-05-13 reviewed
    Parallel dataset gives medical dialogues in nine Indic languages

    IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

    Shubham Kumar Nigam +2

  21. cs.CL 2026-05-13 reviewed
    Latent info gain ranks visual evidence for better multimodal RAG

    Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation

    Weiqing Luo +5

  22. cs.CL 2026-05-13 reviewed
    Hybrid conversion lets LLMs query building models in plain English

    A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations

    Rabindra Lamsal +4

  23. cs.CL 2026-05-13 reviewed
    GAGPO computes temporal advantages from grouped rollouts without a critic

    GAGPO: Generalized Advantage Grouped Policy Optimization

    Siyuan Zhu +6

  24. stat.ML 2026-05-13 reviewed
    Entropy rises with missing context in LLMs

    LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

    Stef van Buuren

  25. cs.CL 2026-05-13 reviewed
    Models frequently fail to build valid geometry diagrams from text

    GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language

    Jinwoong Kim +2

  26. cs.CL 2026-05-13 reviewed
    Pruning trims long reasoning by 19-42% with little accuracy loss

    STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

    Chenjun Xu +5

  27. cs.CL 2026-05-13 reviewed
    Acquisition rewards yield 2-7% gains in student model training

    AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions

    Ishika Agarwal +6

  28. cs.SE 2026-05-13 reviewed
    LLMs lag experts on system-level performance code

    PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

    Huihao Jing +7

  29. cs.CL 2026-05-13 reviewed
    Teacher confidence gates improve reasoning in small models

    GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning

    Kasidit Sermsri +1

  30. cs.CL 2026-05-13 reviewed
    Knowledge base lifts Text-to-SQL accuracy when data is scarce

    Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model

    Tianhao Qiu +1

  31. cs.CL 2026-05-13 reviewed
    Small 244M Whisper matches large models on Indic speech

    Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

    Kush Juvekar +4

  32. cs.CL 2026-05-13 reviewed
    This paper applies a generative meta-learning algorithm to spoken word classification…

    Does language matter for spoken word classification? A multilingual generative meta-learning approach

    Batsirayi Mupamhi Ziki +2

  33. cs.CL 2026-05-13 reviewed
    Multilingual edge in word classification is smaller than expected

    Does language matter for spoken word classification? A multilingual generative meta-learning approach

    Batsirayi Mupamhi Ziki +2

  34. cs.CL 2026-05-13 reviewed
    LLM JSON stays valid inside tight token budgets

    TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints

    Yoshio Kato +1

  35. cs.CL 2026-05-13 reviewed
    GeMCL classifies 1000 words from five shots each with stable accuracy

    Scaling few-shot spoken word classification with generative meta-continual learning

    Louise Beyers +2

  36. cs.CL 2026-05-13 reviewed
    GeMCL scales spoken word classification to 1000 classes with five shots each

    Scaling few-shot spoken word classification with generative meta-continual learning

    Louise Beyers +2

  37. cs.SE 2026-05-13 reviewed
    Deeper thought per algorithm beats more candidates under fixed tokens

    Effective Harness Engineering for Algorithm Discovery with Coding Agents

    Yoichi Ishibashi +2

  38. cs.CL 2026-05-13 reviewed
    GenAI flattens L2 writers' voices into uniform English

    The Cost of Perfect English: Pragmatic Flattening and the Erasure of Authorial Voice in L2 Writing Supported by GenAI

    Ao Liu +1

  39. cs.IR 2026-05-13 reviewed
    LLMs predict query-specific validity horizons for web content

    RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

    Tingyu Chen +6

  40. cs.CL 2026-05-13 reviewed
    Pruning candidate contexts with search tools improves LLM performance

    Context Training with Active Information Seeking

    Zeyu Huang +6

  41. cs.CL 2026-05-13 reviewed
    Pruning multiple search contexts lifts LLM adaptation gains

    Context Training with Active Information Seeking

    Zeyu Huang +6

  42. cs.LG 2026-05-13 reviewed
    LLMs miss when medical guidelines expire

    Large Language Models Lack Temporal Awareness of Medical Knowledge

    Zihan Guan +8

  43. cs.CL 2026-05-13 reviewed
    Jailbreak success in diffusion LMs drops to 0.64% via step-wise remasking

    Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models

    Yejin Lee +1

  44. cs.LG 2026-05-13 reviewed
    Bell-shaped sampling trains masked diffusion models 4x faster

    Understanding and Accelerating the Training of Masked Diffusion Language Models

    Chunsan Hong +7

  45. cs.CL 2026-05-13 reviewed
    Multimodal reasoning lifts MI coding accuracy to 52 percent

    Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

    Guangzeng Han +4

  46. cs.CL 2026-05-13 reviewed
    Multimodal voting lifts MI coding accuracy to 52.56%

    Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

    Guangzeng Han +4

  47. cs.CL 2026-05-13 reviewed
    Speech marks when insights transfer across similar problems

    Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

    Linas Nasvytis +1

  48. cs.CL 2026-05-13 reviewed
    Speech reveals when insights transfer across problems

    Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

    Linas Nasvytis +1

  49. cs.CL 2026-05-13 reviewed
    Repeated insight type speeds solving and boosts problem categorization in speech

    Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving

    Linas Nasvytis +1

  50. cs.LG 2026-05-13 reviewed
    LLM states project to F2 for 93% zero-shot ontology accuracy

    Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2

    Hisashi Miyashita +1