pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 16

  1. cs.AI 2026-05-13 reviewed
    GraphRAG retrieval aligns LLM agents with social values

    From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents

    Jinxian Qu +3

  2. cs.LG 2026-05-13 reviewed
    Spherical KV stores keys as radius and angle codes to cut cache traffic

    SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference

    Anay Chauhan +6

  3. cs.CL 2026-05-13 reviewed
    Attack collapses speculative decoding speedup by cutting token acceptance

    Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

    Shuoyang Sun +8

  4. cs.CL 2026-05-13 reviewed
    Stealth attack collapses speculative decoding speedup

    Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding

    Shuoyang Sun +8

  5. cs.LG 2026-05-13 reviewed
    HodgeCover compresses MoE experts by covering harmonic cycles

    HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

    Tao Zhong +2

    2 Piths
  6. cs.CL 2026-05-13 reviewed
    42M Spanish cyber model reaches 0.78 conversation score

    VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

    Juan S. Santillana

  7. cs.CL 2026-05-13 reviewed
    Rebalanced training gives 42M Spanish cyber model tool-use ability

    VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

    Juan S. Santillana

  8. cs.CL 2026-05-13 reviewed
    Rebalanced tool-use data lifts 42M Spanish cyber model to 0.23 accuracy

    VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

    Juan S. Santillana

  9. cs.CL 2026-05-13 reviewed
    Six hours of data let a two-stage model beat larger ones on Wardaman

    WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

    Ziheng Zhang +3

  10. cs.SD 2026-05-13 reviewed
    No voice agent tops 0.5 on both accuracy and experience

    EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

    Tara Bogavelli +12

    2 Piths
  11. cs.CL 2026-05-13 reviewed
    Agent weight updates cut token use 83% while raising accuracy

    Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

    Wenrui Bao +5

  12. cs.CL 2026-05-13 reviewed
    Finetuning makes models believe claims labeled false

    Negation Neglect: When models fail to learn negations in training

    Harry Mayne +5

    2 Piths
  13. cs.CL 2026-05-13 reviewed
    LLM pipeline turns text into argument graphs

    An LLM-Based System for Argument Mining

    Paulo Pirozelli +3

  14. cs.CL 2026-05-13 reviewed
    LLM pipeline builds argument graphs from plain text

    An LLM-Based System for Argument Mining

    Paulo Pirozelli +3

  15. cs.CL 2026-05-13 reviewed
    Hidden-state transport geometry locates first LLM reasoning error

    Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry

    Tyler Alvarez +1

  16. cs.CL 2026-05-13 reviewed
    MoE beats dense on active params but loses on total capacity

    Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching

    Abdalrahman Wael

  17. cs.LG 2026-05-13 reviewed
    Trajectory balance stops diffusion models locking onto few paths

    Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

    Saba Ahmadi +2

  18. cs.AI 2026-05-13 reviewed
    Models detect sensory-text mismatches inside but ignore them in answers

    Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

    Trung Nguyen Quang +5

  19. cs.CL 2026-05-13 reviewed
    Fine-tuned 8B LLMs beat larger models on children's story difficulty

    Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety

    Qian Shen +7

  20. cs.CL 2026-05-13 reviewed
    RTLC prompting boosts LLM judge accuracy by 14 points

    RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning

    Andrea Morandi

  21. cs.CV 2026-05-13 reviewed
    Stage-wise DPO reduces hallucinations in vision-language models

    Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift

    Qinwu Xu

  22. cs.CL 2026-05-13 reviewed
    Fine-tuning plus hierarchical prompts strengthen propaganda detection

    Fine-tuning with Hierarchical Prompting for Robust Propaganda Classification Across Annotation Schemas

    Lukas Stähelin +8

  23. cs.LG 2026-05-13 reviewed
    Low-rank training reaches distinct loss basins from full rank

    Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

    Namrata Shivagunde +3

  24. cs.LG 2026-05-13 reviewed
    Low-rank pre-training lands in different loss basins than full-rank

    Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

    Namrata Shivagunde +3

  25. cs.CL 2026-05-13 reviewed
    Compiler produces reusable configs for LLM workflows at 6.4x speedup

    FlowCompile: An Optimizing Compiler for Structured LLM Workflows

    Junyan Li +4

  26. cs.CL 2026-05-13 reviewed
    Truncating supervision at feedback collapse beats full OPD

    Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

    Kaiyuan Liu +5

  27. cs.LG 2026-05-13 reviewed
    RDPO normalizes and whitens rewards to stabilize RL advantages

    Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

    Yang Bai +7

  28. cs.CL 2026-05-13 reviewed
    Edit-level vote reduces over-correction in LLM grammar fixes

    Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction

    Takumi Goto +2

  29. cs.CL 2026-05-13 reviewed
    LLM judges favor machine translations over creative literary ones

    Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations

    Kyo Gerrits +2

  30. cs.CL 2026-05-13 reviewed
    Artificial uncertainty on easy data improves real uncertainty probes

    Inducing Artificial Uncertainty in Language Models

    Sophia Hager +2

  31. cs.CV 2026-05-13 reviewed
    OCR training method improves text reading in blurry and cluttered images

    Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models

    Qinwu Xu +3

  32. cs.AI 2026-05-13 reviewed
    LLMs show recall-safety tradeoffs on real ICU data

    RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

    Chengzhi Shen +9

  33. cs.CL 2026-05-13 reviewed
    Locale prompts eliminate SLM copying in on-device PII replacement

    Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models

    Anuj Sadani +1

  34. cs.LG 2026-05-13 reviewed
    Temperature adjustment turns reward models into a calibrated SLOP

    Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

    Ye Wang +2

  35. cs.AI 2026-05-13 reviewed
    Students rate AI slides equal to instructor ones

    AI-Generated Slides: Are They Good? Can Students Tell?

    Juho Leinonen +2

  36. cs.CL 2026-05-13 reviewed
    Ordered demos turn many-shot CoT into test-time learning

    Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

    Tsz Ting Chung +3

  37. cs.CL 2026-05-13 reviewed
    Shared covariance summation leads multilingual editing results

    Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

    Kunil Lee +3

  38. cs.CL 2026-05-13 reviewed
    Reflective experiences guide LLM agents to better memory searches

    R²-Mem: Reflective Experience for Memory Search

    Xinyuan Wang +4

  39. cs.LG 2026-05-13 reviewed
    Fragmentation strictly raises finite-context log-loss

    Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

    Amirmehdi Jafari Fesharaki +2

  40. cs.CL 2026-05-13 reviewed
    Planning mechanism lifts LLM graph retrieval by 18 percent

    PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents

    Mikhail Menschikov +10

  41. cs.LG 2026-05-13 reviewed
    OSDN preconditioner cuts recall residual 39% at 1.3B scale

    OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention

    Chenyu Zhou +5

  42. cs.CL 2026-05-13 reviewed
    Decomposed rewards boost vision-language reasoning

    PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

    Hee Suk Yoon +8

  43. cs.CL 2026-05-13 reviewed
    Memory of prior links improves biomedical entity consistency

    LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking

    Adam Remaki +2

  44. cs.AI 2026-05-13 reviewed
    DRAT predicts LLMs' scientific ideation better than prior tests

    Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers

    Samuel Schapiro +3

  45. cs.AI 2026-05-13 reviewed
    Cognitive folding turns event streams into proactive agent memory

    CogniFold: Always-On Proactive Memory via Cognitive Folding

    Suli Wang +5

  46. cs.CL 2026-05-13 reviewed
    BPE dropout during pretraining improves low-resource NLP results

    Pretraining Language Models with Subword Regularization: An Empirical Study of BPE Dropout in Low-Resource NLP

    Ruan Visser +2

  47. cs.CL 2026-05-13 reviewed
    Token alignments from monolingual data speed LLM vocabulary adaptation

    TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

    Chong Li +4

  48. cs.LG 2026-05-13 reviewed
    Two-stage tuning fixes LLM table errors with 1,000 examples

    LIFT: Last-Mile Fine-Tuning for Table Explicitation

    Divij Khaitan +1

  49. cs.LG 2026-05-13 reviewed
    Multi-stage ranking improves checkpoint selection for multimodal LLMs

    Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

    Qinwu Xu +2

  50. cs.CL 2026-05-13 reviewed
    Language-specific thresholds lift slur detection F1 by 2-5%

    KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model

    Barathi Ganesh HB +3