pith.

archive

Every paper Pith has read.

7661 papers in cs.CL · page 7

  1. cs.CL 2026-05-19 reviewed
    LLM use adds complex words and syntax to NLP papers

    What Are LLMs Doing to Scientific Communication? Measuring Changes in Writing Practices and Reading Experience

    Filip Miletić +1

  2. cs.AI 2026-05-19 reviewed
    Context map cache raises LLM agent accuracy 6-34% on recurring tasks

    PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

    Zhuohan Gu +3

  3. cs.CL 2026-05-19 reviewed
    Scorer choice sets the layer where authorship signals consolidate

    Where Does Authorship Signal Emerge in Encoder-Based Language Models?

    Francis Kulumba +3

  4. cs.CL 2026-05-19 reviewed
    Model learns when to skip tools for better multimodal answers

    Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

    Qinghe Ma +5

  5. cs.CL 2026-05-19 reviewed
    Influence functions fix model errors via key sample and concept tweaks

    CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

    Yike Sun +7

  6. cs.CV 2026-05-19 reviewed
    Dense benchmark exposes open VLMs' gaps on subtle human actions

    FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure +4

  7. cs.CV 2026-05-19 reviewed
    Open VLMs struggle with fine details in human video actions

    FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure +4

  8. cs.CV 2026-05-19 reviewed
    Dual-stream network lifts weather detection at full speed

    CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving

    Sherif Khairy +1

  9. cs.SD 2026-05-19 reviewed
    Scaled simulations cut speech recognition errors over 30 percent

    Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

    Zhifei Xie +6

  10. cs.AI 2026-05-19 reviewed
    Temporal conditioning changes AV planner style but not scores

    From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning

    Ahmed Y. Gado +4

  11. cs.CL 2026-05-19 reviewed
    Rubric shows LLMs generate mostly high-quality legal propositions

    LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation

    Shanshan Xu +4

  12. cs.CL 2026-05-19 reviewed
    Section-based chunking tops recall in German legal retrieval

    Chunking German Legal Code

    Max Prior +2

  13. cs.CL 2026-05-19 reviewed
    LLMs generate coherent multimodal behaviors for ability and benevolence

    Towards Trust Calibration in Socially Interactive Agents: Investigating Gendered Multimodal Behaviors Generation with LLMs

    Lucie Galland +2

  14. cs.CL 2026-05-19 reviewed
    Long-term medical dialogue benchmark reveals LLM limitations

    Synthesis and Evaluation of Long-term History-aware Medical Dialogue

    Hebin Hu +3

  15. cs.AI 2026-05-19 reviewed
    Pure code boosts programming but hurts complex math reasoning

    What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

    Yuze Zhao +8

  16. cs.CL 2026-05-19 reviewed
    Node topology turned into text improves graph anomaly detection

    TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection

    Wen Shi +8

  17. cs.CL 2026-05-19 reviewed
    Fuzzy concept graph cuts RAG indexing to 30 LLM calls

    ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation

    Roman Prosvirnin +2

  18. cs.CL 2026-05-19 reviewed
    Review of 120 studies maps LLM math reasoning gaps

    Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

    Husnain Amjad +3

  19. cs.CL 2026-05-19 reviewed
    Parser trained on CHILDES beats general tools on child speech

    CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

    Francesca Padovani +6

  20. cs.CL 2026-05-19 reviewed
    84K Arabic samples built for Saudi financial sentiment analysis

    LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets

    Mona H. Albaqawi +3

  21. cs.CL 2026-05-19 reviewed
    LLMs fix West Frisian ASR errors on unseen texts

    Can Large Language Models Reliably Correct Errors in Low-Resource ASR? A Contamination-Aware Case Study on West Frisian

    Yun Hao +4

    3 Piths
  22. cs.LG 2026-05-19 reviewed
    OScaR reaches near-lossless INT2 KV cache quantization

    OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

    Zunhai Su +13

  23. cs.CL 2026-05-19 reviewed
    2-bit LLMs retain most accuracy on reasoning tasks

    K-Quantization and its Impact on Output Performance

    Robin Baki Davidsson +1

  24. cs.CL 2026-05-19 reviewed
    One LLM system optimizes text to beat specialists on six tasks

    optimize_anything: A Universal API for Optimizing any Text Parameter

    Lakshya A Agrawal +13

  25. cs.CL 2026-05-19 reviewed
    New Chinese benchmark caps LLM logical accuracy at 37.5 percent

    LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

    Ming Zhang +15

  26. cs.CL 2026-05-19 reviewed
    Open dataset and reweighting match big models in long-context RL

    GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

    Minxuan Lv +11

  27. cs.AI 2026-05-19 reviewed
    Governance recipe lifts LLM skill-library performance from 0.26 to 0.58

    Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

    Xing Zhang +6

  28. cs.CL 2026-05-19 reviewed
    No multi-word expression is absolutely idiomatic

    A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics

    Elena Mikhalkova +5

  29. cs.CL 2026-05-19 reviewed
    One model serves many embedding sizes in retrieval

    m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder

    Yaoxiang Wang +6

  30. cs.CL 2026-05-19 reviewed
    Merging LLMs into VLMs boosts instructions but not math

    Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

    Zhiyu Xu +7

  31. cs.CL 2026-05-19 reviewed
    Base models fool AI detectors into rating text as human

    Base Models Look Human To AI Detectors

    Yixuan Even Xu +4

  32. cs.AI 2026-05-19 reviewed
    Context management determines real-world Transformer Turing-completeness

    Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

    Guanyu Cui +2

  33. cs.CL 2026-05-19 reviewed
    TokenDrift cuts Gen-PPL by 89% at 4 steps in DDLMs

    Drifting Objectives for Refining Discrete Diffusion Language Models

    Daisuke Oba +2

  34. cs.LG 2026-05-19 reviewed
    CEPO boosts math reasoning to 43.43% at 2B and 60.56% at 4B

    CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

    Ahmed Heakl +6

  35. cs.CL 2026-05-19 reviewed
    Backtracking fixes dual biases in LLM reasoning distillation

    Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

    Bing Wang +9

  36. cs.CL 2026-05-19 reviewed
    Pairwise confidence weights sharpen LLM policy optimization

    LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

    Redacted by arXiv +6

  37. cs.CL 2026-05-19 reviewed
    Pairwise sums replace group means in LLM policy optimization

    LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

    Redacted by arXiv +6

  38. cs.CL 2026-05-19 reviewed
    Reassembling entity pairs boosts synthetic QA accuracy by 88.9%

    EmbGen: Teaching with Reassembled Corpora

    Arun K Lenin +3

  39. cs.CL 2026-05-19 reviewed
    Entropy shaping makes LLMs concise on easy math and deeper on hard ones

    Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

    Shuyu Wei +8

  40. cs.CL 2026-05-19 reviewed
    Framework creates custom science benchmarks for LLMs from existing data

    SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models

    Yiyang Gu +17

  41. cs.MA 2026-05-19 reviewed
    Architecture lets AI agents break rules legitimately when justified

    PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies

    Ahmad Yehia +6

  42. cs.CL 2026-05-19 reviewed
    Supreme Court quashes 18 points more matrimonial petitions than Karnataka HC

    IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis

    Joy Bose

  43. cs.CL 2026-05-19 reviewed
    Retrieval rewriting lifts LLM calibration up to 58%

    Retrieval-Augmented Linguistic Calibration

    Yi-Fan Yeh +6

  44. cs.CL 2026-05-19 reviewed
    Benchmark labels hallucinations via explicit reference worlds

    HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

    Emmy Liu +6

    5 Piths
  45. cs.MA 2026-05-19 reviewed
    STAR-PólyaMath hits perfect scores on Putnam and IMO

    STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

    Jiaao Wu +5

  46. cs.GT 2026-05-19 reviewed
    LLMs close 99% of deals but earn low profits in hidden pricing

    PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

    Yingjie Lei

  47. cs.CL 2026-05-19 reviewed
    Multi-agent evaluators lock reading items to target difficulty

    A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation

    Seonjeong Hwang +3

  48. cs.CL 2026-05-19 reviewed
    Small targeted probes break document parsers as much as large ones

    How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

    Yue Chen +4

  49. cs.CL 2026-05-19 reviewed
    Metric selects only necessary rationales for LLM misinformation checks

    Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

    Bing Wang +8

  50. cs.CL 2026-05-19 reviewed
    LLMs learn redundant copies of concepts across languages

    Language models struggle with compartmentalization

    Thomas Vincent Howe +1