pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 14

  1. cs.CL 2026-05-14 reviewed
    Ukrainian court citations form unsupervised legal ontology

    Automatic Construction of a Legal Citation Graph from 100 Million Ukrainian Court Decisions: Large-Scale Extraction, Topological Analysis, and Ontology-Driven Clustering

    Volodymyr Ovcharov

  2. cs.LG 2026-05-14 reviewed
    Agent turns I/O examples into code via guided evolutionary search

    From I/O to Code with Discovery Agent

    Yihong Dong +9

  3. cs.AI 2026-05-14 reviewed
    LaMR prunes code context to save 31% tokens while matching full performance

    Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

    Jingjing Wang +8

  4. cs.CL 2026-05-14 reviewed
    New tool opens discourse data across 16 languages for local use

    DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations

    Amir Zeldes

  5. cs.RO 2026-05-14 reviewed
    Human video builds physical smarts for top robot policies

    PhysBrain 1.0 Technical Report

    Shijie Lian +12

  6. cs.CL 2026-05-14 reviewed
    Natural literary translations often drift from the original meaning

    Fluency and Faithfulness in Human and Machine Literary Translation

    Sarah Griebel +1

  7. cs.CV 2026-05-14 reviewed
    One token unifies agentic and latent visual reasoning

    ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

    Ziyu Guo +3

  8. cs.LG 2026-05-14 reviewed
    FutureSim shows top AI agents predict events at 25% accuracy

    FutureSim: Replaying World Events to Evaluate Adaptive Agents

    Shashwat Goel +7

  9. cs.CL 2026-05-14 reviewed
    Grep beats vector search in most agentic tasks

    Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

    Sahil Sen +4

  10. cs.CR 2026-05-14 reviewed
    Length alone triggers LLM backdoors to leak secrets

    MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

    Rui Wen +4

  11. cs.CL 2026-05-14 reviewed
    EHR tables sharpen timing in text-based clinical timelines

    Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

    Sayantan Kumar +3

  12. cs.CL 2026-05-14 reviewed
    Memory model lets LLMs add new knowledge without retraining

    MeMo: Memory as a Model

    Ryan Wei Heng Quek +8

  13. cs.CL 2026-05-14 reviewed
    Memory model lets LLMs add knowledge without retraining

    MeMo: Memory as a Model

    Ryan Wei Heng Quek +8

  14. cs.CR 2026-05-14 reviewed
    The paper builds a 507-leaf taxonomy of LLM inference attacks from 932 recent security…

    Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

    Karthik Raghu Iyer +3

  15. cs.CL 2026-05-14 reviewed
    Framework converts text tool benchmarks to audio for voice agents

    From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

    Md Tahmid Rahman Laskar +5

  16. cs.CL 2026-05-14 reviewed
    The paper presents a framework that converts existing text-based tool-calling benchmarks…

    From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

    Md Tahmid Rahman Laskar +5

  17. cs.AI 2026-05-14 reviewed
    Open framework lifts coding agent to 67.5% on SWE-bench

    Orchard: An Open-Source Agentic Modeling Framework

    Baolin Peng +13

  18. cs.LG 2026-05-14 reviewed
    128 random demos suffice for strong RLVR results

    Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

    Kai Yan +2

  19. cs.CL 2026-05-14 reviewed
    Window-level RL raises speculative decoding acceptance to 6.5

    Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

    Jie Jiang +4

  20. cs.CL 2026-05-14 reviewed
    Token counts for Ukrainian legal text differ by a factor of 1.6 across models

    Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study

    Volodymyr Ovcharov

  21. cs.AI 2026-05-14 reviewed
    Decomposing traces boosts AI agent diagnosis accuracy up to 12x

    Holistic Evaluation and Failure Diagnosis of AI Agents

    Netta Madvil +14

  22. cs.CV 2026-05-14 reviewed
    CIR benchmarks let models solve most queries with one modality

    Do Composed Image Retrieval Benchmarks Require Multimodal Composition?

    Matteo Attimonelli +10

  23. cs.AI 2026-05-14 reviewed
    Graph paths verify legal reasoning in Indian court AI

    Falkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AI

    Joy Bose

  24. cs.CV 2026-05-14 reviewed
    Internal masking cuts hallucinations in vision-language models

    Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

    Tian Qin +5

  25. cs.CL 2026-05-14 reviewed
    Terminal anchors extend LLM context to 64K from short sequences

    EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

    Han Tian +12

  26. cs.CL 2026-05-14 reviewed
    Denoising paths supply low-cost uncertainty scores for language diffusion models

    Uncertainty Quantification for Large Language Diffusion Models

    Artem Vazhentsev +5

  27. cs.SE 2026-05-14 reviewed
    ML classifier beats rules at spotting BDD refactoring chances

    Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines

    Ali Hassaan Mughal +2

  28. cs.SE 2026-05-14 reviewed
    Memory agent keeps repo documentation consistent

    Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

    Suyoung Bae +4

  29. cs.LG 2026-05-14 reviewed
    Action tokens carry the training signal in agentic RL

    Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

    Langzhou He +9

  30. cs.CL 2026-05-14 reviewed
    CIPO turns LLM failures into better reasoning

    Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

    Mengjie Ren +8

  31. cs.CL 2026-05-14 reviewed
    Optimal control view yields language models with both fidelity and parallel speed

    Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

    ZiYi Dong +5

  32. cs.CL 2026-05-14 reviewed
    Optimal control reformulation gives language models fast parallel sampling at high quality

    Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

    ZiYi Dong +5

  33. cs.CL 2026-05-14 reviewed
    Many perfect LLM scores hide dimensional intent failures

    Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

    Gang Peng

  34. cs.CL 2026-05-14 reviewed
    LLM memory systems hit only 46% on group conversations

    GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

    Jingbo Yang +5

  35. cs.CL 2026-05-14 reviewed
    Group chats expose limits of LLM agent memory

    GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

    Jingbo Yang +5

  36. cs.CL 2026-05-14 reviewed
    Ming glossaries used flexible Chinese characters to approximate foreign sounds

    Cross-Linguistic Transcription and Phonological Representation in the Huìtóngguǎnxì Huáyíyìyǔ

    Ji-eun Kim

  37. cs.SE 2026-05-14 reviewed
    Stale code snippets make models output outdated helpers

    When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

    Haojun Weng +4

  38. cs.CL 2026-05-14 reviewed
    RAG follows conflicting context over its own knowledge

    Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

    Yihang Chen +6

  39. cs.CL 2026-05-14 reviewed
    Probe shows RAG follows wrong context in 85 percent of conflict cases

    Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

    Yihang Chen +6

  40. cs.LG 2026-05-14 reviewed
    Guardrails adapt from sparse noisy failures via conservative induction

    LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

    Minbeom Kim +8

  41. cs.LG 2026-05-14 reviewed
    Orthogonal projection isolates hallucination signals in LLM answers

    When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition

    Siyang Yao +2

  42. cs.CV 2026-05-14 reviewed
    Adaptive gate skips reasoning for simple multimodal inputs

    Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

    Longxiang Zhang +4

  43. cs.CL 2026-05-14 reviewed
    Calculus finds optimal vocabulary size for ASR

    A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR

    Sunil Kumar Kopparapu

  44. cs.SE 2026-05-14 reviewed
    Agents resolve 45 percent of chained package upgrades

    SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

    Man Ho Lam +7

  45. cs.CL 2026-05-14 reviewed
    New scores track whether unlearning works across languages

    Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

    Kyomin Hwang +3

  46. cs.CL 2026-05-14 reviewed
    Three-tier memory lifts recommender hit rate by 26 percent

    Agentic Recommender System with Hierarchical Belief-State Memory

    Xiang Shen +10

  47. cs.CL 2026-05-14 reviewed
    Three-tier memory raises recommender hit rate 26 percent

    Agentic Recommender System with Hierarchical Belief-State Memory

    Xiang Shen +10

  48. cs.LG 2026-05-14 reviewed
    Synthetic queries expose five times more LLM failures

    NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    Qazi Mamunur Rashid +7

  49. cs.LG 2026-05-14 reviewed
    Synthetic queries trigger up to 5x higher LLM failure rates

    NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    Qazi Mamunur Rashid +7

  50. cs.CL 2026-05-14 reviewed
    Synthetic augmentation lifts defense classification to 58% accuracy

    Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

    Hoang-Thuy-Duong Vu +2