pith.

archive

Every paper Pith has read.

7661 papers in cs.CL · page 3

  1. cs.CL 2026-05-21 reviewed
    Any embedding model can rank first with the right prompt

    One prompt is not enough: Instruction Sensitivity Undermines Embedding Model Evaluation

    Yevhen Kostiuk +1

  2. cs.CL 2026-05-21 reviewed
    Scene profiles match human word interpretations 86 percent of the time

    Scene Abstraction for Lexical Semantics: Structured Representations of Situated Meaning

    Yejin Cho +1

  3. cs.CV 2026-05-21 reviewed
    Degraded images break spatial reasoning in current AI

    SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

    Xiaolong Zhou +10

  4. cs.AI 2026-05-21 reviewed
    Self-distillation drives search reasoners to 0.440 EM

    Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

    Zihan Liang +6

  5. cs.HC 2026-05-21 reviewed
    Adaptive agent improves integrated thinking for decisions

    Reflecti-Mate: A Conversational Agent for Adaptive Decision-Making Support Through System 1 and System 2 Thinking

    Morita Tarvirdians +4

  6. cs.CL 2026-05-21 reviewed
    Generative re-ranker lifts biomedical linking accuracy 3-24%

    BeLink: Biomedical Entity Linking Meets Generative Re-Ranking

    Darya Shlyk +2

  7. cs.CL 2026-05-21 reviewed
    Curated Bangla dataset corrects honorific errors in LLMs

    Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation

    Md. Asaduzzaman Shuvo +4

  8. cs.LG 2026-05-21 reviewed
    Blockwise resolvent attention runs entity tracking in O(n^(4/3) d) time

    Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity

    Hangyue Zhao +3

  9. cs.CL 2026-05-21 reviewed
    Entropy model separates cognitive from physical speech masking

    In Silico Modeling of the RAMPHO Buffer: Dissociating Informational and Energetic Masking via Phonetic Entropy in Deep Neural Networks

    Stefan Bleeck

  10. cs.CL 2026-05-21 reviewed
    Method finds selective features only partially causal for IOI task

    From Correlation to Cause: A Five-Stage Methodology for Feature Analysis in Transformer Language Models

    Caleb Munigety

  11. cs.CL 2026-05-21 reviewed
    Conflict posts draw 2-4 times more engagement than resolution posts

    Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

    Aisha Ali Al-Athba +1

  12. cs.CL 2026-05-21 reviewed
    Mixed sources yield best counterspeech for hate plus misinformation

    Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation

    Genoveffa Martone +2

  13. cs.CL 2026-05-21 reviewed
    Query-time RL turns noisy memory into accurate evidence

    DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA

    Jianing Yin +1

  14. cs.AI 2026-05-21 reviewed
    Three models embed ingredients via recipe and chemistry graphs

    Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings

    Jakub Radzikowski +1

  15. cs.CL 2026-05-21 reviewed
    Entropy sum of top tokens selects best LLM reasoning data

    Unified Data Selection for LLM Reasoning

    Xiaoyuan Li +8

  16. cs.CL 2026-05-21 reviewed
    Multi-stage pipeline cuts false positives in Indic abusive comment detection

    Multi-Stage Training for Abusive Comment Detection in Indic Languages

    Pranshu Rastogi +3

  17. cs.LG 2026-05-21 reviewed
    Attack recovers 19% of safety classifier distress data

    Boundary-targeted Membership Inference Attacks on Safety Classifiers

    Anthony Hughes +5

  18. cs.LG 2026-05-21 reviewed
    Boundary attacks recover 19% of safety classifier training data

    Boundary-targeted Membership Inference Attacks on Safety Classifiers

    Anthony Hughes +5

  19. cs.CL 2026-05-21 reviewed
    Fine-tuning induces depression-like biases in LLMs

    Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

    Nicola Milano +1

  20. cs.CL 2026-05-21 reviewed
    LLMs learn to plan transit routes from records alone

    TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

    Hanyu Guo +5

  21. cs.CL 2026-05-21 reviewed
    Reversing root-and-pattern classifies Arabic broken plurals

    Pattern-and-root inflectional morphology: the Arabic broken plural

    Alexis Amid Neme +1

  22. cs.CL 2026-05-21 reviewed
    Chinese toxicity detectors miss 69 percent of implicit attacks

    Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting

    Jingyi Kang +6

  23. cs.CL 2026-05-21 reviewed
    Models fail to match idiom meanings to literal equivalents

    IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

    Kai Golan Hashiloni +5

  24. cs.CL 2026-05-21 reviewed
    Compact model approaches 11B results on aspect sentiment tasks

    GHI: Graphormer over Conditioned Hypergraph Incidence for Aspect-Based Sentiment Analysis

    Yu Du +5

  25. cs.LG 2026-05-21 reviewed
    Strict gate stabilizes self-play RL regardless of reward

    Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL

    Sophia Xiao Pu +6

  26. cs.CL 2026-05-21 reviewed
    Corpus of 252k Arabic posts maps engagement on women's issues

    Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

    Wajdi Zaghouani +3

  27. cs.CL 2026-05-21 reviewed
    Recursive chunking wins for Khmer farm document search

    Evaluation of Chunking Strategies for Effective Text Embedding in Low-Resource Language on Agricultural Documents

    Sovandara Chhoun +4

  28. cs.CL 2026-05-21 reviewed
    Nearest-neighbor overlap predicts embedding model scores

    Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance

    Amanda Myntti +3

  29. cs.CL 2026-05-21 reviewed
    Wikipedia-style rewrite flips quality filter decisions on 7% of docs

    Is a Document Educational or Just Wikipedia-Style? -- Pitfalls of Classifier-Based Quality Filtering

    Mateusz Klimaszewski +1

  30. cs.LG 2026-05-21 reviewed
    4B RL policy beats GPT-5 by picking expert models

    Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

    Jinyang Wu +9

  31. cs.CL 2026-05-21 reviewed
    Factual recall circuits from text only partly apply to speech in multimodal models

    Do Factual Recall Mechanisms Carry over from Text to Speech in Multimodal Language Models?

    Luca Modica +5

  32. cs.AI 2026-05-21 reviewed
    Hygiene rules enable LLM agents to self-improve skills effectively

    Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

    Xing Zhang +6

  33. cs.CL 2026-05-21 reviewed
    Pipeline generates semester-long campus counseling dialogues

    Psy-Chronicle: A Structured Pipeline for Synthesizing Long-Horizon Campus Psychological Counseling Dialogues

    Chaogui Gou +1

  34. cs.AI 2026-05-21 reviewed
    30B agents rival 1T models with 25-95% fewer tokens

    Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

    Mingkai Deng +6

  35. cs.CL 2026-05-21 reviewed
    Multilingual self-checks lift English cultural accuracy

    Cross-Lingual Consensus: Aligning Multilingual Cultural Knowledge via Multilingual Self-Consistency

    Andrew Ivan Soegeng +2

  36. cs.CL 2026-05-21 reviewed
    BGE-M3 leads Khmer retrieval while generators split by metric

    A Comparative Study of Language Models for Khmer Retrieval-Augmented Question Answering

    Sereiwathna Ros +5

  37. cs.CL 2026-05-21 reviewed
    New Arabic corpus tracks decade of Facebook racism posts

    ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

    Wajdi Zaghouani +3

  38. cs.CL 2026-05-21 reviewed
    LLMs reach 66% match on BIM-to-IDS but only 28% pass content audits

    Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

    Ryo Kanazawa +11

  39. cs.LG 2026-05-21 reviewed
    Subproblem curriculum RL improves LLM math reasoning by 4.1 points

    From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

    Xitai Jiang +5

  40. cs.CL 2026-05-21 reviewed
    Anchoring attention improves multimodal reasoning with less data

    Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention

    Changyuan Tian +9

  41. cs.CL 2026-05-21 reviewed
    Hy-MT2 models beat Microsoft and Doubao translation APIs

    Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

    Mao Zheng +52

  42. cs.CL 2026-05-21 reviewed
    Data flywheel lifts LLM router accuracy from 73% to 90%

    FlyRoute: Self-Evolving Agent Profiling via Data Flywheel for Adaptive Task Routing

    Rongjun Li +2

  43. cs.CV 2026-05-21 reviewed
    Hypernetwork builds on-the-fly LoRA adapters for continual VQA

    HyLoVQA: Dynamic Hypernetwork-Generated Low-Rank Adaptation for Continual Visual Question Answering

    Yiran Wang +5

  44. cs.CL 2026-05-21 reviewed
    Latent reasoning beats text CoT for audio-visual tasks

    LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

    Yifan Dai +20

  45. cs.CL 2026-05-21 reviewed
    Larger LLMs hallucinate despite knowing the answer

    Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

    Jewon Yeom +5

  46. cs.LG 2026-05-21 reviewed
    Five lines of code expose an LLM's hidden vocabulary secrets

    Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

    Hisashi Miyashita

  47. cs.CL 2026-05-21 reviewed
    RoBERTa reaches 93 percent accuracy on IMDb sentiment task

    From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification

    Dip Biswas Shanto +3

  48. cs.CR 2026-05-21 reviewed
    Camouflaged attacks slash LLM guard detection from 94% to 10%

    Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

    Aaditya Pai

  49. cs.AI 2026-05-21 reviewed
    User refinements raise code agent acceptance from 25.7% to 35.7%

    Echo: Learning from Experience Data via User-Driven Refinement

    Hande Dong +17

  50. cs.CL 2026-05-21 reviewed
    SpecHop speculation trims multi-hop latency up to 40%

    SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents

    Mehrdad Saberi +2