archive

Every paper Pith has read.

7661 papers in cs.CL · page 3

cs.CL 2026-05-21 reviewed

Any embedding model can rank first with the right prompt
One prompt is not enough: Instruction Sensitivity Undermines Embedding Model Evaluation

Yevhen Kostiuk +1
cs.CL 2026-05-21 reviewed

Scene profiles match human word interpretations 86 percent of the time
Scene Abstraction for Lexical Semantics: Structured Representations of Situated Meaning

Yejin Cho +1
cs.CV 2026-05-21 reviewed

Degraded images break spatial reasoning in current AI
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Xiaolong Zhou +10
cs.AI 2026-05-21 reviewed

Self-distillation drives search reasoners to 0.440 EM
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Zihan Liang +6
cs.HC 2026-05-21 reviewed

Adaptive agent improves integrated thinking for decisions
Reflecti-Mate: A Conversational Agent for Adaptive Decision-Making Support Through System 1 and System 2 Thinking

Morita Tarvirdians +4
cs.CL 2026-05-21 reviewed

Generative re-ranker lifts biomedical linking accuracy 3-24%
BeLink: Biomedical Entity Linking Meets Generative Re-Ranking

Darya Shlyk +2
cs.CL 2026-05-21 reviewed

Curated Bangla dataset corrects honorific errors in LLMs
Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation

Md. Asaduzzaman Shuvo +4
cs.LG 2026-05-21 reviewed

Blockwise resolvent attention runs entity tracking in O(n^(4/3) d) time
Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity

Hangyue Zhao +3
cs.CL 2026-05-21 reviewed

Entropy model separates cognitive from physical speech masking
In Silico Modeling of the RAMPHO Buffer: Dissociating Informational and Energetic Masking via Phonetic Entropy in Deep Neural Networks

Stefan Bleeck
cs.CL 2026-05-21 reviewed

Method finds selective features only partially causal for IOI task
From Correlation to Cause: A Five-Stage Methodology for Feature Analysis in Transformer Language Models

Caleb Munigety
cs.CL 2026-05-21 reviewed

Conflict posts draw 2-4 times more engagement than resolution posts
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

Aisha Ali Al-Athba +1
cs.CL 2026-05-21 reviewed

Mixed sources yield best counterspeech for hate plus misinformation
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation

Genoveffa Martone +2
cs.CL 2026-05-21 reviewed

Query-time RL turns noisy memory into accurate evidence
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA

Jianing Yin +1
cs.AI 2026-05-21 reviewed

Three models embed ingredients via recipe and chemistry graphs
Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings

Jakub Radzikowski +1
cs.CL 2026-05-21 reviewed

Entropy sum of top tokens selects best LLM reasoning data
Unified Data Selection for LLM Reasoning

Xiaoyuan Li +8
cs.CL 2026-05-21 reviewed

Multi-stage pipeline cuts false positives in Indic abusive comment detection
Multi-Stage Training for Abusive Comment Detection in Indic Languages

Pranshu Rastogi +3
cs.LG 2026-05-21 reviewed

Attack recovers 19% of safety classifier distress data
Boundary-targeted Membership Inference Attacks on Safety Classifiers

Anthony Hughes +5
cs.LG 2026-05-21 reviewed

Boundary attacks recover 19% of safety classifier training data
Boundary-targeted Membership Inference Attacks on Safety Classifiers

Anthony Hughes +5
cs.CL 2026-05-21 reviewed

Fine-tuning induces depression-like biases in LLMs
Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

Nicola Milano +1
cs.CL 2026-05-21 reviewed

LLMs learn to plan transit routes from records alone
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Hanyu Guo +5
cs.CL 2026-05-21 reviewed

Reversing root-and-pattern classifies Arabic broken plurals
Pattern-and-root inflectional morphology: the Arabic broken plural

Alexis Amid Neme +1
cs.CL 2026-05-21 reviewed

Chinese toxicity detectors miss 69 percent of implicit attacks
Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting

Jingyi Kang +6
cs.CL 2026-05-21 reviewed

Models fail to match idiom meanings to literal equivalents
IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

Kai Golan Hashiloni +5
cs.CL 2026-05-21 reviewed

Compact model approaches 11B results on aspect sentiment tasks
GHI: Graphormer over Conditioned Hypergraph Incidence for Aspect-Based Sentiment Analysis

Yu Du +5
cs.LG 2026-05-21 reviewed

Strict gate stabilizes self-play RL regardless of reward
Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL

Sophia Xiao Pu +6
cs.CL 2026-05-21 reviewed

Corpus of 252k Arabic posts maps engagement on women's issues
Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

Wajdi Zaghouani +3
cs.CL 2026-05-21 reviewed

Recursive chunking wins for Khmer farm document search
Evaluation of Chunking Strategies for Effective Text Embedding in Low-Resource Language on Agricultural Documents

Sovandara Chhoun +4
cs.CL 2026-05-21 reviewed

Nearest-neighbor overlap predicts embedding model scores
Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance

Amanda Myntti +3
cs.CL 2026-05-21 reviewed

Wikipedia-style rewrite flips quality filter decisions on 7% of docs
Is a Document Educational or Just Wikipedia-Style? -- Pitfalls of Classifier-Based Quality Filtering

Mateusz Klimaszewski +1
cs.LG 2026-05-21 reviewed

4B RL policy beats GPT-5 by picking expert models
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Jinyang Wu +9
cs.CL 2026-05-21 reviewed

Factual recall circuits from text only partly apply to speech in multimodal models
Do Factual Recall Mechanisms Carry over from Text to Speech in Multimodal Language Models?

Luca Modica +5
cs.AI 2026-05-21 reviewed

Hygiene rules enable LLM agents to self-improve skills effectively
Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Xing Zhang +6
cs.CL 2026-05-21 reviewed

Pipeline generates semester-long campus counseling dialogues
Psy-Chronicle: A Structured Pipeline for Synthesizing Long-Horizon Campus Psychological Counseling Dialogues

Chaogui Gou +1
cs.AI 2026-05-21 reviewed

30B agents rival 1T models with 25-95% fewer tokens
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

Mingkai Deng +6
cs.CL 2026-05-21 reviewed

Multilingual self-checks lift English cultural accuracy
Cross-Lingual Consensus: Aligning Multilingual Cultural Knowledge via Multilingual Self-Consistency

Andrew Ivan Soegeng +2
cs.CL 2026-05-21 reviewed

BGE-M3 leads Khmer retrieval while generators split by metric
A Comparative Study of Language Models for Khmer Retrieval-Augmented Question Answering

Sereiwathna Ros +5
cs.CL 2026-05-21 reviewed

New Arabic corpus tracks decade of Facebook racism posts
ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

Wajdi Zaghouani +3
cs.CL 2026-05-21 reviewed

LLMs reach 66% match on BIM-to-IDS but only 28% pass content audits
Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

Ryo Kanazawa +11
cs.LG 2026-05-21 reviewed

Subproblem curriculum RL improves LLM math reasoning by 4.1 points
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Xitai Jiang +5
cs.CL 2026-05-21 reviewed

Anchoring attention improves multimodal reasoning with less data
Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention

Changyuan Tian +9
cs.CL 2026-05-21 reviewed

Hy-MT2 models beat Microsoft and Doubao translation APIs
Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

Mao Zheng +52
cs.CL 2026-05-21 reviewed

Data flywheel lifts LLM router accuracy from 73% to 90%
FlyRoute: Self-Evolving Agent Profiling via Data Flywheel for Adaptive Task Routing

Rongjun Li +2
cs.CV 2026-05-21 reviewed

Hypernetwork builds on-the-fly LoRA adapters for continual VQA
HyLoVQA: Dynamic Hypernetwork-Generated Low-Rank Adaptation for Continual Visual Question Answering

Yiran Wang +5
cs.CL 2026-05-21 reviewed

Latent reasoning beats text CoT for audio-visual tasks
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Yifan Dai +20
cs.CL 2026-05-21 reviewed

Larger LLMs hallucinate despite knowing the answer
Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

Jewon Yeom +5
cs.LG 2026-05-21 reviewed

Five lines of code expose an LLM's hidden vocabulary secrets
Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

Hisashi Miyashita
cs.CL 2026-05-21 reviewed

RoBERTa reaches 93 percent accuracy on IMDb sentiment task
From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification

Dip Biswas Shanto +3
cs.CR 2026-05-21 reviewed

Camouflaged attacks slash LLM guard detection from 94% to 10%
Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

Aaditya Pai
cs.AI 2026-05-21 reviewed

User refinements raise code agent acceptance from 25.7% to 35.7%
Echo: Learning from Experience Data via User-Driven Refinement

Hande Dong +17
cs.CL 2026-05-21 reviewed

SpecHop speculation trims multi-hop latency up to 40%
SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents

Mehrdad Saberi +2