archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 12
-
Hard examples stay unlearnable in RLVR despite correct rollouts
The Unlearnability Phenomenon in RLVR for Language Models
-
LLMs forget targeted knowledge via neutral remaps and closed-form edits
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
-
Closed-form update unlearns sensitive LLM knowledge in few shots
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
-
Lightweight LLMs match DNNs on court view generation
Exploring Lightweight Large Language Models for Court View Generation
-
Retrieval assigns legal labels without hallucinations or retraining
Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free
-
500-step pre-training on structured language matches LLM efficiency and adds human-like resistance
Language Acquisition Device in Large Language Models
-
fMRI decodes continuous affect for individualized caption rewriting
EmoMind: Decoding Affective Captions from Human Brain fMRI
-
Bounding commitments cuts personalization failures to zero
Recall Isn't Enough: Bounding Commitments in Personalized Language Systems
-
AI Agents Complete Only 28% of Healthcare Workflow Tasks
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
-
RoBERTa spots manner and result verbs at 89.6% accuracy
A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research
-
Graphs track dialogue state for better consistency checks
SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs
-
Generative models output continuous emotion intensity scores
Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text
-
Standard embeddings match MRL after truncation except at 80% cuts
To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Learning, Except In Heavy Truncation Scenarios
-
Alignment updates concentrate in transformer read pathways
Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space
-
HTML versions advance math paper accessibility with MathML 4
Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4
-
PQR finds 23-78% more QA agent failures with realistic queries
PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures
-
Symphony outperforms on medical speech recognition
Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces
-
Medical speech system outperforms current leaders on clinical tasks
Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces
-
LLMs require critical parameter choices for qualitative work
LLMs in Qualitative Research: Opportunities, Limitations, and Practical Considerations
-
LLM outputs drift toward past context in extended chats
Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework
-
Decay slope couples routing loss and execution rescue in LLM agent libraries
The Scaling Laws of Skills in LLM Agent Systems
-
One framework turns utility numbers into readable bills with carbon totals
A Generative AI Framework for Intelligent Utility Billing CO2 Analytics and Sustainable Resource Optimisation
-
AI edits can steer collective opinions across networks
AI-Mediated Communication Can Steer Collective Opinion
-
Swap test choice changes safe layers for transformer pruning
No Free Swap: Protocol-Dependent Layer Redundancy in Transformers
-
Population broadcast lifts LLM agent returns up to 7.7x without weight updates
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
-
AI framework unifies gas distribution
A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation
-
Lesioned models produce aphasia symptoms unlike human cases
Artificial Aphasias in Lesioned Language Models
-
Evidence graph dispatches parallel searchers to reach 86.2 on BrowseComp
Argus: Evidence Assembly for Scalable Deep Research Agents
-
Navigator assembles research from complementary evidence pieces
Argus: Evidence Assembly for Scalable Deep Research Agents
-
Open pipeline lifts clinical LLMs to new benchmark highs
Fully Open Meditron: An Auditable Pipeline for Clinical LLMs
-
LLM tutors spot optimal steps but over-accept wrong solutions
Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most
-
State abstraction yields 76% higher returns per token for LLM agents
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
-
Value profiles from surveys cut LLM cross-country errors
Improving Cross-Cultural Survey Simulation with Calibrated Value Personas
-
AI tree search finds better 3D solar panel shapes
Optimized Three-Dimensional Photovoltaic Structures with LLM guided Tree Search
-
Exploration-first training improves LLM agents in new environments
Look Before You Leap: Autonomous Exploration for LLM Agents
-
External subgraphs guide LLMs to sharper multi-step answers
SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation
-
Retrieval reverses bias to make LLMs fairer without tuning
DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation
-
Token relations fix bias in machine text detection
Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection
-
Generative AI supports literature reviews through summarization and queries
Generative Artificial Intelligence for Literature Reviews
-
LLM similarity selection lowers error for low cognitive scores
Can Large Language Models Imitate Human Speech for Clinical Assessment? LLM-Driven Data Augmentation for Cognitive Score Prediction
-
Hybrid AI beats LLMs on unseen tax law cases
Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
-
RecMem cuts LLM agent memory costs by up to 87%
RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents
-
Typological priors replace flat labels for better multilingual S2ST
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
-
Sparse mid-layer circuit handles LLM judgment across formats
Judge Circuits
-
VLMs vary in adapting math lessons to student profiles
Can Vision Language Models Be Adaptive in Mathematics Education? A Learner Model-based Rubric Study
-
Taxonomy separates AI cultural knowledge from framing and adaptation
Defining Cultural Capabilities for AI Evaluation: A Taxonomy Grounded in Intercultural Communication Theory
-
Symbolic ontology extracts facts from police report narratives
Ontology for Policing: Conceptual Knowledge Learning for Semantic Understanding and Reasoning in Law Enforcement Reports
-
RL fine-tunes MT models to +5 chrF++ without any parallel data
Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective
-
Graduated signals let AI companions flag risks without false alarms on positive states
SLIP & ETHICS: Graduated Intervention for AI Emotional Companions
-
Block attention nears full performance via semantic blocks
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation