archive
Every paper Pith has read. Search by title, abstract, or pith.
7661 papers in cs.CL · page 17
-
Language-specific thresholds lift slur detection F1 by 2-5%
KIT-TIP-NLP at MultiPride: Continual Learning with Multilingual Foundation Model
-
LLMs annotate asylum credibility with inconsistent errors
LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics
-
External skill library keeps LLM attacks evolving after saturation
Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution
-
Puzzles reveal all-or-nothing success for humans and LLMs
From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks
-
Certificates verify LLM pipelines by auditing only deterministic parts
Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture
-
Fine-tuned BART and T5 parsers beat prior seq2seq models on constituent parsing
Exploiting Pre-trained Encoder-Decoder Transformers for Sequence-to-Sequence Constituent Parsing
-
Phase rotations on unit circle stabilize explicit memory
Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory
-
LLMs self-train on examples generated from the query alone
Query-Conditioned Test-Time Self-Training for Large Language Models
-
Query self-training adapts LLMs using only input-derived pairs
Query-Conditioned Test-Time Self-Training for Large Language Models
-
Document MT then segment refinement beats full-document fixes
What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation
-
Shared preference vector controls LLM choices across personas
Probing Persona-Dependent Preferences in Language Models
-
One vector steers LLM preferences across opposing personas
Probing Persona-Dependent Preferences in Language Models
-
One LLM persuades another to ignore its own safety rules
LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs
-
18,900 questions test financial reasoning in six Indic languages
FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages
-
Persona vectors form in first 0.22% of LLM pretraining
Tracing Persona Vectors Through LLM Pretraining
-
Stereo vision and location priors boost real-world robot navigation
What Limits Vision-and-Language Navigation?
-
Pooled preferences nearly match individual fine-tuning for personalization
PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users
-
Simple recipe scales reasoning model to olympiad gold
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
-
Contrastive rollouts assign credit to individual agents in LLM teams
CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution
-
Parallel dataset gives medical dialogues in nine Indic languages
IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages
-
Latent info gain ranks visual evidence for better multimodal RAG
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
-
Hybrid conversion lets LLMs query building models in plain English
A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations
-
GAGPO computes temporal advantages from grouped rollouts without a critic
GAGPO: Generalized Advantage Grouped Policy Optimization
-
Entropy rises with missing context in LLMs
LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information
-
Models frequently fail to build valid geometry diagrams from text
GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language
-
Pruning trims long reasoning by 19-42% with little accuracy loss
STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes
-
Acquisition rewards yield 2-7% gains in student model training
AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions
-
LLMs lag experts on system-level performance code
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
-
Teacher confidence gates improve reasoning in small models
GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning
-
Knowledge base lifts Text-to-SQL accuracy when data is scarce
Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model
-
Small 244M Whisper matches large models on Indic speech
Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition
-
This paper applies a generative meta-learning algorithm to spoken word classification…
Does language matter for spoken word classification? A multilingual generative meta-learning approach
-
Multilingual edge in word classification is smaller than expected
Does language matter for spoken word classification? A multilingual generative meta-learning approach
-
LLM JSON stays valid inside tight token budgets
TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints
-
GeMCL classifies 1000 words from five shots each with stable accuracy
Scaling few-shot spoken word classification with generative meta-continual learning
-
GeMCL scales spoken word classification to 1000 classes with five shots each
Scaling few-shot spoken word classification with generative meta-continual learning
-
Deeper thought per algorithm beats more candidates under fixed tokens
Effective Harness Engineering for Algorithm Discovery with Coding Agents
-
GenAI flattens L2 writers' voices into uniform English
The Cost of Perfect English: Pragmatic Flattening and the Erasure of Authorial Voice in L2 Writing Supported by GenAI
-
LLMs predict query-specific validity horizons for web content
RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search
-
Pruning candidate contexts with search tools improves LLM performance
Context Training with Active Information Seeking
-
Pruning multiple search contexts lifts LLM adaptation gains
Context Training with Active Information Seeking
-
LLMs miss when medical guidelines expire
Large Language Models Lack Temporal Awareness of Medical Knowledge
-
Jailbreak success in diffusion LMs drops to 0.64% via step-wise remasking
Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models
-
Bell-shaped sampling trains masked diffusion models 4x faster
Understanding and Accelerating the Training of Masked Diffusion Language Models
-
Multimodal reasoning lifts MI coding accuracy to 52 percent
Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction
-
Multimodal voting lifts MI coding accuracy to 52.56%
Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction
-
Speech marks when insights transfer across similar problems
Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving
-
Speech reveals when insights transfer across problems
Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving
-
Repeated insight type speeds solving and boosts problem categorization in speech
Leveraging Speech to Identify Signatures of Insight and Transfer in Problem Solving
-
LLM states project to F2 for 93% zero-shot ontology accuracy
Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2