archive
Every paper Pith has read.
14513 papers in cs.AI · page 13
-
Prompt tuning with UMLS synonyms labels reports from 32 examples
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling
-
Cleaner code reduces agent token use by 7-8% with no change in success
Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study
-
Critic disagreement guides reward poisoning in RIS networks
When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System
-
Multi-agent system improves autonomous research by 54.7 percent
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
-
Skills add almost no value to cybersecurity agents with rich tool feedback
When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity
-
Two antagonistic Bayesian processes set the optimal learning rate
Training Neural Networks with Optimal Double-Bayesian Learning
-
Self-play with code rewards lifts geospatial AI by 5.5 points
GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
-
LLM benchmarks can be made unlearnable to stop contamination
LLM Benchmark Datasets Should Be Contamination-Resistant
-
Agent skills from expert methods beat docs for PostgreSQL tuning
A Case for Agentic Tuning: From Documentation to Action in PostgreSQL
-
Lookahead training improves neural routing policies
Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction
-
Block-sphere quantizer lowers MSE and inner-product error
Block-Sphere Vector Quantization
-
Entropy change-point detection spots fluent LLM jailbreaks
Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes
-
Rule-based system stages sleep by encoding AASM manual in code
Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules
-
World-ego split lifts long-horizon hybrid robot modeling
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
-
GPU-aware expert mapping cuts MoE latency by 7.9 percent on average
GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems
-
Position-dependent attention fixes constant risk on shifted reasoning
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits
-
Noise in recursion lifts tiny model puzzle accuracy to 99%
Probabilistic Tiny Recursive Model
-
Robotics control ideas yield runtime guardrails for AI social interactions
Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains
-
Context map cache raises LLM agent accuracy 6-34% on recurring tasks
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
-
Model fuses lidar and plot data for lower-bias forest biomass maps
StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels
-
SplitQ keeps 93.5% accuracy at 3-bit VLM quantization
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
-
Parallel CFR runs 3.3 times faster on billion-history poker trees
Real-Time Parallel Counterfactual Regret Minimization
-
Fast method learns node representations from labels without features
Fast and Featureless Node Representation Learning with Partial Pairwise Supervision
-
CNN on solutions guides LLM to write 1000x faster streamliners
Streamlined Constraint Reasoning via CNN Pattern Recognition on Enumerated Solutions
-
Space data centers process satellite data in orbit
Deep Tech to Space: Space Data Centers and AI Revolution at the Edge
-
Persona prompts lift construction safety checks by 12 percent
Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification
-
Post-backprop rescaling fixes gradient scales in deep nets without BatchNorm
StableGrad: Backward Scale Control without Batch Normalization
-
Zero-shot image models fall short on concept faithfulness for XAI
A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
-
Open VLMs struggle with fine details in human video actions
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
-
Dense benchmark exposes open VLMs' gaps on subtle human actions
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
-
Dual-stream network lifts weather detection at full speed
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
-
Framework fuses sensor data with physics rules for better passenger counts
A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams
-
Scaled simulations cut speech recognition errors over 30 percent
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
-
Structured simulator cuts wastewater regret by 43.6 percent
Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support
-
Temporal conditioning changes AV planner style but not scores
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
-
Domain cuts let neural operators handle PDE discontinuities
Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions
-
Explainer splits stable and changing links for temporal GNNs
ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability
-
Rubric shows LLMs generate mostly high-quality legal propositions
LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation
-
Benchmark separates ML models on flux extrapolation via tail errors
FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes
-
Section-based chunking tops recall in German legal retrieval
Chunking German Legal Code
-
Laplace diffusion generates long forecasts for irregular time series
Latent Laplace Diffusion for Irregular Multivariate Time Series
-
Stitched model lifts rewards to noisy latents for faster alignment
Stitched Value Model for Diffusion Alignment
-
Semi-supervised method reaches 79.99% Dice in fetal heart ultrasound
Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement
-
Protocol captures synchronized multimodal meeting data
AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research
-
LLMs optimize code via priors
Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization
-
Data-driven rule picks best SGD-to-Muon geometry per layer
From SGD to Muon: Adaptive Optimization via Schatten-p Norms
-
Conformal methods deliver distribution-free coverage for AI agent scores
Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation
-
Hard-coded verifiers beat LLM judges at matching human evaluations
OpenComputer: Verifiable Software Worlds for Computer-Use Agents
-
Variance-aware regret bound proven optimal for logistic MDPs
Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs
-
Rank-1 queries keep ZO signals strong for high-rank LoRA
AR1-ZO: Topology-Aware Rank-1 Zeroth-Order Queries for High-Rank LoRA Fine-Tuning