archive

Every paper Pith has read.

14513 papers in cs.AI · page 17

cs.CV 2026-05-18 reviewed

FAGER metric leads in factual checks for AI image generators
FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

Youngsun Lim +3
cs.RO 2026-05-18 reviewed

One model predicts shapes for many tendon-driven continuum robots
Neural Operators for Design-Space Surrogate Modeling of Tendon-Actuated Continuum Robots

Branden Frieden +3
cs.AI 2026-05-18 reviewed

Benchmark shows 15-31 point headroom for better AI delegation
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

Yuxuan Gao +4
cs.LG 2026-05-18 reviewed

ScheduleFree+ beats WSD schedules on long LLM pretraining
ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

Aaron Defazio
cs.AI 2026-05-18 reviewed

LLM elicits dynamic features to optimize system prompts
Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

Zhiyuan Jerry Lin +4
cs.LG 2026-05-18 reviewed

Graph separation shows public channels carry all indirect private influence
Counterfactual Likelihood Tests for Indirect Influence in Private Reasoning Channels

Alexander Boesgaard Lorup (Openhagen)
cs.LG 2026-05-18 reviewed

MANGO achieves top results in online continual learning benchmarks
MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning

Ankita Awasthi +2
cs.CL 2026-05-18 reviewed

Bounded ReAct loop boosts zero-shot DST by 14 points
ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking

Yanjun Lin +9
cs.CV 2026-05-18 reviewed

CRAFT pipeline leads MAGMaR video QA at 0.739 average
CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering

Mahesh Bhosale +5
cs.CV 2026-05-18 reviewed

Multi-horizon training captures longer solar forecast dependencies
Learning Long-Term Temporal Dependencies in Photovoltaic Power Output Prediction Through Multi-Horizon Forecasting

Sumit Laha +2
cs.LG 2026-05-18 reviewed

Networks on correlation matrices beat SPD and Grassmannian baselines
Riemannian Networks over Full-Rank Correlation Matrices

Ziheng Chen +3
cs.CL 2026-05-18 reviewed

ElevenLabs ASR leads on code-switched speech at 13 percent error
Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Sajjad Abdoli +4
cs.CL 2026-05-18 reviewed

ElevenLabs Scribe v2 leads on code-switched Arabic
Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Sajjad Abdoli +4
cs.CL 2026-05-18 reviewed

ElevenLabs Scribe leads on code-switched ASR with 13.2% WER
Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Sajjad Abdoli +4
cs.HC 2026-05-18 reviewed

AI agents simulate employee responses to AI workplace changes
Toward an AI-Powered Computational Testbed for Workforce Policy

Sumer S. Vaid +1
cs.CV 2026-05-18 reviewed

LiFT lifts 2D generators to coherent 3D medical volumes
LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

Xinhe Zhang +5
cs.LG 2026-05-18 reviewed

KVBuffer cuts linear attention decoding latency by up to 45%
KVBuffer: IO-aware Serving for Linear Attention

Longwei Zou +1
cs.CY 2026-05-18 reviewed

Vision LLMs grade handwritten math with high accuracy
Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs

Jacob Levine +4
cs.AI 2026-05-18 reviewed

Gradient projection and orthogonalization cut multi-task unlearning interference
Interference-Aware Multi-Task Unlearning

Ying-Hua Huang +3
cs.AI 2026-05-18 reviewed

Agent networks need trust built in from the start
Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

Yixiang Yao +7
cs.RO 2026-05-18 reviewed

RL fine-tuning aligns traffic simulations with real data
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

Ehsan Ahmadi +7
cs.AI 2026-05-18 reviewed

Hybrid KAN-MLP raises F1 scores 5.33% in IMU activity recognition
KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Mengxi Liu +7
cs.AI 2026-05-18 reviewed

Multi-agent LLM method hits 78.1% accuracy on NL2SQL benchmark
AgentNLQ: A General-Purpose Agent for Natural Language to SQL

Olena Bogdanov +7
cs.AI 2026-05-18 reviewed

Control layer above optimizer keeps LLM training stable under stress
Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Anis Radianis
cs.LG 2026-05-18 reviewed

Oracle routing lifts selective refusal scores by 12.9 points
Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing

Bryce Hinkley +1
cs.LG 2026-05-18 reviewed

Distillation transfers linearized task arithmetic to non-linear models
Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

Thomas Sommariva +3
cs.LG 2026-05-18 reviewed

Distillation gives non-linear models linearized task arithmetic
Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

Thomas Sommariva +3
cs.CR 2026-05-18 reviewed

Treat AI models as untrusted to secure agents
Agent Security is a Systems Problem

Mihai Christodorescu +13
cs.CR 2026-05-18 reviewed

Agent security requires system-level enforcement treating models as untrusted
Agent Security is a Systems Problem

Mihai Christodorescu +13
cs.CR 2026-05-18 reviewed

TRIAD bounds time-to-failure for multi-turn multimodal attacks
Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

Doohee You
cs.CV 2026-05-18 reviewed

Self-supervised backbones boost artwork classification
Harnessing Self-Supervised Features for Art Classification

Federico Melis +4
cs.LG 2026-05-18 reviewed

Synthetic prior with stress and realism lifts tabular model performance
Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality

Mohamed Bouadi +5
cs.CL 2026-05-18 reviewed

Adaptive block selection matches full attention at 75% sparsity
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

Yuxiang Huang +7
cs.CL 2026-05-18 reviewed

Code harness turns LLMs into verifiable AI agents
Code as Agent Harness

Xuying Ning +41
cs.CV 2026-05-18 reviewed

Active exploration outperforms passive in spatial intelligence tasks
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Yining Hong +7
cs.AI 2026-05-18 reviewed

Neural architecture learns object state manifolds from sensor data
WorldString: Actionable World Representation

Kunqi Xu +6
cs.AI 2026-05-18 reviewed

Neural architecture learns object state changes from 3D scans
WorldString: Actionable World Representation

Kunqi Xu +6
cs.CV 2026-05-18 reviewed

Self-distillation from crops boosts MLLM detail recognition
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Qianhao Yuan +6
cs.AI 2026-05-18 reviewed

AI medical advisors underweight patient autonomy
What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

Payal Chandak +13
cs.AI 2026-05-18 reviewed

PHR context boosts helpfulness of LLM health answers
Evaluating the Utility of Personal Health Records in Personalized Health AI

Rory Sayres +21
cs.CL 2026-05-18 reviewed

LLM fact recall improves with model size and topic frequency in data
Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

Matthew L. Smith +4
cs.RO 2026-05-18 reviewed

Benchmark tests dexterous Texas Hold'em play at 61 percent success
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Feng Chen +8
cs.CV 2026-05-18 reviewed

Segmentation proxy aligns multimodal understanding and generation
Semantic Generative Tuning for Unified Multimodal Models

Songsong Yu +3
cs.LG 2026-05-18 reviewed

Distilled students match 90% AUC from health foundation models
Distilling Tabular Foundation Models for Structured Health Data

Aditya Tanna +4
cs.DC 2026-05-18 reviewed

PopPy speeds Python AI apps up to 6.4x by parallelizing external calls
PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

Stephen Mell +4
cs.LG 2026-05-18 reviewed

Tabular foundation models show little diversity for ensembling
Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap

Aditya Tanna +5
cs.AI 2026-05-18 reviewed

Benchmark tests LLM agents on generating reusable skills
SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

Yifan Zhou +10
cs.AI 2026-05-18 reviewed

LLM converts user prompts into optimization model patches
Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Tinghan Ye +4
cs.SE 2026-05-18 reviewed

Multi-agent pipeline extracts traceable specs from legacy code
Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Sanderson Oliveira de Macedo +1
cs.AI 2026-05-18 reviewed

Perturbation metric scores and trains better AI explanations
Learning Quantifiable Visual Explanations Without Ground-Truth

Amritpal Singh +4