archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 4

cs.LG 2026-05-21 reviewed

Smart grid detection uses 75% fewer measurements
Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization

Adis Alihodžić +2
cs.RO 2026-05-21 reviewed

Multi-agent RL drones beat humans with half the collisions
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

Ismail Geles +3
cs.LG 2026-05-21 reviewed

ProxySHAP lowers error in Shapley interaction estimates
Proxy-Based Approximation of Shapley and Banzhaf Interactions

Santo M. A. R. Thies +5
cs.LG 2026-05-21 reviewed

Proxy method sets new accuracy standard for Shapley interactions
Proxy-Based Approximation of Shapley and Banzhaf Interactions

Santo M. A. R. Thies +5
cs.LG 2026-05-21 reviewed

Cheap PoE defense narrows gap under adaptive distillation attacks
The Distillation Game: Adaptive Attacks & Efficient Defenses

Youssef Allouah +3
cs.AI 2026-05-21 reviewed

One handler generates both streaming API and MCP tool
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

Edwin Jose
cs.AI 2026-05-21 reviewed

LLM analysis outperforms acoustics for political pathos
Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models

Juergen Dietrich
cs.LG 2026-05-21 reviewed

State distributions shape post-training outcomes more than loss functions
Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

Dong Nie
cs.LG 2026-05-21 reviewed

Full covariance matching cuts DDPM path error to O(1/T^2)
The Value of Covariance Matching in Gaussian DDPMs and the Lanczos Sampler

Md Sahil Akhtar +3
cs.AI 2026-05-21 reviewed

AI models equate atrocities up to 100 percent when asked for balance
Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

Andrii Kryshtal
cs.SD 2026-05-21 reviewed

Diffusion models match discrete models for live music
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

Zachary Novack +10
cs.AI 2026-05-21 reviewed

Parametric modules make answer set programs declarative
Parametric Modular Answer Set Programs Made Declarative

Jorge Fandinno +2
cs.CV 2026-05-21 reviewed

Simulated dense placements train IMU model that ignores sensor setup
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Baiyu Chen +7
cs.AI 2026-05-21 reviewed

Conversation history pulls LLM judgments toward its tone
AMEL: Accumulated Message Effects on LLM Judgments

Sid-ali Temkit
cs.LG 2026-05-21 reviewed

Relativised options let agents reuse experience across goals in offline RL
Abstraction for Offline Goal-Conditioned Reinforcement Learning

Clarisse Wibault +4
cs.AI 2026-05-21 reviewed

AI reshapes informal mentoring alongside formal roles
Beyond the Org Chart: AI and the Transformation of Invisible Work

Stephanie Rosenthal +1
cs.RO 2026-05-21 reviewed

UAV scouts cut ground robot travel costs by 32-38 percent
Scout-Assisted Planning for Heterogeneous Robot Teams under Partially Known Environments

Hoang-Dung Bui +3
cs.AI 2026-05-21 reviewed

AI models fail to forecast scientific advances
Forecasting Scientific Progress with Artificial Intelligence

Sean Wu +9
cs.CV 2026-05-21 reviewed

Taylor expansion picks surprising frames in long videos
Swift Sampling: Selecting Temporal Surprises via Taylor Series

Dahye Kim +5
cs.AI 2026-05-21 reviewed

Capable models overpredict tails in superlinear forecasts
Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

Nick Merrill +2
cs.AI 2026-05-21 reviewed

More capable models worsen forecasts on growth risks
Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

Nick Merrill +2
cs.AI 2026-05-21 reviewed

LLM agents fall short on professional finance spreadsheets
MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

Thomson Yen +11
cs.AI 2026-05-21 reviewed

One prompt builds a full AI research team with code harness
Claw AI Lab: An Autonomous Multi-Agent Research Team

Fan Wu +14
cs.CL 2026-05-21 reviewed

Moral cues survive machine translation to Polish
Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora

Maciej Skorski
cs.AI 2026-05-21 reviewed

Benchmark scores prompting skill for text-to-image systems
AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters

Hanjun Luo +8
cs.AI 2026-05-21 reviewed

RL training nearly doubles AI success on spreadsheet tasks
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Banghao Chi +11
cs.CL 2026-05-21 reviewed

Moral knowledge retrieval beats extra context for political value detection
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Víctor Yeste +1
cs.CL 2026-05-21 reviewed

Moral knowledge beats extra context and model scaling for value detection
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Víctor Yeste +1
cs.SE 2026-05-21 reviewed

Contractual skills turn agent instructions into inspectable task contracts
Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents

Ting Liu
cs.CY 2026-05-21 reviewed

Healthcare LLM benchmarks fail because of hidden user assumptions
Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions

Naveen Raman +4
cs.CL 2026-05-21 reviewed

Agentic CLEAR automates multi-level LLM agent evaluation
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

Asaf Yehudai +2
cs.RO 2026-05-21 reviewed

Agentic-VLA speeds VLA convergence 2.4x with adaptive rewards
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

Ruofan Jin +1
cs.CR 2026-05-21 reviewed

AI framework secures cardless banking against fraud
Innovations in Cardless Artificial Intelligence Banking: A Comprehensive Framework for Cyber Secure and Fraud Mitigation using Machine Learning Algorithms

Md Israfeel
cs.AI 2026-05-21 reviewed

Small model beats GPT-5 at predicting desires and beliefs in persuasion
Think Thrice Before You Speak: Dual knowledge-enhanced Theory-of-Mind Reasoning for Persuasive Agents

Minghui Ma +9
cs.LG 2026-05-21 reviewed

Residual stress learning narrows real-to-sim gap in dynamics
MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy

Jiaxu Wang +7
cs.CV 2026-05-21 reviewed

3D reconstruction turns floorplan localization into alignment task
SceneAligner: 3D-Grounded Floorplan Localization in the Wild

Junhyeong Cho +2
cs.CL 2026-05-21 reviewed

Hyperfitting expands final LLM layer to promote rare tokens
Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

Meimingwei Li +3
cs.CV 2026-05-21 reviewed

Generative models create controlled videos to test MLLM spatio-temporal reasoning
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

Jinho Park +3
cs.CR 2026-05-21 reviewed

AI security benchmarks undermined by three flaws
Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard

Sahar Abdelnabi +3
cs.CV 2026-05-21 reviewed

Similar cases form graphs that refine medical image diagnoses
Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement

Yiming Xu +5
cs.CE 2026-05-21 reviewed

Hypergraphs built from time series without prior structure
Dynamic Hypergraph Representation Learning for Multivariate Time Series without Prior Knowledge

Marco Gregnanin +3
cs.AI 2026-05-21 reviewed

Agents reach only 62.5% on real terminal tasks
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

Zhaoyang Chu +10

2 Piths
cs.AI 2026-05-21 reviewed

Safety arguments update confidence dynamically with runtime data
A Subjective Logic-based method for runtime confidence updates in safety arguments

Benjamin Herd +3
cs.LG 2026-05-21 reviewed

Multicollinearity inflates AI explanation variance in cybersecurity
Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets

Ioannis J. Vourganas +1
cs.AI 2026-05-21 reviewed

Meta-learning adapts controllers for uncertain systems with few samples
Meta-Learning for Rapid Adaptation in Reference Tracking of Uncertain Nonlinear Systems

Jiaqi Yan +4
cs.AI 2026-05-21 reviewed

Self-distillation drives search reasoners to 0.440 EM
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Zihan Liang +6
cs.AI 2026-05-21 reviewed

Ranking components predicts harness optimizer performance
Towards Direct Evaluation of Harness Optimizers via Priority Ranking

Kai Tzu-iunn Ong +11
cs.AI 2026-05-21 reviewed

Latent sharing speeds up collaborative driving coordination
LACO: Adaptive Latent Communication for Collaborative Driving

Tianhao Chen +2
cs.AI 2026-05-21 reviewed

Workflows baked into small model weights cut agent costs 100x
Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

Simon Dennis +3
cs.CL 2026-05-21 reviewed

Generative re-ranker lifts biomedical linking accuracy 3-24%
BeLink: Biomedical Entity Linking Meets Generative Re-Ranking

Darya Shlyk +2