archive
Every paper Pith has read. Search by title, abstract, or pith.
14513 papers in cs.AI · page 4
-
Smart grid detection uses 75% fewer measurements
Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization
-
Multi-agent RL drones beat humans with half the collisions
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning
-
ProxySHAP lowers error in Shapley interaction estimates
Proxy-Based Approximation of Shapley and Banzhaf Interactions
-
Proxy method sets new accuracy standard for Shapley interactions
Proxy-Based Approximation of Shapley and Banzhaf Interactions
-
Cheap PoE defense narrows gap under adaptive distillation attacks
The Distillation Game: Adaptive Attacks & Efficient Defenses
-
One handler generates both streaming API and MCP tool
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools
-
LLM analysis outperforms acoustics for political pathos
Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models
-
State distributions shape post-training outcomes more than loss functions
Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation
-
Full covariance matching cuts DDPM path error to O(1/T^2)
The Value of Covariance Matching in Gaussian DDPMs and the Lanczos Sampler
-
Asked for balance, AI models equate atrocities in up to 100 percent of cases
Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts
-
Diffusion models match discrete models for live music
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
-
Parametric modules make answer set programs declarative
Parametric Modular Answer Set Programs Made Declarative
-
Simulated dense placements train IMU model that ignores sensor setup
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild
-
Conversation history pulls LLM judgments toward its tone
AMEL: Accumulated Message Effects on LLM Judgments
-
Relativised options let agents reuse experience across goals in offline RL
Abstraction for Offline Goal-Conditioned Reinforcement Learning
-
AI reshapes informal mentoring alongside formal roles
Beyond the Org Chart: AI and the Transformation of Invisible Work
-
UAV scouts cut ground robot travel costs by 32-38 percent
Scout-Assisted Planning for Heterogeneous Robot Teams under Partially Known Environments
-
AI models fail to forecast scientific advances
Forecasting Scientific Progress with Artificial Intelligence
-
Taylor expansion picks surprising frames in long videos
Swift Sampling: Selecting Temporal Surprises via Taylor Series
-
Capable models overpredict tails in superlinear forecasts
Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most
-
More capable models worsen forecasts on growth risks
Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most
-
LLM agents fall short on professional finance spreadsheets
MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance
-
One prompt builds a full AI research team with code harness
Claw AI Lab: An Autonomous Multi-Agent Research Team
-
Moral cues survive machine translation to Polish
Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora
-
Benchmark scores prompting skill for text-to-image systems
AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters
-
RL training nearly doubles AI success on spreadsheet tasks
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
-
Moral knowledge retrieval beats extra context for political value detection
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts
-
Moral knowledge beats extra context and model scaling for value detection
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts
-
Contractual skills turn agent instructions into inspectable task contracts
Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents
-
Healthcare LLM benchmarks fail because of hidden user assumptions
Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions
-
Agentic CLEAR automates multi-level LLM agent evaluation
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents
-
Agentic-VLA speeds VLA convergence 2.4x with adaptive rewards
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models
-
AI framework secures cardless banking against fraud
Innovations in Cardless Artificial Intelligence Banking: A Comprehensive Framework for Cyber Secure and Fraud Mitigation using Machine Learning Algorithms
-
Small model beats GPT-5 at predicting desires and beliefs in persuasion
Think Thrice Before You Speak: Dual knowledge-enhanced Theory-of-Mind Reasoning for Persuasive Agents
-
Residual stress learning narrows real-to-sim gap in dynamics
MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy
-
3D reconstruction turns floorplan localization into alignment task
SceneAligner: 3D-Grounded Floorplan Localization in the Wild
-
Hyperfitting expands final LLM layer to promote rare tokens
Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion
-
Generative models create controlled videos to test MLLM spatio-temporal reasoning
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
-
AI security benchmarks undermined by three flaws
Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard
-
Similar cases form graphs that refine medical image diagnoses
Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement
-
Hypergraphs built from time series without prior structure
Dynamic Hypergraph Representation Learning for Multivariate Time Series without Prior Knowledge
-
Agents reach only 62.5% on real terminal tasks
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks
-
Safety arguments update confidence dynamically with runtime data
A Subjective Logic-based method for runtime confidence updates in safety arguments
-
Multicollinearity inflates AI explanation variance in cybersecurity
Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets
-
Meta-learning adapts controllers for uncertain systems with few samples
Meta-Learning for Rapid Adaptation in Reference Tracking of Uncertain Nonlinear Systems
-
Self-distillation drives search reasoners to 0.440 EM
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning
-
Ranking components predicts harness optimizer performance
Towards Direct Evaluation of Harness Optimizers via Priority Ranking
-
Latent sharing speeds up collaborative driving coordination
LACO: Adaptive Latent Communication for Collaborative Driving
-
Workflows baked into small model weights cut agent costs 100x
Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost
-
Generative re-ranker lifts biomedical linking accuracy 3-24%
BeLink: Biomedical Entity Linking Meets Generative Re-Ranking