archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 4

  1. cs.LG 2026-05-21 reviewed
    Smart grid detection uses 75% fewer measurements

    Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization

    Adis Alihodžić +2

  2. cs.RO 2026-05-21 reviewed
    Multi-agent RL drones beat humans with half the collisions

    Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

    Ismail Geles +3

  3. cs.LG 2026-05-21 reviewed
    ProxySHAP lowers error in Shapley interaction estimates

    Proxy-Based Approximation of Shapley and Banzhaf Interactions

    Santo M. A. R. Thies +5

  4. cs.LG 2026-05-21 reviewed
    Proxy method sets new accuracy standard for Shapley interactions

    Proxy-Based Approximation of Shapley and Banzhaf Interactions

    Santo M. A. R. Thies +5

  5. cs.LG 2026-05-21 reviewed
    Cheap PoE defense narrows gap under adaptive distillation attacks

    The Distillation Game: Adaptive Attacks & Efficient Defenses

    Youssef Allouah +3

  6. cs.AI 2026-05-21 reviewed
    One handler generates both streaming API and MCP tool

    HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

    Edwin Jose

  7. cs.AI 2026-05-21 reviewed
    LLM analysis outperforms acoustics for political pathos

    Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models

    Juergen Dietrich

  8. cs.LG 2026-05-21 reviewed
    State distributions shape post-training outcomes more than loss functions

    Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

    Dong Nie

  9. cs.LG 2026-05-21 reviewed
    Full covariance matching cuts DDPM path error to O(1/T^2)

    The Value of Covariance Matching in Gaussian DDPMs and the Lanczos Sampler

    Md Sahil Akhtar +3

  10. cs.AI 2026-05-21 reviewed
    AI models equate atrocities up to 100 percent when asked for balance

    Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

    Andrii Kryshtal

  11. cs.SD 2026-05-21 reviewed
    Diffusion models match discrete models for live music

    Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

    Zachary Novack +10

  12. cs.AI 2026-05-21 reviewed
    Parametric modules make answer set programs declarative

    Parametric Modular Answer Set Programs Made Declarative

    Jorge Fandinno +2

  13. cs.CV 2026-05-21 reviewed
    Simulated dense placements train IMU model that ignores sensor setup

    AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

    Baiyu Chen +7

  14. cs.AI 2026-05-21 reviewed
    Conversation history pulls LLM judgments toward its tone

    AMEL: Accumulated Message Effects on LLM Judgments

    Sid-ali Temkit

  15. cs.LG 2026-05-21 reviewed
    Relativised options let agents reuse experience across goals in offline RL

    Abstraction for Offline Goal-Conditioned Reinforcement Learning

    Clarisse Wibault +4

  16. cs.AI 2026-05-21 reviewed
    AI reshapes informal mentoring alongside formal roles

    Beyond the Org Chart: AI and the Transformation of Invisible Work

    Stephanie Rosenthal +1

  17. cs.RO 2026-05-21 reviewed
    UAV scouts cut ground robot travel costs by 32-38 percent

    Scout-Assisted Planning for Heterogeneous Robot Teams under Partially Known Environments

    Hoang-Dung Bui +3

  18. cs.AI 2026-05-21 reviewed
    AI models fail to forecast scientific advances

    Forecasting Scientific Progress with Artificial Intelligence

    Sean Wu +9

  19. cs.CV 2026-05-21 reviewed
    Taylor expansion picks surprising frames in long videos

    Swift Sampling: Selecting Temporal Surprises via Taylor Series

    Dahye Kim +5

  20. cs.AI 2026-05-21 reviewed
    Capable models overpredict tails in superlinear forecasts

    Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

    Nick Merrill +2

  21. cs.AI 2026-05-21 reviewed
    More capable models worsen forecasts on growth risks

    Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

    Nick Merrill +2

  22. cs.AI 2026-05-21 reviewed
    LLM agents fall short on professional finance spreadsheets

    MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

    Thomson Yen +11

  23. cs.AI 2026-05-21 reviewed
    One prompt builds a full AI research team with code harness

    Claw AI Lab: An Autonomous Multi-Agent Research Team

    Fan Wu +14

  24. cs.CL 2026-05-21 reviewed
    Moral cues survive machine translation to Polish

    Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora

    Maciej Skorski

  25. cs.AI 2026-05-21 reviewed
    Benchmark scores prompting skill for text-to-image systems

    AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters

    Hanjun Luo +8

  26. cs.AI 2026-05-21 reviewed
    RL training nearly doubles AI success on spreadsheet tasks

    Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

    Banghao Chi +11

  27. cs.CL 2026-05-21 reviewed
    Moral knowledge retrieval beats extra context for political value detection

    More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

    Víctor Yeste +1

  28. cs.CL 2026-05-21 reviewed
    Moral knowledge beats extra context and model scaling for value detection

    More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

    Víctor Yeste +1

  29. cs.SE 2026-05-21 reviewed
    Contractual skills turn agent instructions into inspectable task contracts

    Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents

    Ting Liu

  30. cs.CY 2026-05-21 reviewed
    Healthcare LLM benchmarks fail because of hidden user assumptions

    Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions

    Naveen Raman +4

  31. cs.CL 2026-05-21 reviewed
    Agentic CLEAR automates multi-level LLM agent evaluation

    Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

    Asaf Yehudai +2

  32. cs.RO 2026-05-21 reviewed
    Agentic-VLA speeds VLA convergence 2.4x with adaptive rewards

    Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

    Ruofan Jin +1

  33. cs.CR 2026-05-21 reviewed
    AI Framework Secures Cardless Banking Against Fraud

    Innovations in Cardless Artificial Intelligence Banking: A Comprehensive Framework for Cyber Secure and Fraud Mitigation using Machine Learning Algorithms

    Md Israfeel

  34. cs.AI 2026-05-21 reviewed
    Small model beats GPT-5 at predicting desires and beliefs in persuasion

    Think Thrice Before You Speak: Dual knowledge-enhanced Theory-of-Mind Reasoning for Persuasive Agents

    Minghui Ma +9

  35. cs.LG 2026-05-21 reviewed
    Residual stress learning narrows real-to-sim gap in dynamics

    MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy

    Jiaxu Wang +7

  36. cs.CV 2026-05-21 reviewed
    3D reconstruction turns floorplan localization into alignment task

    SceneAligner: 3D-Grounded Floorplan Localization in the Wild

    Junhyeong Cho +2

  37. cs.CL 2026-05-21 reviewed
    Hyperfitting expands final LLM layer to promote rare tokens

    Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

    Meimingwei Li +3

  38. cs.CV 2026-05-21 reviewed
    Generative models create controlled videos to test MLLM spatio-temporal reasoning

    VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

    Jinho Park +3

  39. cs.CR 2026-05-21 reviewed
    AI security benchmarks undermined by three flaws

    Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard

    Sahar Abdelnabi +3

  40. cs.CV 2026-05-21 reviewed
    Similar cases form graphs that refine medical image diagnoses

    Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement

    Yiming Xu +5

  41. cs.CE 2026-05-21 reviewed
    Hypergraphs built from time series without prior structure

    Dynamic Hypergraph Representation Learning for Multivariate Time Series without Prior Knowledge

    Marco Gregnanin +3

  42. cs.AI 2026-05-21 reviewed
    Agents reach only 62.5% on real terminal tasks

    TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

    Zhaoyang Chu +10

    2 Piths
  43. cs.AI 2026-05-21 reviewed
    Safety arguments update confidence dynamically with runtime data

    A Subjective Logic-based method for runtime confidence updates in safety arguments

    Benjamin Herd +3

  44. cs.LG 2026-05-21 reviewed
    Multicollinearity inflates AI explanation variance in cybersecurity

    Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets

    Ioannis J. Vourganas +1

  45. cs.AI 2026-05-21 reviewed
    Meta-learning adapts controllers for uncertain systems with few samples

    Meta-Learning for Rapid Adaptation in Reference Tracking of Uncertain Nonlinear Systems

    Jiaqi Yan +4

  46. cs.AI 2026-05-21 reviewed
    Self-distillation drives search reasoners to 0.440 EM

    Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

    Zihan Liang +6

  47. cs.AI 2026-05-21 reviewed
    Ranking components predicts harness optimizer performance

    Towards Direct Evaluation of Harness Optimizers via Priority Ranking

    Kai Tzu-iunn Ong +11

  48. cs.AI 2026-05-21 reviewed
    Latent sharing speeds up collaborative driving coordination

    LACO: Adaptive Latent Communication for Collaborative Driving

    Tianhao Chen +2

  49. cs.AI 2026-05-21 reviewed
    Workflows baked into small model weights cut agent costs 100x

    Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

    Simon Dennis +3

  50. cs.CL 2026-05-21 reviewed
    Generative re-ranker lifts biomedical linking accuracy 3-24%

    BeLink: Biomedical Entity Linking Meets Generative Re-Ranking

    Darya Shlyk +2