archive

Every paper Pith has read.

14513 papers in cs.AI · page 17

  1. cs.CV 2026-05-18 reviewed
    FAGER metric leads in factual checks for AI image generators

    FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

    Youngsun Lim +3

  2. cs.RO 2026-05-18 reviewed
    One model predicts shapes for many tendon-driven continuum robots

    Neural Operators for Design-Space Surrogate Modeling of Tendon-Actuated Continuum Robots

    Branden Frieden +3

  3. cs.AI 2026-05-18 reviewed
    Benchmark shows 15-31 point headroom for better AI delegation

    DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

    Yuxuan Gao +4

  4. cs.LG 2026-05-18 reviewed
    ScheduleFree+ beats WSD schedules on long LLM pretraining

    ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

    Aaron Defazio

  5. cs.AI 2026-05-18 reviewed
    LLM elicits dynamic features to optimize system prompts

    Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

    Zhiyuan Jerry Lin +4

  6. cs.LG 2026-05-18 reviewed
    Graph separation shows public channels carry all indirect private influence

    Counterfactual Likelihood Tests for Indirect Influence in Private Reasoning Channels

    Alexander Boesgaard Lorup (Openhagen)

  7. cs.LG 2026-05-18 reviewed
    MANGO achieves top results in online continual learning benchmarks

    MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning

    Ankita Awasthi +2

  8. cs.CL 2026-05-18 reviewed
    Bounded ReAct loop boosts zero-shot DST by 14 points

    ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking

    Yanjun Lin +9

  9. cs.CV 2026-05-18 reviewed
    CRAFT pipeline leads MAGMaR video QA at 0.739 average

    CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering

    Mahesh Bhosale +5

  10. cs.CV 2026-05-18 reviewed
    Multi-horizon training captures longer solar forecast dependencies

    Learning Long-Term Temporal Dependencies in Photovoltaic Power Output Prediction Through Multi-Horizon Forecasting

    Sumit Laha +2

  11. cs.LG 2026-05-18 reviewed
    Networks on correlation matrices beat SPD and Grassmannian baselines

    Riemannian Networks over Full-Rank Correlation Matrices

    Ziheng Chen +3

  12. cs.CL 2026-05-18 reviewed
    ElevenLabs ASR leads on code-switched speech at 13 percent error

    Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

    Sajjad Abdoli +4

  13. cs.CL 2026-05-18 reviewed
    ElevenLabs Scribe v2 leads on code-switched Arabic

    Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

    Sajjad Abdoli +4

  14. cs.CL 2026-05-18 reviewed
    ElevenLabs Scribe leads on code-switched ASR with 13.2% WER

    Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

    Sajjad Abdoli +4

  15. cs.HC 2026-05-18 reviewed
    AI agents simulate employee responses to AI workplace changes

    Toward an AI-Powered Computational Testbed for Workforce Policy

    Sumer S. Vaid +1

  16. cs.CV 2026-05-18 reviewed
    LiFT lifts 2D generators to coherent 3D medical volumes

    LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

    Xinhe Zhang +5

  17. cs.LG 2026-05-18 reviewed
    KVBuffer cuts linear attention decoding latency by up to 45%

    KVBuffer: IO-aware Serving for Linear Attention

    Longwei Zou +1

  18. cs.CY 2026-05-18 reviewed
    Vision LLMs grade handwritten math with high accuracy

    Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs

    Jacob Levine +4

  19. cs.AI 2026-05-18 reviewed
    Gradient projection and orthogonalization cut multi-task unlearning interference

    Interference-Aware Multi-Task Unlearning

    Ying-Hua Huang +3

  20. cs.AI 2026-05-18 reviewed
    Agent networks need trust built in from the start

    Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

    Yixiang Yao +7

  21. cs.RO 2026-05-18 reviewed
    RL fine-tuning aligns traffic simulations with real data

    RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

    Ehsan Ahmadi +7

  22. cs.AI 2026-05-18 reviewed
    Hybrid KAN-MLP raises F1 scores 5.33% in IMU activity recognition

    KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

    Mengxi Liu +7

  23. cs.AI 2026-05-18 reviewed
    Multi-agent LLM method hits 78.1% accuracy on NL2SQL benchmark

    AgentNLQ: A General-Purpose Agent for Natural Language to SQL

    Olena Bogdanov +7

  24. cs.AI 2026-05-18 reviewed
    Control layer above optimizer keeps LLM training stable under stress

    Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

    Anis Radianis

  25. cs.LG 2026-05-18 reviewed
    Oracle routing lifts selective refusal scores by 12.9 points

    Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing

    Bryce Hinkley +1

  26. cs.LG 2026-05-18 reviewed
    Distillation transfers linearized task arithmetic to non-linear models

    Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

    Thomas Sommariva +3

  27. cs.LG 2026-05-18 reviewed
    Distillation gives non-linear models linearized task arithmetic

    Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

    Thomas Sommariva +3

  28. cs.CR 2026-05-18 reviewed
    Treat AI models as untrusted to secure agents

    Agent Security is a Systems Problem

    Mihai Christodorescu +13

  29. cs.CR 2026-05-18 reviewed
    Agent security requires system-level enforcement treating models as untrusted

    Agent Security is a Systems Problem

    Mihai Christodorescu +13

  30. cs.CR 2026-05-18 reviewed
    TRIAD bounds time-to-failure for multi-turn multimodal attacks

    Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

    Doohee You

  31. cs.CV 2026-05-18 reviewed
    Self-supervised backbones boost artwork classification

    Harnessing Self-Supervised Features for Art Classification

    Federico Melis +4

  32. cs.LG 2026-05-18 reviewed
    Synthetic prior with stress and realism lifts tabular model performance

    Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality

    Mohamed Bouadi +5

  33. cs.CL 2026-05-18 reviewed
    Adaptive block selection matches full attention at 75% sparsity

    DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

    Yuxiang Huang +7

  34. cs.CL 2026-05-18 reviewed
    Code harness turns LLMs into verifiable AI agents

    Code as Agent Harness

    Xuying Ning +41

  35. cs.CV 2026-05-18 reviewed
    Active exploration outperforms passive in spatial intelligence tasks

    ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

    Yining Hong +7

  36. cs.AI 2026-05-18 reviewed
    Neural architecture learns object state manifolds from sensor data

    WorldString: Actionable World Representation

    Kunqi Xu +6

  37. cs.AI 2026-05-18 reviewed
    Neural architecture learns object state changes from 3D scans

    WorldString: Actionable World Representation

    Kunqi Xu +6

  38. cs.CV 2026-05-18 reviewed
    Self-distillation from crops boosts MLLM detail recognition

    Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

    Qianhao Yuan +6

  39. cs.AI 2026-05-18 reviewed
    AI medical advisors underweight patient autonomy

    What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

    Payal Chandak +13

  40. cs.AI 2026-05-18 reviewed
    PHR context boosts helpfulness of LLM health answers

    Evaluating the Utility of Personal Health Records in Personalized Health AI

    Rory Sayres +21

  41. cs.CL 2026-05-18 reviewed
    LLM fact recall improves with model size and topic frequency in data

    Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

    Matthew L. Smith +4

  42. cs.RO 2026-05-18 reviewed
    Benchmark tests dexterous Texas Hold'em play at 61 percent success

    DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

    Feng Chen +8

  43. cs.CV 2026-05-18 reviewed
    Segmentation proxy aligns multimodal understanding and generation

    Semantic Generative Tuning for Unified Multimodal Models

    Songsong Yu +3

  44. cs.LG 2026-05-18 reviewed
    Distilled students match 90% AUC from health foundation models

    Distilling Tabular Foundation Models for Structured Health Data

    Aditya Tanna +4

  45. cs.DC 2026-05-18 reviewed
    PopPy speeds Python AI apps up to 6.4x by parallelizing external calls

    PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

    Stephen Mell +4

  46. cs.LG 2026-05-18 reviewed
    Tabular foundation models show little diversity for ensembling

    Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap

    Aditya Tanna +5

  47. cs.AI 2026-05-18 reviewed
    Benchmark tests LLM agents on generating reusable skills

    SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

    Yifan Zhou +10

  48. cs.AI 2026-05-18 reviewed
    LLM converts user prompts into optimization model patches

    Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

    Tinghan Ye +4

  49. cs.SE 2026-05-18 reviewed
    Multi-agent pipeline extracts traceable specs from legacy code

    Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

    Sanderson Oliveira de Macedo +1

  50. cs.AI 2026-05-18 reviewed
    Perturbation metric scores and trains better AI explanations

    Learning Quantifiable Visual Explanations Without Ground-Truth

    Amritpal Singh +4