pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 13

  1. cs.CL 2026-05-19 reviewed
    Prompt tuning with UMLS synonyms labels reports from 32 examples

    PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling

    Ying-Jia Lin +5

  2. cs.SE 2026-05-19 reviewed
    Cleaner code reduces agent token use by 7-8% with no change in success

    Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study

    Priyansh Trivedi +1

  3. cs.LG 2026-05-19 reviewed
    Critic disagreement guides reward poisoning in RIS networks

    When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System

    Deemah H. Tashman +1

  4. cs.AI 2026-05-19 reviewed
    Multi-agent system improves autonomous research by 54.7 percent

    AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

    Jiaqi Liu +35

  5. cs.AI 2026-05-19 reviewed
    Skills add almost no value to cybersecurity agents with rich tool feedback

    When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity

    Samuel Jacob Chacko +3

  6. cs.LG 2026-05-19 reviewed
    Two antagonistic Bayesian processes set the optimal learning rate

    Training Neural Networks with Optimal Double-Bayesian Learning

    Vy Bui +4

  7. cs.AI 2026-05-19 reviewed
    Self-play with code rewards lifts geospatial AI by 5.5 points

    GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

    Kyeongjin Ahn +3

  8. cs.LG 2026-05-19 reviewed
    LLM benchmarks can be made unlearnable to stop contamination

    LLM Benchmark Datasets Should Be Contamination-Resistant

    Ali Al-Lawati +3

  9. cs.SE 2026-05-19 reviewed
    Agent skills from expert methods beat docs for PostgreSQL tuning

    A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

    Hongyu Lin +6

  10. cs.LG 2026-05-19 reviewed
    Lookahead training improves neural routing policies

    Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction

    Xia Jiang +3

  11. cs.LG 2026-05-19 reviewed
    Block-sphere quantizer lowers MSE and inner-product error

    Block-Sphere Vector Quantization

    Heesang Ann +2

  12. cs.LG 2026-05-19 reviewed
    Entropy change-point detection spots fluent LLM jailbreaks

    Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes

    Mohammed Alshaalan +1

  13. eess.SP 2026-05-19 reviewed
    Rule-based system stages sleep by encoding AASM manual in code

    Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

    Emil Hardarson +5

  14. cs.CV 2026-05-19 reviewed
    World-ego split lifts long-horizon hybrid robot modeling

    World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks

    Zuyao Lin +5

  15. cs.DC 2026-05-19 reviewed
    GPU-aware expert mapping cuts MoE latency by 7.9 percent on average

    GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems

    Sourish Wawdhane +2

  16. cs.LG 2026-05-19 reviewed
    Position-dependent attention fixes constant risk on shifted reasoning

    A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

    Yuyang Zhang +3

  17. cs.AI 2026-05-19 reviewed
    Noise in recursion lifts tiny model puzzle accuracy to 99%

    Probabilistic Tiny Recursive Model

    Amin Sghaier +2

  18. cs.AI 2026-05-19 reviewed
    Robotics control ideas yield runtime guardrails for AI social interactions

    Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

    Rebecca Ramnauth +2

  19. cs.AI 2026-05-19 reviewed
    Context map cache raises LLM agent accuracy 6-34% on recurring tasks

    PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

    Zhuohan Gu +3

  20. cs.CV 2026-05-19 reviewed
    Model fuses lidar and plot data for lower-bias forest biomass maps

    StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels

    Reza M. Asiyabi +4

  21. cs.CV 2026-05-19 reviewed
    SplitQ keeps 93.5% accuracy at 3-bit VLM quantization

    Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

    Yi Zhong +4

  22. cs.GT 2026-05-19 reviewed
    Parallel CFR runs 3.3 times faster on billion-history poker trees

    Real-Time Parallel Counterfactual Regret Minimization

    Boning Li +1

  23. cs.LG 2026-05-19 reviewed
    Fast method learns node representations from labels without features

    Fast and Featureless Node Representation Learning with Partial Pairwise Supervision

    Sujan Chakraborty +1

  24. cs.AI 2026-05-19 reviewed
    CNN on solutions guides LLM to write 1000x faster streamliners

    Streamlined Constraint Reasoning via CNN Pattern Recognition on Enumerated Solutions

    Patrick Spracklen

  25. cs.DC 2026-05-19 reviewed
    Space data centers process satellite data in orbit

    Deep Tech to Space: Space Data Centers and AI Revolution at the Edge

    Jonas Weiss +18

  26. cs.CV 2026-05-19 reviewed
    Persona prompts lift construction safety checks by 12 percent

    Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification

    Ananth Sriram +2

  27. cs.LG 2026-05-19 reviewed
    Post-backprop rescaling fixes gradient scales in deep nets without BatchNorm

    StableGrad: Backward Scale Control without Batch Normalization

    Jose I. Mestre +4

  28. cs.CV 2026-05-19 reviewed
    Zero-shot image models fall short on concept faithfulness for XAI

    A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability

    Giacomo Astolfi +4

  29. cs.CV 2026-05-19 reviewed
    Open VLMs struggle with fine details in human video actions

    FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure +4

  30. cs.CV 2026-05-19 reviewed
    Dense benchmark exposes open VLMs' gaps on subtle human actions

    FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure +4

  31. cs.CV 2026-05-19 reviewed
    Dual-stream network lifts weather detection at full speed

    CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving

    Sherif Khairy +1

  32. cs.LG 2026-05-19 reviewed
    Framework fuses sensor data with physics rules for better passenger counts

    A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams

    Yiyao Xu +3

  33. cs.SD 2026-05-19 reviewed
    Scaled simulations cut speech recognition errors over 30 percent

    Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

    Zhifei Xie +6

  34. cs.AI 2026-05-19 reviewed
    Structured simulator cuts wastewater regret by 43.6 percent

    Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support

    Gary Simethy +2

  35. cs.AI 2026-05-19 reviewed
    Temporal conditioning changes AV planner style but not scores

    From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning

    Ahmed Y. Gado +4

  36. cs.LG 2026-05-19 reviewed
    Domain cuts let neural operators handle PDE discontinuities

    Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions

    Ha Dang +2

  37. cs.LG 2026-05-19 reviewed
    Explainer splits stable and changing links for temporal GNNs

    ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability

    Hongjiang Chen +7

  38. cs.CL 2026-05-19 reviewed
    Rubric shows LLMs generate mostly high-quality legal propositions

    LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation

    Shanshan Xu +4

  39. cs.LG 2026-05-19 reviewed
    Benchmark separates ML models on flux extrapolation via tail errors

    FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes

    Anya Fries +4

  40. cs.CL 2026-05-19 reviewed
    Section-based chunking tops recall in German legal retrieval

    Chunking German Legal Code

    Max Prior +2

  41. cs.LG 2026-05-19 reviewed
    Laplace diffusion generates long forecasts for irregular time series

    Latent Laplace Diffusion for Irregular Multivariate Time Series

    Zinuo You +2

  42. cs.CV 2026-05-19 reviewed
    Stitched model lifts rewards to noisy latents for faster alignment

    Stitched Value Model for Diffusion Alignment

    Hyojun Go +10

  43. cs.CV 2026-05-19 reviewed
    Semi-supervised method reaches 79.99% Dice in fetal heart ultrasound

    Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement

    Tonghao Zhuang +7

  44. cs.HC 2026-05-19 reviewed
    Protocol captures synchronized multimodal meeting data

    AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research

    Meisam Jamshidi Seikavandi +8

  45. cs.AI 2026-05-19 reviewed
    LLMs optimize code via priors

    Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

    Dmitry Redko +9

  46. cs.AI 2026-05-19 reviewed
    Data-driven rule picks best SGD-to-Muon geometry per layer

    From SGD to Muon: Adaptive Optimization via Schatten-p Norms

    Thomas Massena (IRIT) +4

  47. cs.AI 2026-05-19 reviewed
    Conformal methods deliver distribution-free coverage for AI agent scores

    Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation

    Yuxuan Gao +2

  48. cs.AI 2026-05-19 reviewed
    Hard-coded verifiers beat LLM judges at matching human evaluations

    OpenComputer: Verifiable Software Worlds for Computer-Use Agents

    Jinbiao Wei +6

  49. cs.AI 2026-05-19 reviewed
    Variance-aware regret bound proven optimal for logistic MDPs

    Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

    Pierre Boudart (SIERRA) +4

  50. cs.LG 2026-05-19 reviewed
    Rank-1 queries keep ZO signals strong for high-rank LoRA

    AR1-ZO: Topology-Aware Rank-1 Zeroth-Order Queries for High-Rank LoRA Fine-Tuning

    Ziye Chen +5