pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 8

  1. cs.CV 2026-05-20 reviewed
    Ultrasound VQA model learns to zoom closer before diagnosing

    Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming

    Yue Zhou +7

  2. cs.AI 2026-05-20 reviewed
    EMOD 3.0 expands AOP-Wiki data model for AI and NAMs

    AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)

    Virginia K. Hench +4

  3. cs.HC 2026-05-20 reviewed
    Six elements close the human-AI synergy gap

    Addressing the Synergy Gap: The Six Elements of the Design Space

    Tommaso Turchi +4

  4. cs.AI 2026-05-20 reviewed
    Composing thought modes generates harder reasoning data

    MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis

    Haiyang Shen +13

  5. cs.CY 2026-05-20 reviewed
    AI shortens math study time 27 percent but cuts retention odds 25 percent

    Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

    Sina Rismanchian +4

  6. cs.CV 2026-05-20 reviewed
    New benchmark shows LVLMs falter on furniture assembly videos

    Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

    Aditya Chetan +7

  7. cs.AI 2026-05-20 reviewed
    Holocaust testimony analysis finds overlaps between archives

    The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison

    Itamar Trainin +2

  8. cs.AI 2026-05-20 reviewed
    Multi-agent system yields preference-aligned designs in 60% of trials

    TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization

    Isabella A. Stewart +2

  9. cs.CL 2026-05-20 reviewed
    Rewriting cuts unsafe LLM outputs for teen users

    CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

    Heajun An +3

  10. cs.LG 2026-05-20 reviewed
    Position weighting lifts AIME scores by over 1 point in distillation

    When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning

    Xiaogeng Liu +4

  11. cs.AI 2026-05-20 reviewed
    Hybrid OOD monitors lift LLM failure recall from 39 to 45 percent

    Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

    Dylan Feng +3

  12. cs.LG 2026-05-20 reviewed
    Amortized resampling yields 2-3x compute gains for diffusion teachers

    Variance Reduction for Expectations with Diffusion Teachers

    Jesse Bettencourt +4

  13. cs.LG 2026-05-20 reviewed
    Amortized noise sampling cuts diffusion teacher variance 10x

    Variance Reduction for Expectations with Diffusion Teachers

    Jesse Bettencourt +4

  14. cs.LG 2026-05-20 reviewed
    Embedding learning rate boost replicates muP transfer

    Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

    Dayal Singh Kalra +1

  15. cs.AI 2026-05-20 reviewed
    Derivation errors drive over 70% of failures on new AI benchmark

    DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

    Sixiong Xie +10

  16. cs.AI 2026-05-20 reviewed
    Platform lets humans and AIs co-author and iterate on papers

    AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists

    Junshu Pan +7

  17. cs.CV 2026-05-20 reviewed
    WikiVQABench tests VLMs on Wikipedia questions needing external knowledge

    WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

    Basel Shbita +2

  18. cs.LG 2026-05-20 reviewed
    JIT compilation speeds web agents by 10 times

    Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

    Caleb Winston +3

  19. cs.CL 2026-05-20 reviewed
    Separate model learns when to generate agent guidance

    Mem-$\pi$: Adaptive Memory through Learning When and What to Generate

    Xiaoqiang Wang +7

  20. cs.RO 2026-05-20 reviewed
    Diffusion assistance cuts teleop task times 40%

    HITL-D: Human In The Loop Diffusion Assisted Shared Control

    Riley Zilka +3

  21. cs.AI 2026-05-20 reviewed
    Randomization fixes simulator shift but reachability gaps persist

    Mind the Sim-to-Real Gap & Think Like a Scientist

    Harsh Parikh +3

  22. cs.SE 2026-05-20 reviewed
    AI refactoring PRs improve quality in 22.5% of cases

    Quality and Security Signals in AI-Generated Python Refactoring Pull Requests

    Mohamed Almukhtar +2

  23. cs.LG 2026-05-20 reviewed
    Deeper networks approximate structured functions with fewer parameters

    Approximation Theory for Neural Networks: Old and New

    Soumendu Sundar Mukherjee +1

  24. cs.RO 2026-05-20 reviewed
    Reasoning changes flag 5.3x larger path errors under driving sensor noise

    Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

    Abhinaw Priyadershi +1

  25. cs.CV 2026-05-20 reviewed
    VLMs miss most time-based glitches in game videos

    TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos

    Yakun Yu +6

  26. cs.LG 2026-05-20 reviewed
    PyTorch library matches specialized tools in LLM tuning

    torchtune: PyTorch native post-training library

    Mark Obozov +10

  27. cs.AI 2026-05-20 reviewed
    Power caps cut LLM energy use by 26% while reducing QoS violations

    PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

    Can Hankendi +3

  28. cs.LG 2026-05-20 reviewed
    One embedding predicts conditions and retrieves precedents

    HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation

    Shreyas Vinaya Sathyanarayana +2

  29. cs.LG 2026-05-20 reviewed
    Gossip-based critic sharing lifts multi-cell OFDMA sum-rates in 6G

    FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

    Amin Farajzadeh +1

  30. cs.CV 2026-05-20 reviewed
    Top-n encoder selection lifts blended emotion accuracy

    Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

    Junghyun Lee +3

  31. cs.AI 2026-05-20 reviewed
    Student questions expose AI research limits at 17 percent pass rate

    Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

    Haiyang Shen +11

  32. cs.AI 2026-05-20 reviewed
    Student questions expose AI research systems at 17 percent pass rate

    Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

    Haiyang Shen +11

  33. cs.SE 2026-05-20 reviewed
    Stdlib reimplementations match third-party Python library speeds

    Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries

    Peng Ding +1

  34. cs.CY 2026-05-20 reviewed
    LLMs reach max shock level in Milgram-style test

    Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment

    Roland Pihlakas +1

  35. cs.AI 2026-05-20 reviewed
    One foundation model to run all 6G tasks autonomously

    Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G

    Liang Wu +3

  36. cs.HC 2026-05-20 reviewed
    AI ghosts of the dead favor emotion over accuracy

    Designing Conversations with the Dead: How People Engage with Generative Ghosts

    Jack Manning +4

  37. cs.LG 2026-05-20 reviewed
    Transport maps to PDE measures are Hölder continuous

    On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

    Likun Lin +3

  38. cs.SE 2026-05-20 reviewed
    Agents pass visible tests but fail held-out usage tests as tasks lengthen

    SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

    Bingchen Zhao +3

  39. cs.NE 2026-05-20 reviewed
    XOR-and-shift over GF(2) meets Marcus's three cognitive pillars

    How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

    Hiroyuki Chuma +2

  40. cs.NE 2026-05-20 reviewed
    XOR-shift over GF(2) enables variable binding and recursive structure

    How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

    Hiroyuki Chuma +2

  41. cs.CV 2026-05-20 reviewed
    Simulation feedback picks best synthetic scenes for driving models

    Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

    Hongzhi Ruan +7

  42. cs.AI 2026-05-20 reviewed
    Multi-agent reports raise LLM scaffold performance by 30 points

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    Akshay Manglik +8

  43. cs.AI 2026-05-20 reviewed
    Multi-agent system turns full LLM traces into evidence-backed insights

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    Akshay Manglik +8

  44. cs.LG 2026-05-20 reviewed
    PDE residual selects training data to cut neural operator costs

    Data-Efficient Neural Operator Training via Physics-Based Active Learning

    Alicja Polanska +3

  45. cs.AI 2026-05-20 reviewed
    43M-paper graph gives AI agents deterministic cross-field links

    SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

    Shuofei Qiao +10

  46. cs.CL 2026-05-20 reviewed
    Spike-gated model reaches 89% sparsity at 8.9 perplexity

    SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

    Ting Liu

  47. cs.CL 2026-05-20 reviewed
    Regularization curbs prompt overfitting for better LLM generalization

    TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

    Lucheng Fu +6

  48. cs.DC 2026-05-20 reviewed
    Simulator predicts LLM serving latency with 6% error

    Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

    Yicheng Feng +5

  49. cs.LG 2026-05-20 reviewed
    RL cuts pedestrian waits 79% via better crosswalks and signals

    DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning

    Bibek Poudel +4

  50. cs.CV 2026-05-20 reviewed
    Adaptive fusion gives linear SSMs flexible vision and 3D fusion

    Deformba: Vision State Space Model with Adaptive State Fusion

    Hongyu Ke +6