archive

Every paper Pith has read.

14513 papers in cs.AI · page 8

cs.CV 2026-05-20 reviewed

Ultrasound VQA model learns to zoom closer before diagnosing
Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming

Yue Zhou +7
cs.AI 2026-05-20 reviewed

EMOD 3.0 expands AOP-Wiki data model for AI and NAMs
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)

Virginia K. Hench +4
cs.HC 2026-05-20 reviewed

Six elements close the human-AI synergy gap
Addressing the Synergy Gap: The Six Elements of the Design Space

Tommaso Turchi +4
cs.AI 2026-05-20 reviewed

Composing thought modes generates harder reasoning data
MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis

Haiyang Shen +13
cs.CY 2026-05-20 reviewed

AI shortens math study time 27 percent but cuts retention odds 25 percent
Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

Sina Rismanchian +4
cs.CV 2026-05-20 reviewed

New benchmark shows LVLMs falter on furniture assembly videos
Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

Aditya Chetan +7
cs.AI 2026-05-20 reviewed

Holocaust testimony analysis finds overlaps between archives
The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison

Itamar Trainin +2
cs.AI 2026-05-20 reviewed

Multi-agent system yields preference-aligned designs in 60% of trials
TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization

Isabella A. Stewart +2
cs.CL 2026-05-20 reviewed

Rewriting cuts unsafe LLM outputs for teen users
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

Heajun An +3
cs.LG 2026-05-20 reviewed

Position weighting lifts AIME scores by over 1 point in distillation
When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning

Xiaogeng Liu +4
cs.AI 2026-05-20 reviewed

Hybrid OOD monitors lift LLM failure recall from 39 to 45 percent
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

Dylan Feng +3
cs.LG 2026-05-20 reviewed

Amortized resampling yields 2-3x compute gains for diffusion teachers
Variance Reduction for Expectations with Diffusion Teachers

Jesse Bettencourt +4
cs.LG 2026-05-20 reviewed

Amortized noise sampling cuts diffusion teacher variance 10x
Variance Reduction for Expectations with Diffusion Teachers

Jesse Bettencourt +4
cs.LG 2026-05-20 reviewed

Embedding learning rate boost replicates muP transfer
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

Dayal Singh Kalra +1
cs.AI 2026-05-20 reviewed

Derivation errors drive over 70% of failures on new AI benchmark
DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

Sixiong Xie +10
cs.AI 2026-05-20 reviewed

Platform lets humans and AIs co-author and iterate on papers
AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists

Junshu Pan +7
cs.CV 2026-05-20 reviewed

WikiVQABench tests VLMs on Wikipedia questions needing external knowledge
WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Basel Shbita +2
cs.LG 2026-05-20 reviewed

JIT compilation makes web agents 10 times faster
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Caleb Winston +3
cs.CL 2026-05-20 reviewed

Separate model learns when to generate agent guidance
Mem-π: Adaptive Memory through Learning When and What to Generate

Xiaoqiang Wang +7
cs.RO 2026-05-20 reviewed

Diffusion assistance cuts teleop task times 40%
HITL-D: Human In The Loop Diffusion Assisted Shared Control

Riley Zilka +3
cs.AI 2026-05-20 reviewed

Randomization fixes simulator shift but reachability gaps persist
Mind the Sim-to-Real Gap & Think Like a Scientist

Harsh Parikh +3
cs.SE 2026-05-20 reviewed

AI refactoring PRs improve quality in 22.5% of cases
Quality and Security Signals in AI-Generated Python Refactoring Pull Requests

Mohamed Almukhtar +2
cs.LG 2026-05-20 reviewed

Deeper networks approximate structured functions with fewer parameters
Approximation Theory for Neural Networks: Old and New

Soumendu Sundar Mukherjee +1
cs.RO 2026-05-20 reviewed

Reasoning changes flag 5.3x larger path errors under driving sensor noise
Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

Abhinaw Priyadershi +1
cs.CV 2026-05-20 reviewed

VLMs miss most time-based glitches in game videos
TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos

Yakun Yu +6
cs.LG 2026-05-20 reviewed

PyTorch library matches specialized tools in LLM tuning
torchtune: PyTorch native post-training library

Mark Obozov +10
cs.AI 2026-05-20 reviewed

Power caps cut LLM energy use by 26% while reducing QoS violations
PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Can Hankendi +3
cs.LG 2026-05-20 reviewed

One embedding predicts conditions and retrieves precedents
HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation

Shreyas Vinaya Sathyanarayana +2
cs.LG 2026-05-20 reviewed

Gossip-based critic sharing lifts multi-cell OFDMA sum-rates in 6G
FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

Amin Farajzadeh +1
cs.CV 2026-05-20 reviewed

Top-n encoder selection lifts blended emotion accuracy
Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

Junghyun Lee +3
cs.AI 2026-05-20 reviewed

Student questions expose AI research limits at 17 percent pass rate
Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

Haiyang Shen +11
cs.SE 2026-05-20 reviewed

Stdlib reimplementations match third-party Python library speeds
Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries

Peng Ding +1
cs.CY 2026-05-20 reviewed

LLMs reach max shock level in Milgram-style test
Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment

Roland Pihlakas +1
cs.AI 2026-05-20 reviewed

One foundation model to run all 6G tasks autonomously
Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G

Liang Wu +3
cs.HC 2026-05-20 reviewed

AI ghosts of the dead favor emotion over accuracy
Designing Conversations with the Dead: How People Engage with Generative Ghosts

Jack Manning +4
cs.LG 2026-05-20 reviewed

Transport maps to PDE measures are Hölder continuous
On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Likun Lin +3
cs.SE 2026-05-20 reviewed

Agents pass visible tests but fail held-out usage tests as tasks lengthen
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

Bingchen Zhao +3
cs.NE 2026-05-20 reviewed

XOR-and-shift over GF(2) meets Marcus's three cognitive pillars
How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

Hiroyuki Chuma +2
cs.NE 2026-05-20 reviewed

XOR-shift over GF(2) enables variable binding and recursive structure
How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

Hiroyuki Chuma +2
cs.CV 2026-05-20 reviewed

Simulation feedback picks best synthetic scenes for driving models
Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

Hongzhi Ruan +7
cs.AI 2026-05-20 reviewed

Multi-agent reports raise LLM scaffold performance by 30 points
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Akshay Manglik +8
cs.AI 2026-05-20 reviewed

Multi-agent system turns full LLM traces into evidence-backed insights
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Akshay Manglik +8
cs.LG 2026-05-20 reviewed

PDE residual selects training data to cut neural operator costs
Data-Efficient Neural Operator Training via Physics-Based Active Learning

Alicja Polanska +3
cs.AI 2026-05-20 reviewed

43M-paper graph gives AI agents deterministic cross-field links
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

Shuofei Qiao +10
cs.CL 2026-05-20 reviewed

Spike-gated model reaches 89% sparsity at 8.9 perplexity
SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

Ting Liu
cs.CL 2026-05-20 reviewed

Regularization curbs prompt overfitting for better LLM generalization
TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

Lucheng Fu +6
cs.DC 2026-05-20 reviewed

Simulator predicts LLM serving latency with 6% error
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Yicheng Feng +5
cs.LG 2026-05-20 reviewed

RL cuts pedestrian waits 79% via better crosswalks and signals
DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning

Bibek Poudel +4
cs.CV 2026-05-20 reviewed

Adaptive fusion gives linear SSMs flexible vision and 3D fusion
Deformba: Vision State Space Model with Adaptive State Fusion

Hongyu Ke +6