archive
Every paper Pith has read.
14513 papers in cs.AI · page 8
-
Ultrasound VQA model learns to zoom closer before diagnosing
Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
-
EMOD 3.0 expands AOP-Wiki data model for AI and NAMs
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)
-
Six elements close the human-AI synergy gap
Addressing the Synergy Gap: The Six Elements of the Design Space
-
Composing thought modes generates harder
MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis
-
AI shortens math study time 27 percent but cuts retention odds 25 percent
Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build
-
New benchmark shows LVLMs falter on furniture assembly videos
Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly
-
Holocaust testimony analysis finds overlaps between archives
The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison
-
Multi-agent system yields preference-aligned designs in 60% of trials
TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization
-
Rewriting cuts unsafe LLM outputs for teen users
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
-
Position weighting lifts AIME scores by over 1 point in distillation
When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning
-
Hybrid OOD monitors lift LLM failure recall from 39 to 45 percent
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs
-
Amortized resampling yields 2-3x compute gains for diffusion teachers
Variance Reduction for Expectations with Diffusion Teachers
-
Amortized noise sampling cuts diffusion teacher variance 10x
Variance Reduction for Expectations with Diffusion Teachers
-
Embedding learning rate boost replicates muP transfer
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
-
Derivation errors drive over 70% of failures on new AI benchmark
DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation
-
Platform lets humans and AIs co-author and iterate on papers
AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists
-
WikiVQABench tests VLMs on Wikipedia questions needing external knowledge
WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
-
JIT compilation speeds web agents by 10 times
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
-
Separate model learns when to generate agent guidance
Mem-$\pi$: Adaptive Memory through Learning When and What to Generate
-
Diffusion assistance cuts teleop task times 40%
HITL-D: Human In The Loop Diffusion Assisted Shared Control
-
Randomization fixes simulator shift but reachability gaps persist
Mind the Sim-to-Real Gap & Think Like a Scientist
-
AI refactoring PRs improve quality in 22.5% of cases
Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
-
Deeper networks approximate structured functions with fewer parameters
Approximation Theory for Neural Networks: Old and New
-
Reasoning changes flag 5.3x larger path errors under driving sensor noise
Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
-
VLMs miss most time-based glitches in game videos
TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos
-
PyTorch library matches specialized tools in LLM tuning
torchtune: PyTorch native post-training library
-
Power caps cut LLM energy use by 26% while reducing QoS violations
PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
-
One embedding predicts conditions and retrieves precedents
HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation
-
Gossip-based critic sharing lifts multi-cell OFDMA sum-rates in 6G
FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G
-
Top-n encoder selection lifts blended emotion accuracy
Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition
-
Student questions expose AI research limits at 17 percent pass rate
Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work
-
Stdlib reimplementations match third-party Python library speeds
Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries
-
LLMs reach max shock level in Milgram-style test
Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment
-
One foundation model to run all 6G tasks autonomously
Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G
-
AI ghosts of the dead favor emotion over accuracy
Designing Conversations with the Dead: How People Engage with Generative Ghosts
-
Transport maps to PDE measures are Hölder continuous
On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures
-
Agents pass visible tests but fail held-out usage tests as tasks lengthen
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
-
XOR-and-shift over GF(2) meets Marcus's three cognitive pillars
How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields
-
XOR-shift over GF(2) enables variable binding and recursive structure
How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields
-
Simulation feedback picks best synthetic scenes for driving models
Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
-
Multi-agent reports raise LLM scaffold performance by 30 points
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents
-
Multi-agent system turns full LLM traces into evidence-backed insights
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents
-
PDE residual selects training data to cut neural operator costs
Data-Efficient Neural Operator Training via Physics-Based Active Learning
-
43M-paper graph gives AI agents deterministic cross-field links
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
-
Spike-gated model reaches 89% sparsity at 8.9 perplexity
SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence
-
Regularization curbs prompt overfitting for better LLM generalization
TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization
-
Simulator predicts LLM serving latency with 6% error
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
-
RL cuts pedestrian waits 79% via better crosswalks and signals
DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning
-
Adaptive fusion gives linear SSMs flexible vision and 3D fusion
Deformba: Vision State Space Model with Adaptive State Fusion