archive
Every paper Pith has read. Search by title, abstract, or pith.
14513 papers in cs.AI · page 7
-
Medical world model cuts kidney disease forecast error by 7%
ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data
-
AI gives serious games real-time adaptive training
AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems
-
MLLMs spot correct video timing in prefill but forget during answers
MLLMs Know When Before Speaking: Revealing and Recovering Temporal Grounding via Attention Cues
-
Irreversibility equates four measures and picks low-entropy paths
Thermodynamic Irreversibility of Training Algorithms
-
CausalGuard weights candidate graphs for covered causal effect estimates
CausalGuard: Conformal Inference under Graph Uncertainty
-
VLMs favor SDG priors over evidence on 550k-task benchmark
SDGBiasBench: Benchmarking and Mitigating Vision-Language Models' Biases in Sustainable Development Goals
-
MAVEN pipeline annotates 5300 videos so 8B VLM beats Gemini on CCTV reasoning
MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks
-
Physics laws inside neural nets speed up power-grid modeling
Engineering Hybrid Physics-Informed Neural Networks for Next-Generation Electricity Systems: A State-of-the-Art Review
-
LLMs now build planners instead of one-off plans
Planning in the LLM Era: Building for Reliability and Efficiency
-
7B model beats larger ones at Lean proof optimization
ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization
-
Staged fusion of text, audio, and vision reaches 0.47 emotion correlation
Two-Stage Multimodal Framework for Emotion Mimicry Intensity Prediction
-
Action-updated scene prior lifts robot task success
EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
-
Modular experts resolve gradient conflicts in multi-modal medical pretraining
Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models
-
Truncating CoT exposes evasive contamination in LLMs
The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation
-
DoRA raises VLA success rates by 10.4 points over SFT
CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models
-
Accumulating oracle signals yields token-level advantages in one pass
OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning
-
Agent trajectories compiled into QA pairs improve long-context performance
ACC: Compiling Agent Trajectories for Long-Context Training
-
LLMs beat fine-tuned models on rare suicide circumstances
Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity
-
Tensor Cache stores evicted tokens in outer-product memory
Tensor Cache: Eviction-conditioned Associative Memory for Transformers
-
PET/CT model matches full segmentation accuracy with 10% labels
An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation
-
Multimodal codes replace IDs in livestream recs
FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation
-
LLMs reduce ten intensity words to five numeric values
Does Slightly Mean Somewhat? Measuring Vague Intensity Words in LLM Numeric Actions
-
AI agents autonomously build custom visualization apps from data
Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks
-
Crowd preferences yield reusable safety skills for RL tasks
Implicit Safety Alignment from Crowd Preferences
-
Evolved skills from traces solve hard Verilog tasks
Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents
-
Agentic AI uses 4.33x more energy per successful goal than linear baselines
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
-
DivSkill-SQL lifts Text-to-SQL accuracy by up to 11 points
Residual Skill Optimization for Text-to-SQL Ensembles
-
Patch attention model tags LHC jets accurately under tight budgets
Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging
-
Experts disagree on which AI behaviors count as sycophancy
What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct
-
Trust drives acceptance of collaborative decision tech in pediatrics
Understanding Perspectives of Patients, Caregivers and Clinicians towards Emerging Collaborative-decision Making Technologies
-
Causal links turned into arguments explain ML predictions
A Causal Argumentation Method for Explainability of Machine Learning Models
-
Pairwise comparisons yield unbiased preference percentiles
PEARL: Unbiased Percentile Estimation via Contrastive Learning for Industrial-Scale Livestream Recommendation
-
Platform choice alters AI employment impact estimates by factor of 1.9
Who Uses AI? Platform Selection and the Measurement of Occupational AI Exposure
-
Best LLM solves only 40% of drug design tasks
SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?
-
LLM emotional skills prove independent in real chats
AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence
-
Support-aware method certifies ad reserve policies from logs
Support-aware offline policy selection for advertising marketplaces
-
Bayes rule gives LLMs token-by-token attribution scores
Probabilistic Attribution For Large Language Models
-
Exact doubly stochastic mixes via transportation polytopes
TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes
-
GNN approximates altruistic robot transfers for scaling teams
Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems
-
Pushing past refusal boundary boosts jailbreak success
Latent-space Attacks for Refusal Evasion in Language Models
-
Heavy AI use weakens reasoning skills after help ends
The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning
-
Typed boundaries make LLM defense measurable and attributable
PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents
-
AI models classify words as vehicles and vegetables as fruit
Investigating Concept Alignment Using Implausible Category Members
-
Open-source LLMs lean left on politics
How Far Will They Go? Red-Teaming Online Influence with Large Language Models
-
AI turns T1 scans into motion-free high-res MRIs
MRecover: A Conditional Generative Model for Recovering Motion-Corrupted MR images Using AI Generated Contrast
-
EV charging models face fidelity tradeoffs across three layers
Planning, Scheduling, and Behavior in EV Charging Systems: A Critical Survey and Trilemma Framework
-
Stochastic policy amortizes diffusion guidance for 5x faster sampling
Hierarchical Variational Policies for Reward-Guided Diffusion
-
Actor updates match value gradients under differentiable rollouts
Value-Gradient Hypothesis of RL for LLMs
-
Fine-tuned detectors amplify a pretrained typicality axis
Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction