archive
Every paper Pith has read. Search by title, abstract, or pith.
14513 papers in cs.AI · page 12
-
AgentCo-op links existing agents into genomics workflows without redesign
AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
-
RL method raises ToM accuracy from 0.2% to 76% on asymmetric tasks
OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
-
CoT prompting leaves gender bias inside LLMs
Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs
-
This paper tests episodic sampling to build class-balanced batches for CT body…
Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation
-
MXFP4 error splits into three parts each fixing a different RL failure
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
-
MXFP4 error splits into three parts for targeted RL fixes
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
-
Bigger 3D models trained on 50M driving scenes top Waymo leaderboard
STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
-
Integral operators gain from longer windows in fMRI tasks
Nonlocal operator learning for fMRI encoding and decoding tasks
-
Meta-RL extracts rules to segment concepts at any reasoning level
ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning
-
LLMs switch from instructions to patterns when history conflicts
Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs
-
Human videos scale humanoid loco-manipulation without custom rewards
SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
-
Distortion in latent space guides better sampling for missing modalities
Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities
-
DEL raises LLM number prediction accuracy on math benchmarks
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
-
Local model classifies security documents at 95 percent accuracy
Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System
-
Per-sample temperatures make teacher soft labels consistent
Consistently Informative Soft-Label Temperature for Knowledge Distillation
-
AI dialogue models sync states and predict turns ahead
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
-
Memory lets RL agents beat competitive benchmarks in trade execution
Memory-Induced Supra-Competitive Outcomes Between Deep Reinforcement Learning Agents in Optimal Trade Execution
-
Krylov approximation unlearns data 48x faster than retraining
Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions
-
Target-SAT triples solvable size for hardest random 3-SAT
Targeting Clause Type Distributions: a Picklock for Random Satisfiability Problems
-
NN variational 2-RDM reaches 0.1 meV below exact energy for Chern insulator
Representability-Aware Neural Networks for Reduced Density Matrices: Application to Fractional Chern Insulators
-
LoRA upgrade turns text-to-image flows bidirectional
FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision-Language Generation
-
EEG microstates from one clustering step outperform traditional features on multiple tasks
Atoms of Thought: Universal EEG Representation Learning with Microstates
-
Four-part SDB contract organizes LLM agent runtimes
A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
-
ASP automates long-term power grid planning
Long-term Power Grid Planning via Answer Set Programming
-
ML ensemble forecasts haor floods 72 hours ahead with 89.6% accuracy
HaorFloodAlert: Deseasonalized ML Ensemble for 72-Hour Flood Prediction in Bangladesh Haor Wetlands
-
Adapting rubric weights speeds RL training by up to 4x
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR
-
Counterfactual tests expose failures in LVLM attribution for chest X-rays
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models
-
Checklist prompts score 7.5 out of 8 on LLM quality rubric
Less Back-and-Forth: A Comparative Study of Structured Prompting
-
Repeating smaller datasets speeds up training
Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
-
Recovery profiles reveal brain dimensions models miss despite high accuracy
Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment
-
AI verifies local lemmas for Grasshopper problem but leaves global count unresolved
Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem
-
Single recipe scales time series models from 4M to 2.5B parameters
Toto 2.0: Time Series Forecasting Enters the Scaling Era
-
Single trajectory yields neural k-inductive barriers for unknown dynamics
k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics
-
AutoML for health risk prediction reduces to few key components
A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction
-
No fixed marginal covariance is safe for all geometries in JEPAs
Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction
-
Pruning plus retrieval yields up to 5.41× speculative decoding speedups
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding
-
Argumentation rules turn LLM outputs into faithful ternary claim verdicts
Neurosymbolic Learning for Inference-Time Argumentation
-
Per-instance shapelets beat population averages on time-series tasks
INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification
-
Dataset pairs LLM chats with users' reported thoughts
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
5 Piths -
Thoughts collected with LLM chats improve behavior forecasts
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
5 Piths -
Evolutionary code agents gain by recycling deleted lines
What Do Evolutionary Coding Agents Evolve?
-
Joint lattice testing calibrates cascaded RAG thresholds at target risk
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation
-
VLM-guided DPO lifts driving model human alignment by 12%
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
-
Adaptive Manifold Guidance conserves probability during strong guidance
Probability-Conserving Flow Guidance
-
Draft answer first then reflect to gain 23% accuracy with 57% fewer tokens
CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning
-
Small tables bind new visual concepts to word triggers
Tiny-Engram: Trigger-Indexed Concept Tables for Generative Vision
-
Moderate noise raises LLM agent success 2.85-fold on puzzle task
Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving
-
Staged analysis improves LLM recovery of ROS 2 architectures
Towards LLM-Assisted Architecture Recovery for Real-World ROS 2 Systems: An Agent-Based Multi-Level Approach to Hierarchical Structural Architecture Reconstruction
-
SDM improves adversarial attack performance and efficiency by reconstructing the…
SDM: A Powerful Tool for Evaluating Model Robustness
-
Prompt tuning labels radiology reports with 32 examples
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling