archive
Every paper Pith has read. Search by title, abstract, or pith.
14513 papers in cs.AI · page 1
-
Optimizer model improves agent skills only via validation-raising text edits
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
-
Shannon capacity produces U-shaped LLM scaling curves
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
-
Model-generated agent skills help on average but trigger negative transfer
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
-
VLMs fail to ground numbers in spatial layouts
SPACENUM: Revisiting Spatial Numerical Understanding in VLMs
-
Dedicated image editor lifts multimodal reasoning by 5 points
ETCHR: Editing To Clarify and Harness Reasoning
-
Token selection speeds up geometry transformers by over 85 percent
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
-
CHRONOS unifies index decay, pricing and privacy in data markets
CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces
-
Geometric overlays on images lift MLLM spatial scores by 20%
PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
-
Persuasive LLM explanations do not raise decision accuracy
Human Decision-Making with Persuasive and Narrative LLM Explanations
-
Foundation models support zero-shot causal image reasoning
Leveraging Foundation Models for Causal Generative Modeling
-
Post-training, not pre-training data, creates LLM geopolitical bias
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt
-
Vision models match humans best at balanced generative-discriminative mix
Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot
-
Adversarial alignment generalizes multimodal knowledge edits
Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment
-
Claude agent verifies programs at 98 percent success rate
Agentic Proving for Program Verification
-
Agent beats baselines at text-guided 3D photo search
PhotoFlow: Agentic 3D Virtual Photography Missions
-
Any2Any moves tracking models to new robots at 1% cost
Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking
-
MemAudit cuts memory poisoning success to zero after attacks
MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection
-
Recursive memory predicts next queries with 22x fewer tokens
OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations
-
Adaptive search fixes blind spots in high-res image perception for LLMs
CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
-
One shared RL policy controls thousands of NPCs with distinct personas
One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
-
Compatible output heads let students learn from noise
Learning Through Noise: Why Subliminal Learning Works and When It Fails
-
Temporal gaps weaken Android malware model defenses
Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
-
Entity patches in memory fix consistency in multi-shot videos
EM-Vid: Training-Free Entity-Centric Memory for Efficient and Consistent Multi-Shot Video Generation
-
Latent space lets diffusion language models sample faster with better quality
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
-
Hysteretic attention reaches Turing completeness in constant depth
Preisach Attention: A Hysteretic Model of Sequential Memory
-
Meta-learning yields model performance scores on unlabeled data
Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning
-
Models schedule up to 1450 aircraft disassembly tasks
Solving the Aircraft Disassembly Scheduling Problem
-
Rubrics guide ReAct agents at each step for better search trajectories
Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents
-
Three-phase recipe keeps 98% precision in 190M retrieval models
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
-
Hybrid DP-CP solves partial shop scheduling with flexible precedences
CP or DP? Why Not Both: A Case Study in the Partial Shop Scheduling Problem
-
Latent policy gradients forecast RL goal generalization
Understanding Goal Generalisation in Sequential Reinforcement Learning
-
ARMS learns shaping rewards in MARL without altering Nash equilibria
ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning
-
PathNavigate scans slides for surprises before matching the question
PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA
-
One network pass trains an agent on every goal at once
Goal-Conditioned Agents that Learn Everything All at Once
-
Randomized screening yields directional stationarity in max-DC programs
RA-DCA: A Randomized Active-Set DCA for Directional Stationarity in Max-Structured DC Programs
-
New sampler cuts RL training time for flow models by up to 53%
Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
-
Sketches control long video generation via independent shots
DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
-
Velocity consistency shapes embeddings for top time series anomaly detection
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection
-
Guided rollouts and masks fix distillation of target identities
EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation
-
Self-generated tests and code co-evolve to match RLVR results
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
-
MDM distills vision-language datasets into compact synthetic sets
Multimodal Distribution Matching for Vision-Language Dataset Distillation
-
One model forecasts yields for many crops by learning their weather responses
PhenoYieldNet: Learning Crop-Aware Phenological Responses for Multi-Crop Yield Prediction
-
DSEBO switches subspace dimension on convergence
Automated Random Embedding for Practical Bayesian Optimization with Unknown Effective Dimension
-
CBANet raises minority recall in aggressive driving detection
CBANet: A Compact Attention-Based CNN-BiLSTM Network for Aggressive Driving Event Detection
-
Static contexts make individual dynamics identifiable from single snapshots
Learning Individual Dynamics from Sparse Cross-Sectional Snapshots
-
Enterprise AI needs risk reduction testing
AI Assurance: A Comprehensive Testing Strategy for Enterprise AI Systems
-
One-Forcing scores 83.76 on VBench for one-step video
One-Forcing: Towards Stable One-Step Autoregressive Video Generation
-
AI security papers favor attacks over defenses via uneven evaluations
AI Security Research Should Better Incentivize Defense Research
-
SSDAU cuts ambiguity F1 drop in joint extraction from 32% to 8%
SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction
-
Humans identify AI teammates at chance levels in group chats
Socially fluent AI decouples conversational signals from source identity in online interaction