archive
Every paper Pith has read.
14513 papers in cs.AI · page 18
-
Multi-task training builds balanced multimodal model
Lance: Unified Multimodal Modeling by Multi-Task Synergy
-
Lance beats prior open models at image and video generation
Lance: Unified Multimodal Modeling by Multi-Task Synergy
-
Cyclic method boosts RL sample efficiency over online baselines
COOPO: Cyclic Offline-Online Policy Optimization Algorithm
-
Holistic encoding scales general planning policies to thousands of objects
Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning
-
LLM agents need three separate safety layers
Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment
-
Config choices rival model selection on GIM benchmark
GIM: Evaluating models via tasks that integrate multiple cognitive domains
-
AI automates research but struggles with novelty and judgment
AI for Auto-Research: Roadmap & User Guide
-
Dual-memory model lifts time series classification accuracy
KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture
-
FedNewton matches SGD accuracy in fewer rounds under differential privacy
Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning
-
Distilled trees retain 96.5% of TFM accuracy at 1.9 ms on CPU
Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees
-
Human soft labels improve calibration and training stability
An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration
-
Trained MoE models skip over half their experts after adaptation
Post-Trained MoE Can Skip Half Experts via Self-Distillation
-
Context resampling beats TFM choice in credit risk
Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models
-
Generative models in weight space match fine-tuning performance
Position: Weight Space Should Be a First-Class Generative AI Modality
-
Benchmark finds LLMs clarify only 52.7% of fluid mechanics cases
SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science
-
Partial traces recover lifted STRIPS+ domains
Learning Lifted Action Models from Traces with Minimal Information About Actions and States
-
Cross-view data and explicit alignment advance MLLM spatial reasoning
CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
-
SPBM adds constraints to deep learning with linear overhead
Stochastic Penalty-Barrier Methods for Constrained Machine Learning
-
ManiSoft benchmark tests vision-language control on soft robotic arms
ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics
-
Music autoencoder compresses audio 4096-fold while preserving quality
SAME: A Semantically-Aligned Music Autoencoder
-
Sign-aware aggregation sustains unlearning across sequential VLM requests
CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic
-
Latent actions shorten LLM agent decision horizons
Latent Action Reparameterization for Efficient Agent Inference
-
Typographic attacks make robots grab the wrong objects
Not What You Asked For: Typographic Attacks in Household Robot Manipulation
-
Memory of past evaluations improves rubric updates for RL
AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning
-
Randomized iterations turn natural policy gradients into direct backprop
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
-
Stripping consent declarations raises overeager rate in coding agents
Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks
-
Revenue targets mask pricing discipline failures
When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State
-
Query context ranks medical entities across systems
Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning
-
Memory systems score 27.9% under fact interference in long contexts
MINTEval: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems
-
q-log odds lift BM25 NDCG@10 by 89% on code search
Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix
-
Key-Gram memory boosts robot manipulation performance
Key-Gram: Extensible World Knowledge for Embodied Manipulation
-
Quality signals steer flow matching to fix occluded hands in video
StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video
-
Frontier LLMs score under 40% on dynamic tool-use benchmark
STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics
-
Tuning-free VLM steers focus to active speaker for emotion recognition
VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation
-
Manifold probe reveals how models encode time and space
Probing for Representation Manifolds in Superposition
-
Continuous diffusion scales to 20x compute gap of autoregressive models
Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
-
Self-generated hints fix token credit in LLM reinforcement learning
AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment
-
Color features alone classify cancer at up to 89% accuracy
Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification
-
DiPRL trains nearly discrete programmatic policies in RL
DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization
-
LLM outputs hypergraphs to generate editable floor plans
HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation
-
DBES metrics select expert paths for up to 94% domain gains at 15% cost
DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs
-
Morphology drives biological signal classification over model type
Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals
-
Concept removal measures causal roles in black-box vision classifiers
OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models
-
LLM generates MCMC samplers from natural language descriptions
AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers
-
One post-training run supports any bit budget for LLM quantization
GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
-
Generator turns text prompts into LLM fingerprints in one pass
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
-
Flow models gain per-sample confidence at standard sampling cost
Flowing with Confidence
-
Firefly algorithm auto-clusters data without a preset cluster count
When Fireflies Cluster; Enhancing Automatic Clustering via Centroid-Guided Firefly Optimization
-
Markov chain decoders fix heavy-tail limits in VAEs
Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models
-
Readable programs match deep RL on job scheduling benchmarks
Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework