archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 18

cs.CV 2026-05-18 reviewed

Multi-task training builds balanced multimodal model
Lance: Unified Multimodal Modeling by Multi-Task Synergy

Fengyi Fu +12
cs.CV 2026-05-18 reviewed

Lance beats prior open models at image and video generation
Lance: Unified Multimodal Modeling by Multi-Task Synergy

Fengyi Fu +12
cs.LG 2026-05-18 reviewed

Cyclic method boosts RL sample efficiency over online baselines
COOPO: Cyclic Offline-Online Policy Optimization Algorithm

Qisai Liu +5
cs.AI 2026-05-18 reviewed

Holistic encoding scales general planning policies to thousands of objects
Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning

Michael Aichmüller +3
cs.AI 2026-05-18 reviewed

LLM agents need three separate safety layers
Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

S. Bensalem +8
cs.AI 2026-05-18 reviewed

Config choices rival model selection on GIM benchmark
GIM: Evaluating models via tasks that integrate multiple cognitive domains

Rohit Patel +2
cs.AI 2026-05-18 reviewed

AI automates research but struggles with novelty and judgment
AI for Auto-Research: Roadmap & User Guide

Lingdong Kong +19
cs.LG 2026-05-18 reviewed

Dual-memory model lifts time series classification accuracy
KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

Luis Balderas +4
stat.ML 2026-05-18 reviewed

FedNewton matches SGD accuracy with fewer rounds under privacy
Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Arnab Auddy +2
cs.LG 2026-05-18 reviewed

Distilled trees retain 96.5% of TFM accuracy at 1.9 ms CPU speed
Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

Aditya Tanna +4
cs.LG 2026-05-18 reviewed

Human soft labels improve calibration and training stability
An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

Maja Pavlovic +2
cs.LG 2026-05-18 reviewed

Trained MoE models skip over half their experts after adaptation
Post-Trained MoE Can Skip Half Experts via Self-Distillation

Xingtai Lv +14
cs.LG 2026-05-18 reviewed

Context resampling beats TFM choice in credit risk
Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models

Aditya Tanna +5
cs.LG 2026-05-18 reviewed

Generative models in weight space match fine-tuning performance
Position: Weight Space Should Be a First-Class Generative AI Modality

Zhangyang Wang +2
cs.AI 2026-05-18 reviewed

Benchmark finds LLMs clarify only 52.7% of fluid mechanics cases
SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Nithin Somasekharan +7
cs.AI 2026-05-18 reviewed

Partial traces recover lifted STRIPS+ domains
Learning Lifted Action Models from Traces with Minimal Information About Actions and States

Jonas Gösgens +2
cs.CV 2026-05-18 reviewed

Cross-view data and explicit alignment advance MLLM spatial reasoning
CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

Wei Wang +6
cs.LG 2026-05-18 reviewed

SPBM adds constraints to deep learning with linear overhead
Stochastic Penalty-Barrier Methods for Constrained Machine Learning

Adam Bosák +4
cs.RO 2026-05-18 reviewed

ManiSoft benchmark tests vision-language control on soft robotic arms
ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics

Ziyu Wei +4
cs.SD 2026-05-18 reviewed

Music autoencoder compresses audio 4096 times with quality intact
SAME: A Semantically-Aligned Music Autoencoder

Julian D. Parker +6
cs.CV 2026-05-18 reviewed

Sign-aware aggregation sustains unlearning across sequential VLM requests
CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

Shen Lin +5
cs.AI 2026-05-18 reviewed

Latent actions shorten LLM agent decision horizons
Latent Action Reparameterization for Efficient Agent Inference

Wenhao Huang +13
cs.CR 2026-05-18 reviewed

Typographic attacks make robots grab the wrong objects
Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Ali Iranmanesh +1
cs.LG 2026-05-18 reviewed

Memory of past evaluations improves rubric updates for RL
AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

Peilin Wu +6
cs.LG 2026-05-18 reviewed

Randomized iterations turn natural policy gradients into direct backprop
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

Mingfei Sun
cs.SE 2026-05-18 reviewed

Stripping consent declarations raises overeager rate in coding agents
Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Yubin Qu +6
cs.AI 2026-05-18 reviewed

Revenue targets mask pricing discipline failures
When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

Peiying Zhu +1
cs.AI 2026-05-18 reviewed

Query context ranks medical entities across systems
Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning

Yan Jiao +3
cs.CL 2026-05-18 reviewed

Memory systems score 27.9% under fact interference in long contexts
MINTEval: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

Hyunji Lee +5
cs.IR 2026-05-18 reviewed

q-log odds lift BM25 NDCG@10 by 89% on code search
Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

Santosh Kumar Radha +1
cs.RO 2026-05-18 reviewed

Key-Gram memory boosts robot manipulation performance
Key-Gram: Extensible World Knowledge for Embodied Manipulation

Jingjing Fan +3
cs.CV 2026-05-18 reviewed

Quality signals steer flow matching to fix occluded hands in video
StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video

Huajian Zeng +5
cs.CL 2026-05-18 reviewed

Frontier LLMs score under 40% on dynamic tool-use benchmark
STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics

Tingfeng Hui +7
cs.AI 2026-05-18 reviewed

Tuning-free VLM steers focus to active speaker for emotion recognition
VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

Linan ZHU +6
cs.LG 2026-05-18 reviewed

Manifold probe reveals how models encode time and space
Probing for Representation Manifolds in Superposition

Alexander Modell
cs.CL 2026-05-18 reviewed

Continuous diffusion scales to 20x compute gap of autoregressive models
Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

Zhihan Yang +7
cs.AI 2026-05-18 reviewed

Self-generated hints fix token credit in LLM reinforcement learning
AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

Zhenlin Wei +8
cs.CV 2026-05-18 reviewed

Color features alone classify cancer at up to 89% accuracy
Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

Farnaz Kheiri +2
cs.LG 2026-05-18 reviewed

DiPRL trains nearly discrete programmatic policies in RL
DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization

Chengpeng Hu +2
cs.LG 2026-05-18 reviewed

LLM outputs hypergraphs to generate editable floor plans
HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

Nikita Klimenko +4
cs.LG 2026-05-18 reviewed

DBES metrics select expert paths for up to 94% domain gains at 15% cost
DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs

Jing Wang +4
cs.LG 2026-05-18 reviewed

Morphology drives biological signal classification over model type
Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals

Jordan Tschida +12
cs.AI 2026-05-18 reviewed

Concept removal measures causal roles in black-box vision classifiers
OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

Chiara Maria Russo +5
stat.CO 2026-05-18 reviewed

LLM generates MCMC samplers from natural language descriptions
AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

Jungang Zou +2
cs.LG 2026-05-18 reviewed

One post-training run supports any bit budget for LLM quantization
GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

Zhangyang Yao +5
cs.CR 2026-05-18 reviewed

Generator turns text prompts into LLM fingerprints in one pass
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Sixu Chen +7
stat.ML 2026-05-18 reviewed

Flow models gain per-sample confidence at standard sampling cost
Flowing with Confidence

Friso de Kruiff +3
cs.AI 2026-05-18 reviewed

Firefly algorithm auto-clusters data without preset count
When Fireflies Cluster; Enhancing Automatic Clustering via Centroid-Guided Firefly Optimization

MKA Ariyaratne +3
stat.ML 2026-05-18 reviewed

Markov chain decoders fix heavy-tail limits in VAEs
Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

Abdelhakim Ziani +2
cs.LG 2026-05-18 reviewed

Readable programs match deep RL on job scheduling benchmarks
Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

Chengpeng Hu +2