archive
Every paper Pith has read.
14513 papers in cs.AI · page 10
-
DASH discovers strong hybrid attention for LLMs in 20 minutes on one GPU
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
-
Strategy induction from questions alone improves LLM task instructions
Strategy-Induct: Task-Level Strategy Induction for Instruction Generation
-
Vector-clock monitor matches causal-guard semantics locally
Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows
-
Oscillatory network scales to ImageNet with high efficiency
Winfree Oscillatory Neural Network
-
One program decodes bundles at 100% on four frozen embeddings
Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures
-
Sutra compiles VSA programs to tensor graphs with exact decoding
Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures
-
Unlearned models keep low calibration error but lean on shortcuts
Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
-
Fighting game AIs learn how long to hold each move
For How Long Should We Be Punching? Learning Action Duration in Fighting Games
-
VISTA wins Ego4D STA challenge by fusing frozen video features into detector
VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026
-
Agent finds hidden threats in 15% of security incidents
GenAI-Driven Threat Detection with Microsoft Security Copilot
-
Agent surfaces novel threats in 15% of security incidents
GenAI-Driven Threat Detection with Microsoft Security Copilot
-
Frequency regularization lifts attack transfer to closed MLLMs
Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs
-
Skill synthesis scales terminal-agent data to beat baselines with 1% of it
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
-
Five checkpoints enforce policy in generalist agents
Governance by Construction for Generalist Agents
-
Taxonomy-based generator yields verifiable planning data for LLMs
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
-
Gradient moment method cuts 3D Gaussian count by 85-97%
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
-
Runtime bounds certify quantized KV attention with exact fallback
Runtime-Certified Bounded-Error Quantized Attention
-
N-step correction tightens PPO bound for RL with verifiable rewards
Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards
-
Multi-metric score spots synthetic narratives more reliably
Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse
-
Hypernetwork generates full robot policies from instructions alone
DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation
-
224K short videos collected by labels support semantic benchmarks
USV: Towards Understanding the User-generated Short-form Videos
-
New benchmark shows VLMs lag trained humans on building layouts
ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models
-
DPO matches RLHF only if optimal policy favors human responses
Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
-
7B open LLMs run GraphRAG locally for EHR schema queries
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
-
Preference vector tunes task balance in merged continual learning models
Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning
-
ELSA gives spiking networks 3.4x faster inference than top accelerators
ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing
-
Local writes accumulate into global solutions in recursive reasoners
Interaction Locality in Hierarchical Recursive Reasoning
-
New guidance resolves gradient conflicts in flow models
Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards
-
Bias correction cuts pretraining loss in AdamW and similar optimizers
Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers
-
Distillation from richer pseudo-samples improves sparse glucose estimates
PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG
-
GLU shrinks NTK condition number for faster convergence
The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?
-
Hidden states at paragraph boundaries tune verifier strictness
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
-
Testbed embeds detectable hacks for automatic reward-gaming checks
Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale
-
Text modeling of EV battery signals enables LLM fault diagnosis
VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals
-
RL scores full distributions to fix LLM regression
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
-
Monitor reduces LLM agent covert channels to zero capacity
An Application-Layer Multi-Modal Covert-Channel Reference Monitor for LLM Agent Egress
-
Designer ratings dataset lifts AI graphic scorer to 0.611 agreement
TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design
-
Aligning task vectors to in-context next-token distributions lifts accuracy 9.2%
Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning
-
Group statistics adapt clipping and temperature to lift LLM math scores
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
-
SAVER selectively activates vision to boost F1 and cut latency in multimodal IE
SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction
-
Categorical error rates beat WER for Indic speech recognition
SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR
-
DAR cuts DiT training iterations by 8.75x while improving FID by 2.11
Rethinking Cross-Layer Information Routing in Diffusion Transformers
-
WebGPU backend cuts LLM memory use by 29-33% in browsers
Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
-
Heartbeat protocol revokes AI swarm credentials within fixed window
Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms
-
Agentic system solves 8 of 10 research math problems
RMA: an Agentic System for Research-Level Mathematical Problems
-
Agreement screening yields clearer text features at full accuracy
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
-
Typed contracts let agents compose data systems reliably
Declarative Data Services: Structured Agentic Discovery for Composing Data Systems
-
Self-limiting losses compress embeddings without overfitting
DIVE: Embedding Compression via Self-Limiting Gradient Updates
-
Dynamic experts cut error on shifting time series
Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting
-
AI reviewer beats top human on Nature papers
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists