archive
Every paper Pith has read.
14513 papers in cs.AI · page 21
-
Sparse attention heads let optimized obfuscation jailbreak LLMs
Babel: Jailbreaking Safety Attention via Obfuscation Distribution Optimized Sampling
-
SFT removes noisy token interactions in LLMs then overfits
Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective
-
Agentic RAG reaches 78% top-1 file bug localization
BLAgent: Agentic RAG for File-Level Bug Localization
-
Bézier paths connect adversarial examples for faster attacks
MoCo-EA: Exploiting Adversarial Mode Connectivity for Efficient Evolutionary Attacks
-
Fewer semantic tokens match full multimodal performance
A More Word-like Image Tokenization for MLLMs
-
Agents reach 79% on game video frames
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain
4 Piths -
Benchmark shows agents at 79% on game video questions vs 95% oracle
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain
4 Piths -
Latent access to guard models speeds prompt-injection checks over 3x
ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense
-
Mirrored unlearning boosts data attribution in diffusion models
Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew
-
BacktestBench tests LLMs on 18k backtesting QA pairs from real markets
BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting
-
Prompt compression fails to transfer to diffusion LLMs
Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA
-
Transient expert steers MoE updates to cut forgetting
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
-
Balancer halves imbalance in video diffusion transformer training
AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
-
DHNs capture unary negation fragment and counting extensions
Expressive Power of Deep Homomorphism Networks over Relational Databases
-
One anchor pair identifies domain transfer under Jacobian sparsity
Domain Transfer Becomes Identifiable via a Single Alignment
-
Compiler enforces AI rules in constant time with formal proofs
Ethical Hyper-Velocity (EHV): A Hardware-Rooted Zero-Trust Runtime Enforcement Architecture for Agentic AI Systems
-
One model translates any sensor features to any other without retraining
One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception
-
AI chunking builds maps predicting war in Thucydides model
Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap
-
Literature evidence anchors stochastic degradation model choice
LAST-RAG: Literature-Anchored Stochastic Trajectory Retrieval-Augmented Generation for Knowledge-Conditioned Degradation Model Selection
-
LLM voice agent hits 83.9% success on 400k daily calls
DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition
-
Single forward pass matches AlphaFold3 protein accuracy
DCFold: Efficient Protein Structure Generation with Single Forward Pass
-
Benchmark rates AI agents by child cognitive age
Evaluating Cognitive Age Alignment in Interactive AI Agents
-
Inter-layer null signals cut attention sinks and boost quantized accuracy
Attention Sinks and Outliers in Attention Residuals
-
AI agent teams beat human teams at generating creative ideas
Multi-agent AI systems outperform human teams in creativity
-
Two-phase sampling matches contradictory audio prompts to video
CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation
-
Guard boosts training utilization 1.7x by catching hidden stragglers
Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training
-
Internal probe plus attention head yields step rewards for agents
PAIR: Prefix-Aware Internal Reward Model for Multi-Turn Agent Optimization
-
Hindsight targets fix actions to cut agent training time 2.26x
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents
-
Freshness control lets async distillation match sync results
$\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control
-
New multi-accent dataset lowers ASR errors on technical talks
PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions
-
This paper introduces Knowledge Infrastructure (KI)
KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science
-
State-action split lets GRPO train open-world VLM agents
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
-
State-action decomposition adapts GRPO for VLM agents
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
-
SynPro yields 3.7-5.2x more effective tokens from organic data
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
-
Retrieval system compresses Lean proofs over 70 percent
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search
-
Bilevel optimization adapts distillation loss weights per sample
Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization
-
Temporal pruning speeds video diffusion while preserving fidelity
Temporal Aware Pruning for Efficient Diffusion-based Video Generation
-
Temporal smoothing lets pruning speed up video diffusion
Temporal Aware Pruning for Efficient Diffusion-based Video Generation
-
One-step updates and barriers speed up noisy label correction
Efficient Bilevel Optimization for Meta Label Correction in Noisy Label Learning
-
Memory-equipped agents show rising safety risks over time
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
-
Interactive AI needs new rules for judging trajectories
Interactive Evaluation Requires a Design Science
-
Orthogonal manifold moves identify content and style without independence
Content-Style Identification via Differential Independence
-
VLMs count by prior instead of image when facts clash
CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models
-
Scene understanding training produces human-like fixations in foveated model
Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding
-
TierCheck cuts LLM checkpointing time below 10 seconds
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
-
Topple actions speed up stack rearrangement plans
Virtues of Ordered Chaos: Planning with Topple Actions in Tabletop Stack Rearrangement
-
Accountability boundary decides whether vertical AI firms lose value by going headless
Going Headless? On the Boundaries of Vertical AI Firms
-
Shared transformer splits into proposal and uncertainty roles
One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer
-
PuppyChatter unifies SDK simplicity across AI vendors
Accelerating AI-Powered Research: The PuppyChatter Framework for Usable and Flexible Tooling
-
Reward variance selects learnable prompts for T2I training
Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation