archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 21

cs.CR 2026-05-18 reviewed

Sparse attention heads let optimized obfuscation jailbreak LLMs
Babel: Jailbreaking Safety Attention via Obfuscation Distribution Optimized Sampling

Ziwei Wang +8
cs.AI 2026-05-18 reviewed

SFT removes noisy token interactions in LLMs then overfits
Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

Junpeng Zhang +5
cs.SE 2026-05-18 reviewed

Agentic RAG reaches 78% top-1 file bug localization
BLAgent: Agentic RAG for File-Level Bug Localization

Md Afif Al Mamun +1
cs.CR 2026-05-18 reviewed

Bézier paths connect adversarial examples for faster attacks
MoCo-EA: Exploiting Adversarial Mode Connectivity for Efficient Evolutionary Attacks

Hyo Seo Kim +5
cs.CV 2026-05-18 reviewed

Fewer semantic tokens match full multimodal performance
A More Word-like Image Tokenization for MLLMs

Hyun Lee +6
cs.AI 2026-05-18 reviewed

Agents reach 79% on game video frames
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao +6

4 Piths
cs.AI 2026-05-18 reviewed

Benchmark shows agents at 79% on game video questions vs 95% oracle
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao +6

4 Piths
cs.CR 2026-05-18 reviewed

Latent access to guard models speeds prompt-injection checks over 3x
ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense

Yash Narendra
cs.LG 2026-05-18 reviewed

Mirrored unlearning boosts data attribution in diffusion models
Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew

Joan Serrà +4
cs.CL 2026-05-18 reviewed

BacktestBench tests LLMs on 18k backtesting QA pairs from real markets
BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

Zhensheng Wang +5
cs.CL 2026-05-18 reviewed

Prompt compression fails to transfer to diffusion LLMs
Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

Sterling Huang +6
cs.LG 2026-05-18 reviewed

Transient expert steers MoE updates to cut forgetting
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

Yang Liu +2
cs.DC 2026-05-18 reviewed

Balancer halves imbalance in video diffusion transformer training
AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training

Yucheng Guo +8
cs.DB 2026-05-18 reviewed

DHNs capture unary negation fragment and counting extensions
Expressive Power of Deep Homomorphism Networks over Relational Databases

Moritz Schönherr +5
cs.LG 2026-05-18 reviewed

One anchor pair identifies domain transfer under Jacobian sparsity
Domain Transfer Becomes Identifiable via a Single Alignment

Sagar Shrestha +3
cs.AI 2026-05-18 reviewed

Compiler enforces AI rules in constant time with formal proofs
Ethical Hyper-Velocity (EHV): A Hardware-Rooted Zero-Trust Runtime Enforcement Architecture for Agentic AI Systems

Riddhi Mohan Sharma
cs.CV 2026-05-18 reviewed

One model translates any sensor features to any other without retraining
One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception

Yang Li +9
cs.AI 2026-05-18 reviewed

AI chunking builds maps predicting war in Thucydides model
Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap

Akash Kumar Panda +2
cs.AI 2026-05-18 reviewed

Literature evidence anchors stochastic degradation model choice
LAST-RAG: Literature-Anchored Stochastic Trajectory Retrieval-Augmented Generation for Knowledge-Conditioned Degradation Model Selection

Hanbyeol Park +1
cs.AI 2026-05-18 reviewed

LLM voice agent hits 83.9% success on 400k daily calls
DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition

Le Zhang +5
cs.LG 2026-05-18 reviewed

Single forward pass matches AlphaFold3 protein accuracy
DCFold: Efficient Protein Structure Generation with Single Forward Pass

Zhe Zhang +5
cs.AI 2026-05-18 reviewed

Benchmark rates AI agents by child cognitive age
Evaluating Cognitive Age Alignment in Interactive AI Agents

Yifan Shen +6
cs.LG 2026-05-18 reviewed

Inter-layer null signals cut attention sinks and boost quantized accuracy
Attention Sinks and Outliers in Attention Residuals

Haozheng Luo +12
cs.CL 2026-05-18 reviewed

AI agent teams beat human teams at generating creative ideas
Multi-agent AI systems outperform human teams in creativity

Tiancheng Hu +7
cs.MM 2026-05-18 reviewed

Two-phase sampling matches contradictory audio prompts to video
CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

Gyubin Lee +2
cs.DC 2026-05-18 reviewed

Guard boosts training utilization 1.7x by catching hidden stragglers
Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training

Guanliang Liu +16
cs.AI 2026-05-18 reviewed

Internal probe plus attention head yields step rewards for agents
PAIR: Prefix-Aware Internal Reward Model for Multi-Turn Agent Optimization

Wonjoong Kim +4
cs.LG 2026-05-18 reviewed

Hindsight targets fix actions to cut agent training time 2.26x
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Woongyeng Yeo +3
cs.LG 2026-05-18 reviewed

Freshness control lets async distillation match sync results
f-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control

Xianwei Chen +2
cs.CL 2026-05-18 reviewed

New multi-accent dataset lowers ASR errors on technical talks
PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions

Sicheng Jin +2
cs.AI 2026-05-18 reviewed

Knowledge Infrastructure (KI) proposed as scaffolding for agentic Earth science
KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science

Ziwei Li +7
cs.LG 2026-05-18 reviewed

State-action split lets GRPO train open-world VLM agents
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

Xiongbin Wu +9
cs.LG 2026-05-18 reviewed

State-action decomposition adapts GRPO for VLM agents
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

Xiongbin Wu +9
cs.CL 2026-05-18 reviewed

SynPro yields 3.7-5.2x more effective tokens from organic data
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

Zichun Yu +1
cs.LO 2026-05-18 reviewed

Retrieval system compresses Lean proofs over 70 percent
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu +6
cs.LG 2026-05-18 reviewed

Bilevel optimization adapts distillation loss weights per sample
Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization

Anh B.H. Nguyen +2
cs.CV 2026-05-18 reviewed

Temporal pruning speeds video diffusion while preserving fidelity
Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Sheng Li +5
cs.CV 2026-05-18 reviewed

Temporal smoothing lets pruning speed up video diffusion
Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Sheng Li +5
cs.LG 2026-05-18 reviewed

One-step updates and barriers speed up noisy label correction
Efficient Bilevel Optimization for Meta Label Correction in Noisy Label Learning

Ba Hoang Anh Nguyen +1
cs.AI 2026-05-18 reviewed

Memory-equipped agents show rising safety risks over time
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

Ahmad Al-Tawaha +4
cs.AI 2026-05-18 reviewed

Interactive AI needs new rules for judging trajectories
Interactive Evaluation Requires a Design Science

Keyang Xuan +12
cs.LG 2026-05-18 reviewed

Orthogonal manifold moves identify content and style without independence
Content-Style Identification via Differential Independence

Subash Timilsina +3
cs.CV 2026-05-18 reviewed

VLMs count by prior instead of image when facts clash
CounterCount: A Diagnostic Framework for Counting Bias in Vision Language Models

Reem Alzahrani +5
cs.CV 2026-05-18 reviewed

Scene understanding training produces human-like fixations in foveated model
Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding

Shravan Murlidaran +3
cs.DC 2026-05-18 reviewed

TierCheck cuts LLM checkpointing time below 10 seconds
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training

Shujie Han +7
cs.RO 2026-05-18 reviewed

Topple actions speed up stack rearrangement plans
Virtues of Ordered Chaos: Planning with Topple Actions in Tabletop Stack Rearrangement

Hao Lu +1
cs.AI 2026-05-18 reviewed

Accountability boundary decides whether vertical AI firms lose value by going headless
Going Headless? On the Boundaries of Vertical AI Firms

Muhammad Zia Hydari +1
cs.LG 2026-05-18 reviewed

Shared transformer splits into proposal and uncertainty roles
One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

Jucheng Shen +2
cs.AI 2026-05-18 reviewed

PuppyChatter unifies SDK simplicity across AI vendors
Accelerating AI-Powered Research: The PuppyChatter Framework for Usable and Flexible Tooling

Chun-Hsiung Tseng +4
cs.CV 2026-05-18 reviewed

Reward variance selects learnable prompts for T2I training
Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

Baoteng Li +10