archive

Every paper Pith has read.

14513 papers in cs.AI · page 14

cs.CL 2026-05-19 reviewed

Long-term medical dialogue benchmark reveals LLM limitations
Synthesis and Evaluation of Long-term History-aware Medical Dialogue

Hebin Hu +3
cs.AI 2026-05-19 reviewed

Dataset records affect at the group level
GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

Meisam Jamshidi Seikavandi +12
cs.AI 2026-05-19 reviewed

Pure code boosts programming but hurts complex math reasoning
What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

Yuze Zhao +8
cs.LG 2026-05-19 reviewed

Quadratic model handles heavy and light tailed noise
Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning

Zheng Zhai +1
cs.LG 2026-05-19 reviewed

Models distort physical quantity distributions despite plausible paths
Mechanisms of Misgeneralization in Physical Sequence Modeling

Kento Nishi +4
cs.AI 2026-05-19 reviewed

Benchmark shows attention models scale better than RNNs on sequences
CogScale: Scalable Benchmark for Sequence Processing

Yannis Bendi-Ouis (Mnemosyne) +2
cs.AI 2026-05-19 reviewed

Memory RL agent self-corrects complex CAD models
Memory-Augmented Reinforcement Learning Agent for CAD Generation

Yin Xiaolong +6
cs.AI 2026-05-19 reviewed

Multi-agent LLM framework hits 97 percent task completion on engineering benchmarks
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

Gioele Molinari +3
cs.CL 2026-05-19 reviewed

Node topology turned into text improves graph anomaly detection
TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection

Wen Shi +8
cs.CL 2026-05-19 reviewed

Fuzzy concept graph cuts RAG indexing to 30 LLM calls
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation

Roman Prosvirnin +2
cs.CV 2026-05-19 reviewed

Staged distillation keeps tiny diffusion models stable at 1.6 percent teacher size
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

Hyunsoo Han +2
cs.CV 2026-05-19 reviewed

Tiny diffusion models reach FID 15.73 with staged distillation
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

Hyunsoo Han +2
cs.CL 2026-05-19 reviewed

Review of 120 studies maps LLM math reasoning gaps
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Husnain Amjad +3
cs.CR 2026-05-19 reviewed

Measure AI security agent safety beyond refusal rates
Measuring Safety Alignment Effects in Autonomous Security Agents

Isaac David +1
cs.AI 2026-05-19 reviewed

Prospect theory replaces rational assumptions in strategic classification
Beyond Rational Illusion: Behaviorally Realistic Strategic Classification

Xinpeng Lv +13
eess.IV 2026-05-19 reviewed

TADA adapts steganalysis to unknown JPEG pipelines
Tackle CSM in JPEG Steganalysis with Data Adaptation

Rony Abecidan (CRIStAL) +5
cs.AI 2026-05-19 reviewed

Symmetry properties generate local search neighborhoods automatically
Transforming Constraint Programs to Input for Local Search

Jo Devriendt +2
cs.LG 2026-05-19 reviewed

Spectral filter repairs fine-tuning damage without retraining
Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

Aarash Abro +1
cs.SE 2026-05-19 reviewed

Criterion-level pairwise judgments lift code judge accuracy to 66.3%
CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

Zhenyu Li +3
cs.AI 2026-05-19 reviewed

Pseudocode paths cut hallucinations in vision-language models
Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models

Weicong Ni +2
cs.AI 2026-05-19 reviewed

Strategic alignment fixes tabular foundation model bias under manipulation
When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

Xinpeng Lv +15
cs.LG 2026-05-19 reviewed

Static quantization speeds LLM inference on mobile NPUs
Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

Jinghe Zhang +7
cs.HC 2026-05-19 reviewed

Single-file AI tools push accessibility boundaries outward
The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

Rizwan Jahangir +1
cs.CV 2026-05-19 reviewed

Panorama-first split lifts zero-shot navigation success 59 percent
P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

Kai Sheng +7
cs.CL 2026-05-19 reviewed

One LLM system optimizes text to beat specialists on six tasks
optimize_anything: A Universal API for Optimizing any Text Parameter

Lakshya A Agrawal +13
cs.LG 2026-05-19 reviewed

Hierarchical Gaussian filters close the gap in deep predictive coding
Closed-form predictive coding via hierarchical Gaussian filters

Aleksandrs Baskakovs +5
cs.AI 2026-05-19 reviewed

Emotion cues lift deepfake detector generalization AUC by 2.1 percent
EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection

Aritra Marik +2
cs.CV 2026-05-19 reviewed

Component style transfer closes satellite sim-to-real gap
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction

Zongwu Xie +4
cs.CV 2026-05-19 reviewed

Part-wise style transfer raises satellite pose accuracy
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction

Zongwu Xie +4
cs.LG 2026-05-19 reviewed

MiMuon reaches O(1/N) generalization bound for matrix models
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

Feihu Huang +2
cs.CV 2026-05-19 reviewed

SVD-ordered paths yield less noisy model attributions
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

Soyeon Kim +3
cs.AI 2026-05-19 reviewed

Formal Skills move agent procedures into executable state machines
Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents

Xi Zhang +8
cs.CV 2026-05-19 reviewed

YOLO26-MoE hits 0.99 mAP for spotting insulator faults in drone photos
A novel YOLO26-MoE optimized by an LLM agent for insulator fault detection considering UAV images

João Pedro Matos-Carvalho +4
cs.AI 2026-05-19 reviewed

Offloading slows smaller LLMs more in mixed serving
Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

Mert Yildiz +4
cs.RO 2026-05-19 reviewed

Dual-window design smooths RL control without expanding action space
Implicit Action Chunking for Smooth Continuous Control

Bosun Liang +7
cs.AI 2026-05-19 reviewed

Code programs generate editable articulated indoor scenes from text
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Puyi Wang +6
cs.CV 2026-05-19 reviewed

Laminating film on lenses blocks identity while keeping action cues
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

Mengyuan Liu +3
cs.CV 2026-05-19 reviewed

Laminating film on lenses hides identities for action recognition
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

Mengyuan Liu +3
cs.AI 2026-05-19 reviewed

Governance recipe lifts LLM skill-library performance from 0.26 to 0.58
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

Xing Zhang +6
cs.LG 2026-05-19 reviewed

Rotations fix MXFP4 activation errors in LLMs
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization

Zukang Xu +2
cs.CV 2026-05-19 reviewed

MLLMs often back correct answers with inconsistent egocentric evidence
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs

Yang Dai +3
cs.CV 2026-05-19 reviewed

RL solver reaches 82.9% on CAPTCHA benchmark
CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision

Pengcheng Wang +7
cs.AI 2026-05-19 reviewed

LLM adaptive tests recover only half the intended skill variance
Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

Grandee Lee +3
cs.CL 2026-05-19 reviewed

Merging LLMs into VLMs boosts instructions but not math
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

Zhiyu Xu +7
cs.AI 2026-05-19 reviewed

Triplet data needed to measure voter disagreements accurately
Efficient Elicitation of Collective Disagreements

Mohamed Ouaguenouni +4
cs.AI 2026-05-19 reviewed

Benchmark exposes LLM limits in knowledge graph building
BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation

Carla Castedo +5
cs.CL 2026-05-19 reviewed

Base models fool AI detectors into rating text as human
Base Models Look Human To AI Detectors

Yixuan Even Xu +4
cs.AI 2026-05-19 reviewed

Context management determines real-world Transformer Turing-completeness
Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

Guanyu Cui +2
cs.RO 2026-05-19 reviewed

Game creatures become RL testbeds in new MuJoCo suite
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

Carlo Romeo +1
cs.RO 2026-05-19 reviewed

One reward function trains policies for four game robots
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

Carlo Romeo +1