archive
Every paper Pith has read.
14513 papers in cs.AI · page 14
-
Long-term medical dialogue benchmark reveals LLM limitations
Synthesis and Evaluation of Long-term History-aware Medical Dialogue
-
Dataset records affect in four-person group interaction
GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction
-
Pure code boosts programming but hurts complex math reasoning
What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code
-
Quadratic model handles heavy and light tailed noise
Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning
-
Models distort physical quantity distributions despite plausible paths
Mechanisms of Misgeneralization in Physical Sequence Modeling
-
Benchmark shows attention models scale better than RNNs on sequences
CogScale: Scalable Benchmark for Sequence Processing
-
Memory RL agent self-corrects complex CAD models
Memory-Augmented Reinforcement Learning Agent for CAD Generation
-
Multi-agent LLM framework hits 97 percent task completion on engineering benchmarks
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
-
Node topology turned into text improves graph anomaly detection
TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection
-
Fuzzy concept graph cuts RAG indexing to 30 LLM calls
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation
-
Staged distillation keeps tiny diffusion models stable at 1.6 percent teacher size
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
-
Tiny diffusion models reach FID 15.73 with staged distillation
LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
-
Review of 120 studies maps LLM math reasoning gaps
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges
-
Measure AI security agent safety beyond refusal rates
Measuring Safety Alignment Effects in Autonomous Security Agents
-
Prospect theory replaces rational assumptions in strategic classification
Beyond Rational Illusion: Behaviorally Realistic Strategic Classification
-
TADA adapts steganalysis to unknown JPEG pipelines
Tackle CSM in JPEG Steganalysis with Data Adaptation
-
Symmetry properties generate local search neighborhoods automatically
Transforming Constraint Programs to Input for Local Search
-
Spectral filter repairs fine-tuning damage without retraining
Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
-
Criterion-level pairwise judgments lift code judge accuracy to 66.3%
CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging
-
Pseudocode paths cut hallucinations in vision-language models
Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models
-
Strategic alignment fixes tabular foundation model bias under manipulation
When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach
-
Static quantization speeds LLM inference on mobile NPUs
Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
-
Single-file AI tools push accessibility boundaries outward
The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems
-
Panorama-first split lifts zero-shot navigation success 59 percent
P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation
-
One LLM system optimizes text to beat specialists on six tasks
optimize_anything: A Universal API for Optimizing any Text Parameter
-
Hierarchical Gaussian filters close the gap in deep predictive coding
Closed-form predictive coding via hierarchical Gaussian filters
-
Emotion cues lift deepfake detector generalization AUC by 2.1 percent
EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection
-
Component style transfer closes satellite sim-to-real gap
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction
-
Part-wise style transfer raises satellite pose accuracy
Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction
-
MiMuon reaches O(1/N) generalization bound for matrix models
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
-
SVD-ordered paths yield less noisy model attributions
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
-
Formal Skills move agent procedures into executable state machines
Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents
-
YOLO26-MoE hits 0.99 mAP for spotting insulator faults in drone photos
A novel YOLO26-MoE optimized by an LLM agent for insulator fault detection considering UAV images
-
Offloading slows smaller LLMs more in mixed serving
Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption
-
Dual-window design smooths RL control without expanding action space
Implicit Action Chunking for Smooth Continuous Control
-
Code programs generate editable articulated indoor scenes from text
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects
-
Laminating film on lenses blocks identity while keeping action cues
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition
-
Laminating film on lenses hides identities for action recognition
Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition
-
Governance recipe lifts LLM skill-library performance from 0.26 to 0.58
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries
-
Rotations fix MXFP4 activation errors in LLMs
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
-
MLLMs often back correct answers with inconsistent egocentric evidence
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs
-
RL solver reaches 82.9% on CAPTCHA benchmark
CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision
-
LLM adaptive tests recover only half the intended skill variance
Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment
-
Merging LLMs into VLMs boosts instructions but not math
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters
-
Triplet data needed to measure voter disagreements accurately
Efficient Elicitation of Collective Disagreements
-
Benchmark exposes LLM limits in knowledge graph building
BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation
-
Base models fool AI detectors into rating text as human
Base Models Look Human To AI Detectors
-
Context management determines real-world Transformer Turing-completeness
Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management
-
Game creatures become RL testbeds in new MuJoCo suite
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
-
One reward function trains policies for four game robots
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders