pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 14

  1. cs.CL 2026-05-19 reviewed
    Long-term medical dialogue benchmark reveals LLM limitations

    Synthesis and Evaluation of Long-term History-aware Medical Dialogue

    Hebin Hu +3

  2. cs.AI 2026-05-19 reviewed
    Dataset records affect at group

    GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

    Meisam Jamshidi Seikavandi +12

  3. cs.AI 2026-05-19 reviewed
    Pure code boosts programming but hurts complex math reasoning

    What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

    Yuze Zhao +8

  4. cs.LG 2026-05-19 reviewed
    Quadratic model handles heavy- and light-tailed noise

    Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning

    Zheng Zhai +1

  5. cs.LG 2026-05-19 reviewed
    Models distort physical quantity distributions despite plausible paths

    Mechanisms of Misgeneralization in Physical Sequence Modeling

    Kento Nishi +4

  6. cs.AI 2026-05-19 reviewed
    Benchmark shows attention models scale better than RNNs on sequences

    CogScale: Scalable Benchmark for Sequence Processing

    Yannis Bendi-Ouis (Mnemosyne) +2

  7. cs.AI 2026-05-19 reviewed
    Memory RL agent self-corrects complex CAD models

    Memory-Augmented Reinforcement Learning Agent for CAD Generation

    Yin Xiaolong +6

  8. cs.AI 2026-05-19 reviewed
    Multi-agent LLM framework hits 97 percent task completion on engineering benchmarks

    EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

    Gioele Molinari +3

  9. cs.CL 2026-05-19 reviewed
    Node topology turned into text improves graph anomaly detection

    TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection

    Wen Shi +8

  10. cs.CL 2026-05-19 reviewed
    Fuzzy concept graph cuts RAG indexing to 30 LLM calls

    ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation

    Roman Prosvirnin +2

  11. cs.CV 2026-05-19 reviewed
    Staged distillation keeps tiny diffusion models stable at 1.6 percent teacher size

    LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

    Hyunsoo Han +2

  12. cs.CV 2026-05-19 reviewed
    Tiny diffusion models reach FID 15.73 with staged distillation

    LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

    Hyunsoo Han +2

  13. cs.CL 2026-05-19 reviewed
    Review of 120 studies maps LLM math reasoning gaps

    Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

    Husnain Amjad +3

  14. cs.CR 2026-05-19 reviewed
    Measure AI security agent safety beyond refusal rates

    Measuring Safety Alignment Effects in Autonomous Security Agents

    Isaac David +1

  15. cs.AI 2026-05-19 reviewed
    Prospect theory replaces rational assumptions in strategic classification

    Beyond Rational Illusion: Behaviorally Realistic Strategic Classification

    Xinpeng Lv +13

  16. eess.IV 2026-05-19 reviewed
    TADA adapts steganalysis to unknown JPEG pipelines

    Tackle CSM in JPEG Steganalysis with Data Adaptation

    Rony Abecidan (CRIStAL) +5

  17. cs.AI 2026-05-19 reviewed
    Symmetry properties generate local search neighborhoods automatically

    Transforming Constraint Programs to Input for Local Search

    Jo Devriendt +2

  18. cs.LG 2026-05-19 reviewed
    Spectral filter repairs fine-tuning damage without retraining

    Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

    Aarash Abro +1

  19. cs.SE 2026-05-19 reviewed
    Criterion-level pairwise judgments lift code judge accuracy to 66.3%

    CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

    Zhenyu Li +3

  20. cs.AI 2026-05-19 reviewed
    Pseudocode paths cut hallucinations in vision-language models

    Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models

    Weicong Ni +2

  21. cs.AI 2026-05-19 reviewed
    Strategic alignment fixes tabular foundation model bias under manipulation

    When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

    Xinpeng Lv +15

  22. cs.LG 2026-05-19 reviewed
    Static quantization speeds LLM inference on mobile NPUs

    Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

    Jinghe Zhang +7

  23. cs.HC 2026-05-19 reviewed
    Single-file AI tools push accessibility boundaries outward

    The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

    Rizwan Jahangir +1

  24. cs.CV 2026-05-19 reviewed
    Panorama-first split lifts zero-shot navigation success 59 percent

    P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

    Kai Sheng +7

  25. cs.CL 2026-05-19 reviewed
    One LLM system optimizes text to beat specialists on six tasks

    optimize_anything: A Universal API for Optimizing any Text Parameter

    Lakshya A Agrawal +13

  26. cs.LG 2026-05-19 reviewed
    Hierarchical Gaussian filters close the gap in deep predictive coding

    Closed-form predictive coding via hierarchical Gaussian filters

    Aleksandrs Baskakovs +5

  27. cs.AI 2026-05-19 reviewed
    Emotion cues lift deepfake detector generalization AUC by 2.1 percent

    EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection

    Aritra Marik +2

  28. cs.CV 2026-05-19 reviewed
    Component style transfer closes satellite sim-to-real gap

    Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction

    Zongwu Xie +4

  29. cs.CV 2026-05-19 reviewed
    Part-wise style transfer raises satellite pose accuracy

    Component-Aware Structure-Preserving Style Transfer for Satellite Visual Sim2Real Data Construction

    Zongwu Xie +4

  30. cs.LG 2026-05-19 reviewed
    MiMuon reaches O(1/N) generalization bound for matrix models

    MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

    Feihu Huang +2

  31. cs.CV 2026-05-19 reviewed
    SVD-ordered paths yield less noisy model attributions

    Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

    Soyeon Kim +3

  32. cs.AI 2026-05-19 reviewed
    Formal Skills move agent procedures into executable state machines

    Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents

    Xi Zhang +8

  33. cs.CV 2026-05-19 reviewed
    YOLO26-MoE hits 0.99 mAP for spotting insulator faults in drone photos

    A novel YOLO26-MoE optimized by an LLM agent for insulator fault detection considering UAV images

    João Pedro Matos-Carvalho +4

  34. cs.AI 2026-05-19 reviewed
    Offloading slows smaller LLMs more in mixed serving

    Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

    Mert Yildiz +4

  35. cs.RO 2026-05-19 reviewed
    Dual-window design smooths RL control without expanding action space

    Implicit Action Chunking for Smooth Continuous Control

    Bosun Liang +7

  36. cs.AI 2026-05-19 reviewed
    Code programs generate editable articulated indoor scenes from text

    SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

    Puyi Wang +6

  37. cs.CV 2026-05-19 reviewed
    Laminating film on lenses blocks identity while keeping action cues

    Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

    Mengyuan Liu +3

  38. cs.CV 2026-05-19 reviewed
    Laminating film on lenses hides identities for action recognition

    Lens Privacy Sealing: A New Benchmark and Method for Physical Privacy-Preserving Action Recognition

    Mengyuan Liu +3

  39. cs.AI 2026-05-19 reviewed
    Governance recipe lifts LLM skill-library performance from 0.26 to 0.58

    Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

    Xing Zhang +6

  40. cs.LG 2026-05-19 reviewed
    Rotations fix MXFP4 activation errors in LLMs

    TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization

    Zukang Xu +2

  41. cs.CV 2026-05-19 reviewed
    MLLMs often back correct answers with inconsistent egocentric evidence

    EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs

    Yang Dai +3

  42. cs.CV 2026-05-19 reviewed
    RL solver reaches 82.9% on CAPTCHA benchmark

    CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision

    Pengcheng Wang +7

  43. cs.AI 2026-05-19 reviewed
    LLM adaptive tests recover only half the intended skill variance

    Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

    Grandee Lee +3

  44. cs.CL 2026-05-19 reviewed
    Merging LLMs into VLMs boosts instructions but not math

    Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

    Zhiyu Xu +7

  45. cs.AI 2026-05-19 reviewed
    Triplet data needed to measure voter disagreements accurately

    Efficient Elicitation of Collective Disagreements

    Mohamed Ouaguenouni +4

  46. cs.AI 2026-05-19 reviewed
    Benchmark exposes LLM limits in knowledge graph building

    BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation

    Carla Castedo +5

  47. cs.CL 2026-05-19 reviewed
    Base models fool AI detectors into rating text as human

    Base Models Look Human To AI Detectors

    Yixuan Even Xu +4

  48. cs.AI 2026-05-19 reviewed
    Context management determines real-world Transformer Turing-completeness

    Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

    Guanyu Cui +2

  49. cs.RO 2026-05-19 reviewed
    Game creatures become RL testbeds in new MuJoCo suite

    ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

    Carlo Romeo +1

  50. cs.RO 2026-05-19 reviewed
    One reward function trains policies for four game robots

    ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

    Carlo Romeo +1