pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 6

  1. cs.AI 2026-05-21 reviewed
    9B model with skill modules beats 32B LLM

    Skill Weaving: Efficient LLM Improvement via Modular Skillpacks

    Zhuo Li +7

  2. cs.CV 2026-05-21 reviewed
    Video models top open suturing skill challenge

    OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

    Hanna Hoffmann +56

  3. cs.RO 2026-05-21 reviewed
    Visual primitives raise robot pick-and-place success by 27%

    Action with Visual Primitives

    Weilong Guo +8

  4. cs.AI 2026-05-21 reviewed
    LLM recall tracks paper citations across 15 models

    LLM-Metrics: Measuring Research Impact Through Large Language Model Memory

    Si Shen +2

  5. cs.SE 2026-05-21 reviewed
    LLMs verify only 10% of test suites on code mutations

    SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?

    Yuxuan Sun +8

  6. cs.AI 2026-05-21 reviewed
    Metric shows VLM explainers miss text synergy

    Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

    Joël Roman Ky +2

  7. cs.AI 2026-05-21 reviewed
    Fixed harness lifts 116 of 126 LLM agent settings without model changes

    Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

    Tianshi Xu +2

  8. cs.AI 2026-05-21 reviewed
    Dual selection prunes video tokens while keeping static scenes and changes

    ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs

    Bingjun Luo +3

  9. cs.LG 2026-05-21 reviewed
    OWPO lets LLMs self-evolve without fixed references

    One-Way Policy Optimization for Self-Evolving LLMs

    Shuo Yang +8

  10. cs.AI 2026-05-21 reviewed
    IdleSpec converts agent wait time into better plans

    IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents

    Daewon Choi +6

  11. cs.AI 2026-05-21 reviewed
    Hygiene rules enable LLM agents to self-improve skills effectively

    Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

    Xing Zhang +6

  12. cs.LG 2026-05-21 reviewed
    Learned transfer keeps relevant facts in long-term KG memory

    Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

    Taewoon Kim +2

  13. cs.AI 2026-05-21 reviewed
    30B agents rival 1T models with 25-95% fewer tokens

    Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

    Mingkai Deng +6

  14. q-bio.BM 2026-05-21 reviewed
    Three-view pretraining lifts protein structure prediction

    Atom-level Protein Representation Learning Improves Protein Structure Prediction

    Taewon Kim +8

  15. q-bio.BM 2026-05-21 reviewed
    Three-view token recovery lifts protein structure tasks

    Atom-level Protein Representation Learning Improves Protein Structure Prediction

    Taewon Kim +8

  16. cs.CR 2026-05-21 reviewed
    Physical objects flip trust to exclude benign vehicles from perception

    Adversarial Trust Poisoning in Vehicular Collaborative Perception

    Yutong Liu +3

  17. cs.AI 2026-05-21 reviewed
    MLLMs get personality scores right but ignore video cues half the time

    Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

    Caixin Kang +10

  18. cs.AI 2026-05-21 reviewed
    Tree-aware KV eviction cuts memory 4x for LLM reasoning

    ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning

    Yeqiu Chen +5

  19. cs.AI 2026-05-21 reviewed
    ExComm resolves agent conflicts to improve test-time scaling

    ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling

    Woomin Song +6

  20. cs.AI 2026-05-21 reviewed
    Benchmark reveals limits in multi-page document parsing

    MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

    Bangbang Zhou +10

  21. cs.CV 2026-05-21 reviewed
    Text embeddings boost ImageNet accuracy by up to 2.7 points

    TextTeacher: What Can Language Teach About Images?

    Tobias Christian Nauen +5

  22. econ.GN 2026-05-21 reviewed
    Humans beat LLMs in Colonel Blotto tournaments

    Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament

    Dmitry Dagaev +4

  23. cs.AI 2026-05-21 reviewed
    Ontological continuum unifies knowledge graph modeling

    Knowledge Graph Re-engineering Along the Ontological Continuum (extended version)

    Enrico Daga +2

  24. cs.AI 2026-05-21 reviewed
    Camera cooperation cuts UAV beam steering overhead by 71 percent

    A Camera-Cooperative ISAC Framework for Multimodal Non-Cooperative UAVs Sensing

    Wenfeng Wu +2

  25. cs.CV 2026-05-21 reviewed
    Latent future scenes improve VLA driving over pixel reconstruction

    LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model

    Xiaodong Mei +5

  26. cs.CV 2026-05-21 reviewed
    General models gain far more from images than medical ones in licensing exams

    JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation

    Yue Xun +12

  27. cs.AI 2026-05-21 reviewed
    Training-free pooling lifts Video LLM accuracy without retraining

    Enhancing Visual Token Representations for Video Large Language Models via Training-Free Spatial-Temporal Pooling and Gridding

    Bingjun Luo +3

  28. cs.LG 2026-05-21 reviewed
    Subproblem curriculum RL improves LLM math reasoning by 4.1 points

    From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

    Xitai Jiang +5

  29. cs.CV 2026-05-21 reviewed
    Framework turns 2D heart ultrasounds into accurate 4D models

    Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos

    Yanan Liu +7

  30. cs.CR 2026-05-21 reviewed
    WaveGuard perturbs API images to block model distillation

    Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

    Yilan Gao +3

  31. cs.LG 2026-05-21 reviewed
    Prototype stages top time series accuracy on 80 of 128 UCR datasets

    Prototype-Guided Classification Sub-Task Decoupling Framework: Enhancing Generalization and Interpretability for Multivariate Time Series

    Xianhao Song +4

  32. cs.AI 2026-05-21 reviewed
    LLM diagnostic accuracy drops 13% in interactive settings

    Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support

    Chen Zhan +10

  33. cs.DC 2026-05-21 reviewed
    Framework computes determinants securely on edge servers

    Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments

    Prajwal Panth

  34. cs.CV 2026-05-21 reviewed
    BEV maps from RGB-D cut tokens yet raise VLN success rates

    GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

    Jiahao Yang +6

  35. cs.CV 2026-05-21 reviewed
    AgroVG benchmark shows top models at 0.35 Set-F1 on farm targets

    AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding

    Haocheng Li +7

  36. cs.CV 2026-05-21 reviewed
    Dataset records real flooded roads for self-driving cars

    FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments

    Connor Malone +2

  37. cs.LG 2026-05-21 reviewed
    Five lines of code expose an LLM's hidden vocabulary secrets

    Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

    Hisashi Miyashita

  38. cs.CL 2026-05-21 reviewed
    RoBERTa reaches 93 percent accuracy on IMDb sentiment task

    From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification

    Dip Biswas Shanto +3

  39. cs.CR 2026-05-21 reviewed
    Camouflaged attacks slash LLM guard detection from 94% to 10%

    Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

    Aaditya Pai

  40. cs.CV 2026-05-21 reviewed
    Method turns BIT phase volumes into realistic 3D H&E stains

    Virtual 3D H&E Staining from Phase-contrast Back-illumination Interference Tomography

    Anthony Song +5

  41. cs.AI 2026-05-21 reviewed
    Event log becomes the agent for replayable and forkable runs

    The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems

    Yohei Nakajima

  42. cs.SE 2026-05-21 reviewed
    Patch-guided trajectories raise SWE agent fixes by 10.8 points at 15% lower cost

    From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

    Murong Ma +9

  43. cs.LG 2026-05-21 reviewed
    Auditable encoder reveals semantic nodes are structurally disconnected

    Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs

    Yoav Kor Sade +4

  44. cs.AI 2026-05-21 reviewed
    Coupled optimization yields verifiable evidence in rankings

    ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

    Miaobo Hu +7

  45. cs.CV 2026-05-21 reviewed
    Counterfactual RL raises video LLM dynamic accuracy

    Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning

    Dazhao Du +9

  46. cs.AI 2026-05-21 reviewed
    User refinements raise code agent acceptance from 25.7% to 35.7%

    Echo: Learning from Experience Data via User-Driven Refinement

    Hande Dong +17

  47. cs.CV 2026-05-21 reviewed
    LVLMs collect emotional cues in middle layers then translate in deep layers

    Interpreting and Enhancing Emotional Circuits in Large Vision-Language Models via Cross-Modal Information Flow

    Chengsheng Zhang +3

  48. cs.CV 2026-05-21 reviewed
    Video frames close the detection gap between AI images and videos

    Video as Natural Augmentation: Towards Unified AI-Generated Image and Video Detection

    Zhengcen Li +6

  49. cs.AI 2026-05-21 reviewed
    Mismatched schema and CSV format can cut facts below baseline in table-to-graph extraction

    Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables

    Jingxuan Qi +2

  50. cs.IR 2026-05-21 reviewed
    LLM semantic retrieval raises ad recommendation stability

    LLM Retrieval for Stable and Predictable Ad Recommendations

    Vinodh Kumar Sunkara +15