archive

Every paper Pith has read. Search by title, abstract, or pith.

14513 papers in cs.AI · page 18

  1. cs.CV 2026-05-18 reviewed
    Multi-task training builds balanced multimodal model

    Lance: Unified Multimodal Modeling by Multi-Task Synergy

    Fengyi Fu +12

  2. cs.CV 2026-05-18 reviewed
    Lance beats prior open models at image and video generation

    Lance: Unified Multimodal Modeling by Multi-Task Synergy

    Fengyi Fu +12

  3. cs.LG 2026-05-18 reviewed
    Cyclic method boosts RL sample efficiency over online baselines

    COOPO: Cyclic Offline-Online Policy Optimization Algorithm

    Qisai Liu +5

  4. cs.AI 2026-05-18 reviewed
    Holistic encoding scales general planning policies to thousands of objects

    Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning

    Michael Aichmüller +3

  5. cs.AI 2026-05-18 reviewed
    LLM agents need three separate safety layers

    Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

    S. Bensalem +8

  6. cs.AI 2026-05-18 reviewed
    Config choices rival model selection on GIM benchmark

    GIM: Evaluating models via tasks that integrate multiple cognitive domains

    Rohit Patel +2

  7. cs.AI 2026-05-18 reviewed
    AI automates research but struggles with novelty and judgment

    AI for Auto-Research: Roadmap & User Guide

    Lingdong Kong +19

  8. cs.LG 2026-05-18 reviewed
    Dual-memory model lifts time series classification accuracy

    KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture

    Luis Balderas +4

  9. stat.ML 2026-05-18 reviewed
    FedNewton matches SGD accuracy with fewer rounds under privacy

    Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

    Arnab Auddy +2

  10. cs.LG 2026-05-18 reviewed
    Distilled trees retain 96.5% of TFM accuracy at 1.9 ms CPU speed

    Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

    Aditya Tanna +4

  11. cs.LG 2026-05-18 reviewed
    Human soft labels improve calibration and training stability

    An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

    Maja Pavlovic +2

  12. cs.LG 2026-05-18 reviewed
    Trained MoE models skip over half their experts after adaptation

    Post-Trained MoE Can Skip Half Experts via Self-Distillation

    Xingtai Lv +14

  13. cs.LG 2026-05-18 reviewed
    Context resampling beats TFM choice in credit risk

    Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models

    Aditya Tanna +5

  14. cs.LG 2026-05-18 reviewed
    Generative models in weight space match fine-tuning performance

    Position: Weight Space Should Be a First-Class Generative AI Modality

    Zhangyang Wang +2

  15. cs.AI 2026-05-18 reviewed
    Benchmark finds LLMs clarify only 52.7% of fluid mechanics cases

    SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

    Nithin Somasekharan +7

  16. cs.AI 2026-05-18 reviewed
    Partial traces recover lifted STRIPS+ domains

    Learning Lifted Action Models from Traces with Minimal Information About Actions and States

    Jonas Gösgens +2

  17. cs.CV 2026-05-18 reviewed
    Cross-view data and explicit alignment advance MLLM spatial reasoning

    CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

    Wei Wang +6

  18. cs.LG 2026-05-18 reviewed
    SPBM adds constraints to deep learning with linear overhead

    Stochastic Penalty-Barrier Methods for Constrained Machine Learning

    Adam Bosák +4

  19. cs.RO 2026-05-18 reviewed
    ManiSoft benchmark tests vision-language control on soft robotic arms

    ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics

    Ziyu Wei +4

  20. cs.SD 2026-05-18 reviewed
    Music autoencoder compresses audio 4096 times with quality intact

    SAME: A Semantically-Aligned Music Autoencoder

    Julian D. Parker +6

  21. cs.CV 2026-05-18 reviewed
    Sign-aware aggregation sustains unlearning across sequential VLM requests

    CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic

    Shen Lin +5

  22. cs.AI 2026-05-18 reviewed
    Latent actions shorten LLM agent decision horizons

    Latent Action Reparameterization for Efficient Agent Inference

    Wenhao Huang +13

  23. cs.CR 2026-05-18 reviewed
    Typographic attacks make robots grab the wrong objects

    Not What You Asked For: Typographic Attacks in Household Robot Manipulation

    Ali Iranmanesh +1

  24. cs.LG 2026-05-18 reviewed
    Memory of past evaluations improves rubric updates for RL

    AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

    Peilin Wu +6

  25. cs.LG 2026-05-18 reviewed
    Randomized iterations turn natural policy gradients into direct backprop

    Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

    Mingfei Sun

  26. cs.SE 2026-05-18 reviewed
    Stripping consent declarations raises overeager rate in coding agents

    Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

    Yubin Qu +6

  27. cs.AI 2026-05-18 reviewed
    Revenue targets mask pricing discipline failures

    When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

    Peiying Zhu +1

  28. cs.AI 2026-05-18 reviewed
    Query context ranks medical entities across systems

    Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning

    Yan Jiao +3

  29. cs.CL 2026-05-18 reviewed
    Memory systems score 27.9% under fact interference in long contexts

    MINTEval: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

    Hyunji Lee +5

  30. cs.IR 2026-05-18 reviewed
    q-log odds lift BM25 NDCG@10 by 89% on code search

    Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

    Santosh Kumar Radha +1

  31. cs.RO 2026-05-18 reviewed
    Key-Gram memory boosts robot manipulation performance

    Key-Gram: Extensible World Knowledge for Embodied Manipulation

    Jingjing Fan +3

  32. cs.CV 2026-05-18 reviewed
    Quality signals steer flow matching to fix occluded hands in video

    StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video

    Huajian Zeng +5

  33. cs.CL 2026-05-18 reviewed
    Frontier LLMs score under 40% on dynamic tool-use benchmark

    STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics

    Tingfeng Hui +7

  34. cs.AI 2026-05-18 reviewed
    Tuning-free VLM steers focus to active speaker for emotion recognition

    VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

    Linan ZHU +6

  35. cs.LG 2026-05-18 reviewed
    Manifold probe reveals how models encode time and space

    Probing for Representation Manifolds in Superposition

    Alexander Modell

  36. cs.CL 2026-05-18 reviewed
    Continuous diffusion scales to 20x compute gap of autoregressive models

    Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

    Zhihan Yang +7

  37. cs.AI 2026-05-18 reviewed
    Self-generated hints fix token credit in LLM reinforcement learning

    AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

    Zhenlin Wei +8

  38. cs.CV 2026-05-18 reviewed
    Color features alone classify cancer at up to 89% accuracy

    Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification

    Farnaz Kheiri +2

  39. cs.LG 2026-05-18 reviewed
    DiPRL trains nearly discrete programmatic policies in RL

    DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization

    Chengpeng Hu +2

  40. cs.LG 2026-05-18 reviewed
    LLM outputs hypergraphs to generate editable floor plans

    HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

    Nikita Klimenko +4

  41. cs.LG 2026-05-18 reviewed
    DBES metrics select expert paths for up to 94% domain gains at 15% cost

    DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs

    Jing Wang +4

  42. cs.LG 2026-05-18 reviewed
    Morphology drives biological signal classification over model type

    Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals

    Jordan Tschida +12

  43. cs.AI 2026-05-18 reviewed
    Concept removal measures causal roles in black-box vision classifiers

    OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

    Chiara Maria Russo +5

  44. stat.CO 2026-05-18 reviewed
    LLM generates MCMC samplers from natural language descriptions

    AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

    Jungang Zou +2

  45. cs.LG 2026-05-18 reviewed
    One post-training run supports any bit budget for LLM quantization

    GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

    Zhangyang Yao +5

  46. cs.CR 2026-05-18 reviewed
    Generator turns text prompts into LLM fingerprints in one pass

    Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

    Sixu Chen +7

  47. stat.ML 2026-05-18 reviewed
    Flow models gain per-sample confidence at standard sampling cost

    Flowing with Confidence

    Friso de Kruiff +3

  48. cs.AI 2026-05-18 reviewed
    Firefly algorithm auto-clusters data without preset count

    When Fireflies Cluster; Enhancing Automatic Clustering via Centroid-Guided Firefly Optimization

    MKA Ariyaratne +3

  49. stat.ML 2026-05-18 reviewed
    Markov chain decoders fix heavy-tail limits in VAEs

    Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

    Abdelhakim Ziani +2

  50. cs.LG 2026-05-18 reviewed
    Readable programs match deep RL on job scheduling benchmarks

    Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

    Chengpeng Hu +2