archive
Every paper Pith has read.
14513 papers in cs.AI · page 15
-
Adaptive coaching speeds learning to use robot guide dogs
CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog
-
Attention-guided RL raises jailbreak success on reasoning models
Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models
-
GUI agents reach only 36% success on media editing tasks
CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
-
Finite dynamics samples enforce safety during RL learning
Sampling-Based Safe Reinforcement Learning
-
Pre-training boosts time series detection by 375% but not forecasting
Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models
-
Group reward targets keep solution diversity alive in RL reasoning
Beyond Mode Collapse: Distribution Matching for Diverse Reasoning
-
GUIDE raises ad GMV 4.1% with built-in safety fallback
Generative Auto-Bidding with Unified Modeling and Exploration
-
Predictor accuracy sets exact fault tolerance in Byzantine agreement
Resilient Byzantine Agreement with Predictions
-
Selective feedback reweighting lifts multi-turn agent success to 90%
What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents
-
Targeted attacks succeed on encoders without knowing the task
Targeted Downstream-Agnostic Attack
-
Spiking blocks replace Transformer nonlinearities with <1% accuracy drop
Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers
-
Majority vote locks wrong answers after brief correct window in TTRL
Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting
-
Model fuses layout and netlist to predict cell delay at 0.92% error
FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction
-
Prototype-anchored training halves calibration error in place recognition
KappaPlace: Learning Hyperspherical Uncertainty for Visual Place Recognition via Prototype-Anchored Supervision
-
Backtracking fixes dual biases in LLM reasoning distillation
Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation
-
Output-layer gradient norm gates reuse to cut RLVR samples by 2.93x
When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
-
Pilot-only model beats full-CSI baselines across frequencies
PilotWiMAE: Pilot-Native Representation Learning for Wireless Channels
-
Signed graphs let AI agents resolve conflicts for better reasoning
Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling
-
Feedback prefixing improves LLM scaling by up to 2.8x efficiency
Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages
-
ODE paths limit forgetting when merging models sequentially
Unlocking the Potential of Continual Model Merging: An ODE Perspective
-
ODE traces low-loss paths for sequential model merging
Unlocking the Potential of Continual Model Merging: An ODE Perspective
-
Large models improve with unfiltered low-quality data
A Bitter Lesson for Data Filtering
-
JUDO outperforms GPT-4o on industrial anomaly QA with normal image references
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
-
Rebalancing attention boosts motion in image-to-video models
Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models
-
Rebalancing attention reduces reference dominance and increases video motion
Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models
-
Unlearning methods leave class traces in model representations
Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning
-
Reassembling entity pairs boosts synthetic QA accuracy by 88.9%
EmbGen: Teaching with Reassembled Corpora
-
LLMs run code for videos but miss spatial accuracy
PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning
-
LLM safety benchmarks are orbits under group actions
The Evaluation Game: Beyond Static LLM Benchmarking
-
Stochastic trajectories let recursive models reason with multiple hypotheses
Generative Recursive Reasoning
-
Probabilistic recursion lets models sample many reasoning paths
Generative Recursive Reasoning
-
Concept ontology filters noisy negatives to lift chest X-ray zero-shot tasks
Concept-Guided Noisy Negative Suppression for Zero-Shot Classification and Grounding of Chest X-Ray Findings
-
Heat dissipation flow matching outperforms most baselines
Multi-Scale Generative Modeling with Heat Dissipation Flow Matching
-
Few agent skill specs fully disclose capabilities to users
Toward User Comprehension Supports for LLM Agent Skill Specifications
-
Only 19% of cybersecurity skills include example cues for users
Toward User Comprehension Supports for LLM Agent Skill Specifications
-
Repositioned anchors keep motion contacts across body shapes
Skinned Motion Retargeting with Spatially Adaptive Interaction Guidance
-
Action models align asymmetrically with brain action signals
Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay
-
Architecture lets AI agents break rules legitimately when justified
PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies
-
Claim differences as RL rewards balance caption hallucinations and omissions
ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison
-
Supreme Court quashes 18 points more matrimonial petitions than Karnataka HC
IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis
-
Integral feedback reduces hallucinations in CT medical reports
Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis
-
Benchmark labels hallucinations via explicit reference worlds
HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models
-
STAR-PólyaMath hits perfect scores on Putnam and IMO
STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision
-
Only 2 of 19 LLM trading studies use time-consistent data splits
Agentic Trading: When LLM Agents Meet Financial Markets
-
Protein Thoughts ranks true binders at mean position 11.2
Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery
-
LLMs close 99% of deals but earn low profits in hidden pricing
PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations
-
MOCHA improves agent skill correctness on every task
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
-
Event streams lift VLM captioning and VQA scores in low light and motion
RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
-
Event streams improve VLM scene understanding in tough conditions
RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
-
Small models flag jailbreaks before large models answer
Exploring and Developing a Pre-Model Safeguard with Draft Models