Advances in Neural Information Processing Systems
6 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Years: 2026 (6)
Verdicts: UNVERDICTED (6)
Roles: background (1)
Polarities: unclear (1)
Citing papers explorer
-
Action Emergence from Streaming Intent
A new vision-language-action (VLA) model called SI uses a four-step chain of thought to derive driving intent. It applies that intent through classifier-free guidance to a flow-matching trajectory generator, achieving competitive Waymo scores and producing intent-controllable plans.
-
Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism
ProofGrid is a new benchmark for LLM reasoning that uses machine-checkable proofs in minimal formal notation, revealing progress on basic tasks but major gaps in complex combinatorial and synthesis reasoning.
-
Preference-aware Influence-function-based Data Selection Method for Efficient Fine-Tuning
PRISM weights target examples by the current model's preferences to build a better representation for influence-function-based scoring of training samples, enabling efficient LLM fine-tuning.
-
Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
Presents an audit-constrained protocol for targeted evaluation of LLM reasoning that uses component-grammar prompt variants. It shows that Component-Adaptive Prompt Sampling does not outperform uniform sampling in audited yield.
-
Tool Calling is Linearly Readable and Steerable in Language Models
Tool identity in LLMs is linearly readable and can be steered using mean activation differences, with 77–100% switch accuracy. Activation gaps also predict errors.
-
Language models fail at extended rule following
LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.