25 Pith papers cite this work.
citing papers explorer
-
Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning
EnsembleCert and ScaLabelCert enable tighter and exact certificates for neural network robustness against label-flipping attacks by leveraging white-box information and neural tangent kernel equivalence.
-
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.
-
CocoaBench: Evaluating Unified Digital Agents in the Wild
CocoaBench shows the best tested unified digital agents succeed on only 45.1% of human-designed tasks that demand integrated vision, search, and coding.
-
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
HiL-Bench shows frontier AI agents fail to ask for help on incomplete tasks, recovering only a fraction of full-information performance, but RL training on Ask-F1 reward improves judgment and transfers across domains.
-
M2-Verify: A Large-Scale Multidomain Benchmark for Checking Multimodal Claim Consistency
M2-Verify is a new multidomain benchmark dataset for multimodal scientific claim consistency that reveals state-of-the-art models drop from 85.8% to 61.6% Micro-F1 on complex perturbations and produce hallucinated explanations.
-
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.
-
Reward Design for Physical Reasoning in Vision-Language Models
Accuracy-based rewards outperform SFT and other reward variants in GRPO training of VLMs on the PhyX physics benchmark, with attention-weight rewards raising spatial reasoning accuracy from 0.27 to 0.50.
-
Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models
An adaptive conformal prediction approach for LLMs enables prompt-dependent calibration that improves conditional coverage for factuality while preserving marginal guarantees and supporting selective prediction.
-
Calibrate-Then-Delegate: Safety Monitoring with Risk and Budget Guarantees via Model Cascades
CTD trains a lightweight DV probe to predict escalation benefits and calibrates its threshold via multiple hypothesis testing on held-out data to deliver finite-sample guarantees on delegation rate while outperforming uncertainty-based cascades on safety tasks.
-
Introspective Diffusion Language Models
I-DLM matches same-scale autoregressive model quality in diffusion language models by enforcing introspective consistency via strided decoding, outperforming prior DLMs on 15 benchmarks including 69.6 on AIME-24.
-
OASIS: Online Activation Subspace Learning for Memory-Efficient Training
OASIS tracks an evolving low-dimensional activation subspace to project activations, gradients, and optimizer states, cutting peak memory up to 2x versus full fine-tuning while matching performance on finetuning and pretraining tasks.
-
MixFlow: Mixed Source Distributions Improve Rectified Flows
Mixing unconditional Gaussian noise with a κ-conditioned source during training of rectified flows reduces path curvature, yielding 12% better FID scores and faster sampling than standard rectified flows.
-
Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM
DiADEM learns demographic importance weights to model annotator disagreement distributions and outperforms LLM and neural baselines on disagreement tracking in DICES and VOICED benchmarks.
-
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis
ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.
-
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
Beam search for candidate generation in consistency-based UQ for LLMs reduces variance and improves performance over multinomial sampling on six QA datasets, supported by a theoretical lower bound on beam-set probability mass.
-
SkillWrapper: Generative Predicate Invention for Task-level Planning
SkillWrapper learns human-interpretable symbolic representations of robot skills from images via foundation models, yielding operators for provably sound and complete planning on long-horizon tasks.
-
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
Turbo-DDCM accelerates DDCM-based zero-shot image compression by batching noise vectors per step while preserving performance and adding priority-aware and PSNR-targeted variants.
-
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific
-
Preference Learning Unlocks LLMs' Psycho-Counseling Skills
A new expert-principle preference dataset enables an 8B LLM to reach 87% win rate vs GPT-4o on counseling responses through standard preference optimization.
-
Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies
PGT optimizes latent goal embeddings for frozen policies via trajectory-level preference objectives, reporting 72-81.6% relative gains on 17 Minecraft tasks and 13.4% better OOD performance than fine-tuning.
-
Dictionary learning for Kernel EDMD
A dictionary learning method optimizes weighted kernels via gradients for kEDMD to approximate Koopman operators, with pruning of unimportant kernels based on learned weights.
-
Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation
A multi-objective LLM unlearning approach standardizes data into unified domain representations and applies bidirectional logit distillation to align objectives and achieve balanced state-of-the-art results across efficacy, utility, boundary preservation, and robustness.
-
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
AVR trains vision-language models to adaptively select among full reasoning, perception-only, or direct-answer formats using a modified policy optimization method, reducing token use by 50-90% with little accuracy loss.
-
Context Sensitivity Improves Human-Machine Visual Alignment
Context-sensitive similarity computation from embeddings improves odd-one-out accuracy by up to 15% over context-insensitive baselines for human visual alignment.
-
Self-Aligned Reward: Towards Effective and Efficient Reasoners
Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.