Proceedings of the 26th Annual International Conference on Machine Learning
13 Pith papers cite this work.
Verdicts: UNVERDICTED (13)
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
-
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
DRATS derives a minimax objective from a feasibility formulation of multi-task reinforcement learning (MTRL) to adaptively sample the tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
-
The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently
Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via a temporal-difference loss on ReLU networks, achieving sample complexity linear in the input dimension d.
-
Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation
A new first-order algorithm for multi-task learning with shared linear representation achieves near-optimal error rates in constant iterations, improving existing methods by a factor of k.
-
StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels
StruMPL is a multi-task dense regression model that jointly addresses disjoint partial supervision, missing-not-at-random (MNAR) labels, and inter-task physical constraints to improve forest biomass estimation from Earth observation data.
-
ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability
ST-TGExplainer disentangles stability and transition patterns in temporal graphs via a self-explainable TGNN guided by a disentangled information bottleneck objective to produce more faithful explanations.
-
QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning
QuadLink generates anisotropic quad-dominant meshes from point clouds via a hybrid centroid-conditioned vertex linking model and a Tri-to-Quad data conversion operator.
-
SwAIther-Precip: Lead-Time-Aware Bias Correction Enables Kilometer-Scale Downscaling of Global AI Precipitation Forecasts over Switzerland
SwAIther-Precip uses lead-time-conditioned U-Net bias correction followed by diffusion-based super-resolution to downscale AIFS forecasts, achieving a 48% CRPS reduction and ~4 km effective resolution at lead times of up to 5 days.
-
Active Tabular Augmentation via Policy-Guided Diffusion Inpainting
TAP couples a learner-conditioned policy with diffusion inpainting to generate and selectively inject high-utility tabular augmentations, yielding accuracy gains of up to 15.6 pp and RMSE reductions of up to 32% on seven datasets under severe data scarcity.
-
Synthetic Pre-Pre-Training Improves Language Model Robustness to Noisy Pre-Training Data
Synthetic pre-pre-training on structured data improves LLM robustness to noisy pre-training, matching baseline loss with up to 49% fewer natural tokens for a 1B model.
-
Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
GST uses gradient-based affinity metrics to form dataset groups and applies progressive scheduling, converging 30-40% faster than uniform mixture training on 14 AudioQA datasets while matching or exceeding its performance.
-
Temporal Aware Pruning for Efficient Diffusion-based Video Generation
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.
-
ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring
ARGUS uses a Prosecutor-Defender-Umpire multi-agent setup plus RAG and chain-of-thought rewards to adapt ad policy enforcement to new regulations using minimal fresh labels.
-
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
The Step-Video-T2V report describes a 30B-parameter text-to-video model with a custom Video-VAE, a 3D DiT, flow matching, and Video-DPO, and claims state-of-the-art results on a new internal benchmark.