6 Pith papers cite this work.
citing papers explorer
- Prior-Aligned Data Cleaning for Tabular Foundation Models
  L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.
- Pioneer Agent: Continual Improvement of Small Language Models in Production
  Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.
- Clinical Note Bloat Reduction for Efficient LLM Use
  TRACE removes 47.3% of text from clinical notes by targeting bloat and preserves performance on information extraction and outcome prediction tasks.
- Can We Predict Before Executing Machine Learning Agents?
  LLMs primed with verified data reports predict agent solution quality at 61.5% accuracy, powering a Predict-then-Verify agent that converges 6x faster than execution-only baselines.
- Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models
  Formalizes proxy tasks and a protocol for CSAI detection model design that avoids direct use of sensitive data, demonstrated via few-shot indoor scene classification with reported success on real CSAI imagery.
- An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification
  Experiments on real datasets find that balancing methods increase predictive multiplicity in Rashomon sets of models, measured via ambiguity, discrepancy, and a new obscurity metric.