Formalizes proxy tasks and a protocol for CSAI detection model design that avoids direct use of sensitive data, demonstrated via few-shot indoor scene classification with reported success on real CSAI imagery.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
unclear 1representative citing papers
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
MedMSA framework retrieves knowledge via language models then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.
citing papers explorer
-
Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models
Formalizes proxy tasks and a protocol for CSAI detection model design that avoids direct use of sensitive data, demonstrated via few-shot indoor scene classification with reported success on real CSAI imagery.
-
DataComp-LM: In search of the next generation of training sets for language models
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
-
Medical Model Synthesis Architectures: A Case Study
MedMSA framework retrieves knowledge via language models then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.