The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.
Smooth Adversarial Training
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
fields
cs.LG 2roles
background 2polarities
background 2representative citing papers
HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.
citing papers explorer
-
Unsolved Problems in ML Safety
The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.
-
A Composite Activation Function for Learning Stable Binary Representations
HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.