Title resolution pending

Correlation with General Aptitude

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

cs.LG · 2024-03-05 · unverdicted · novelty 6.0

WMDP is a public benchmark measuring hazardous LLM knowledge across biosecurity, cybersecurity, and chemical security, paired with RMU unlearning that reduces WMDP performance without degrading general capabilities.

Representation Engineering: A Top-Down Approach to AI Transparency

cs.LG · 2023-10-02 · unverdicted · novelty 6.0

Representation engineering uses population-level representations in deep neural networks to monitor and manipulate cognitive phenomena such as honesty and harmlessness, providing simple, effective baselines for LLM safety.

citing papers explorer

Showing 2 of 2 citing papers.

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning cs.LG · 2024-03-05 · unverdicted · none · ref 14
Representation Engineering: A Top-Down Approach to AI Transparency cs.LG · 2023-10-02 · unverdicted · none · ref 21