Discovering latent knowledge in language models without supervision

Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

extension 1

citation-polarity summary

extend 1

representative citing papers

Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Projecting LLM hidden states onto F2 algebra with 42 pairs yields 93% zero-shot accuracy on logical relations and identifies prompt-preventable late-layer collapse.

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

Attention sharpness barely predicts VLM correctness while hidden-state probes and self-consistency strongly do, with late-fusion models showing fragile reliability bottlenecks unlike early-fusion ones.

citing papers explorer

Showing 2 of 2 citing papers.

Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2

cs.LG · 2026-05-13 · unverdicted · none · ref 3

Projecting LLM hidden states onto F2 algebra with 42 pairs yields 93% zero-shot accuracy on logical relations and identifies prompt-preventable late-layer collapse.

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

cs.AI · 2026-05-05 · unverdicted · none · ref 4

Attention sharpness barely predicts VLM correctness while hidden-state probes and self-consistency strongly do, with late-fusion models showing fragile reliability bottlenecks unlike early-fusion ones.