{"work":{"id":"400e017f-8643-4166-b6da-a75d4446da80","openalex_id":null,"doi":null,"arxiv_id":"2310.06824","raw_key":null,"title":"The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets","authors":null,"authors_text":"Samuel Marks, Max Tegmark","year":2023,"venue":"cs.AI","abstract":"Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods. Recent work has developed techniques for inferring whether a LLM is telling the truth by training probes on the LLM's internal activations. However, this line of work is controversial, with some authors pointing out failures of these probes to generalize in basic ways, among other conceptual issues. In this work, we use high-quality datasets of simple true/false statements to study in detail the structure of LLM representations of truth, drawing on three lines of evidence: 1. Visualizations of LLM true/false statement representations, which reveal clear linear structure. 2. Transfer experiments in which probes trained on one dataset generalize to different datasets. 3. Causal evidence obtained by surgically intervening in a LLM's forward pass, causing it to treat false statements as true and vice versa. Overall, we present evidence that at sufficient scale, LLMs linearly represent the truth or falsehood of factual statements. We also show that simple difference-in-mean probes generalize as well as other probing techniques while identifying directions which are more causally implicated in model outputs.","external_url":"https://arxiv.org/abs/2310.06824","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T05:45:23.952071+00:00","pith_arxiv_id":"2310.06824","created_at":"2026-05-09T01:59:34.671014+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets","render_title":"The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets"},"hub":{"state":{"work_id":"400e017f-8643-4166-b6da-a75d4446da80","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":53,"external_cited_by_count":null,"distinct_field_count":6,"first_pith_cited_at":"2024-06-17T16:36:12+00:00","last_pith_cited_at":"2026-05-21T14:22:27+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-09T02:54:38.768811+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":7},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":5},{"context_polarity":"unclear","n":2},{"context_polarity":"use_method","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T20:26:27.130271+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Representation Engineering: A Top-Down Approach to AI Transparency","work_id":"45b326e2-e962-41a5-a542-2559e103a19b","shared_citers":15},{"title":"Steering Language Models With Activation Engineering","work_id":"d525fe06-5560-4e97-86fc-7a0e551f5b17","shared_citers":14},{"title":"The Linear Representation Hypothesis and the Geometry of Large Language Models","work_id":"a7b44adc-f2c2-4420-a27d-8ade97dd3b75","shared_citers":13},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":11},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":10},{"title":"J., Geiger, A., and Nanda, N","work_id":"6cb3c7a7-3301-449f-97b9-7e047edafdf9","shared_citers":8},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":8},{"title":"Discovering latent knowledge in language models without supervision","work_id":"a12b68bd-76a4-4837-ac7c-3ed5a60010d3","shared_citers":7},{"title":"Eliciting Latent Predictions from Transformers with the Tuned Lens","work_id":"a127314f-7424-488f-b6d7-8214650c420f","shared_citers":7},{"title":"In-context Learning and Induction Heads","work_id":"db2b0911-2758-4a2a-99dc-15b14b91bd5e","shared_citers":7},{"title":"Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small","work_id":"d1167c73-3f2a-472b-8bf5-0ec282d7988a","shared_citers":7},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":7},{"title":"Sparse Autoencoders Find Highly Interpretable Features in Language Models","work_id":"51960d72-c69f-4db8-8efd-e90e8b4d9524","shared_citers":7},{"title":"Toy Models of Superposition","work_id":"43875dbe-bc2d-4ab5-af63-744411533ff7","shared_citers":7},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":6},{"title":"HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal","work_id":"b0b0303f-2444-4789-a979-8153624312ff","shared_citers":6},{"title":"Refusal in Language Models Is Mediated by a Single Direction","work_id":"fbb9538d-8e58-4902-9fbd-b11f044bc2d5","shared_citers":6},{"title":"Steering Llama 2 via Contrastive Activation Addition","work_id":"3317feaa-e788-45fc-95aa-4ea20028b55b","shared_citers":6},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":5},{"title":"Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models","work_id":"fb24e7e7-f336-4706-bc2d-62d656b28d74","shared_citers":5},{"title":"Understanding intermediate layers using linear classifier probes","work_id":"bdc944db-4be2-44f7-950b-eaef12fab00e","shared_citers":5},{"title":"arXiv preprint arXiv:2308.09124 , year=","work_id":"25f5f724-b6d7-427f-a2f3-e8b72fd3b5e2","shared_citers":4},{"title":"Dola: Decoding by contrasting layers improves factuality in large language models","work_id":"0f1b9a7a-0623-4efc-bed3-6dd997054681","shared_citers":4},{"title":"Emergent linear representations in world models of self-supervised sequence models","work_id":"8c5160aa-2615-481f-919a-43849e0ef44d","shared_citers":4}],"time_series":[{"n":1,"year":2024},{"n":34,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T20:26:28.645276+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T20:26:33.111823+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets","claims":[{"claim_text":"Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods. Recent work has developed techniques for inferring whether a LLM is telling the truth by training probes on the LLM's internal activations. However, this line of work is controversial, with some authors pointing out failures of these probes to generalize in basic ways, among other conceptual issues. In this work, we use high-quality datasets of simple true/false statements to study in detail the structure of LLM representations of truth, drawing on three lines of evidence: 1. Visualizations of LL","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T20:26:25.486622+00:00"}},"summary":{"title":"The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets","claims":[{"claim_text":"Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods. Recent work has developed techniques for inferring whether a LLM is telling the truth by training probes on the LLM's internal activations. However, this line of work is controversial, with some authors pointing out failures of these probes to generalize in basic ways, among other conceptual issues. In this work, we use high-quality datasets of simple true/false statements to study in detail the structure of LLM representations of truth, drawing on three lines of evidence: 1. Visualizations of LL","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Representation Engineering: A Top-Down Approach to AI Transparency","work_id":"45b326e2-e962-41a5-a542-2559e103a19b","shared_citers":15},{"title":"Steering Language Models With Activation Engineering","work_id":"d525fe06-5560-4e97-86fc-7a0e551f5b17","shared_citers":14},{"title":"The Linear Representation Hypothesis and the Geometry of Large Language Models","work_id":"a7b44adc-f2c2-4420-a27d-8ade97dd3b75","shared_citers":13},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":11},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":10},{"title":"J., Geiger, A., and Nanda, N","work_id":"6cb3c7a7-3301-449f-97b9-7e047edafdf9","shared_citers":8},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":8},{"title":"Discovering latent knowledge in language models without supervision","work_id":"a12b68bd-76a4-4837-ac7c-3ed5a60010d3","shared_citers":7},{"title":"Eliciting Latent Predictions from Transformers with the Tuned Lens","work_id":"a127314f-7424-488f-b6d7-8214650c420f","shared_citers":7},{"title":"In-context Learning and Induction Heads","work_id":"db2b0911-2758-4a2a-99dc-15b14b91bd5e","shared_citers":7},{"title":"Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small","work_id":"d1167c73-3f2a-472b-8bf5-0ec282d7988a","shared_citers":7},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":7},{"title":"Sparse Autoencoders Find Highly Interpretable Features in Language Models","work_id":"51960d72-c69f-4db8-8efd-e90e8b4d9524","shared_citers":7},{"title":"Toy Models of Superposition","work_id":"43875dbe-bc2d-4ab5-af63-744411533ff7","shared_citers":7},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":6},{"title":"HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal","work_id":"b0b0303f-2444-4789-a979-8153624312ff","shared_citers":6},{"title":"Refusal in Language Models Is Mediated by a Single Direction","work_id":"fbb9538d-8e58-4902-9fbd-b11f044bc2d5","shared_citers":6},{"title":"Steering Llama 2 via Contrastive Activation Addition","work_id":"3317feaa-e788-45fc-95aa-4ea20028b55b","shared_citers":6},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":5},{"title":"Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models","work_id":"fb24e7e7-f336-4706-bc2d-62d656b28d74","shared_citers":5},{"title":"Understanding intermediate layers using linear classifier probes","work_id":"bdc944db-4be2-44f7-950b-eaef12fab00e","shared_citers":5},{"title":"arXiv preprint arXiv:2308.09124 , year=","work_id":"25f5f724-b6d7-427f-a2f3-e8b72fd3b5e2","shared_citers":4},{"title":"Dola: Decoding by contrasting layers improves factuality in large language models","work_id":"0f1b9a7a-0623-4efc-bed3-6dd997054681","shared_citers":4},{"title":"Emergent linear representations in world models of self-supervised sequence models","work_id":"8c5160aa-2615-481f-919a-43849e0ef44d","shared_citers":4}],"time_series":[{"n":1,"year":2024},{"n":34,"year":2026}],"dependency_candidates":[]},"authors":[]}}