{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:O5FGSQF52JYQJG6M4EJUGWR4RU","short_pith_number":"pith:O5FGSQF5","schema_version":"1.0","canonical_sha256":"774a6940bdd271049bcce113435a3c8d3c31947ca6c396b22ef91cc32f9ea2f9","source":{"kind":"arxiv","id":"2311.03658","version":2},"attestation_state":"computed","paper":{"title":"The Linear Representation Hypothesis and the Geometry of Large Language Models","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"High-level concepts in large language models are linear directions under a causal inner product built from counterfactual pairs.","cross_cats":["cs.AI","cs.LG","stat.ML"],"primary_cat":"cs.CL","authors_text":"Kiho Park, Victor Veitch, Yo Joong Choe","submitted_at":"2023-11-07T01:59:11Z","abstract_excerpt":"Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does \"linear representation\" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of \"linear representation\", one in the output (word) representation space, and one in the input (sentence) space. We then prove these conn"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2311.03658","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","primary_cat":"cs.CL","submitted_at":"2023-11-07T01:59:11Z","cross_cats_sorted":["cs.AI","cs.LG","stat.ML"],"title_canon_sha256":"4f38a4423afec1c2192b83aca612444daf27dedf0b6ae368025085f67b69be7f","abstract_canon_sha256":"52fc5acc1032b265edd952df1098bf1b816a082f7237490d8038ab7862d67fac"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-20T00:00:14.504209Z","signature_b64":"wBApiqTmHRDwzDH+9ZvDbgH8kjym8zG8SL64vMY0eup2GmR6En7+MR5Hs7KGsmE0VbxVe/YgVwe0akSABWtICw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"774a6940bdd271049bcce113435a3c8d3c31947ca6c396b22ef91cc32f9ea2f9","last_reissued_at":"2026-05-20T00:00:14.503329Z","signature_status":"signed_v1","first_computed_at":"2026-05-20T00:00:14.503329Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"The Linear Representation Hypothesis and the Geometry of Large Language Models","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"High-level concepts in large language models are linear directions under a causal inner product built from counterfactual pairs.","cross_cats":["cs.AI","cs.LG","stat.ML"],"primary_cat":"cs.CL","authors_text":"Kiho Park, Victor Veitch, Yo Joong Choe","submitted_at":"2023-11-07T01:59:11Z","abstract_excerpt":"Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does \"linear representation\" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of \"linear representation\", one in the output (word) representation space, and one in the input (sentence) space. We then prove these conn"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that the identified non-Euclidean inner product respects language structure in the precise sense required to unify probing and steering, and that counterfactual pairs can be reliably constructed or approximated in the model.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Linear representations of high-level concepts in LLMs are formalized via counterfactuals in input and output spaces, unified under a causal inner product that enables consistent probing and steering.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"High-level concepts in large language models are linear directions under a causal inner product built from counterfactual pairs.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c93988df41b25e5342f6f134c613f3024f88c1bf67e49eb4bbd03da16ec2c409"},"source":{"id":"2311.03658","kind":"arxiv","version":2},"verdict":{"id":"86afa7eb-3a40-40d5-86e8-56e7949c4e8a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-11T21:37:51.886037Z","strongest_claim":"Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs.","one_line_summary":"Linear representations of high-level concepts in LLMs are formalized via counterfactuals in input and output spaces, unified under a causal inner product that enables consistent probing and steering.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that the identified non-Euclidean inner product respects language structure in the precise sense required to unify probing and steering, and that counterfactual pairs can be reliably constructed or approximated in the model.","pith_extraction_headline":"High-level concepts in large language models are linear directions under a causal inner product built from counterfactual pairs."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2311.03658/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":30,"sample":[{"doi":"10.18653/v1/k16-1002","year":2022,"title":"doi: 10.18653/v1/K16-1002","work_id":"54f0083e-6c5a-47d8-9d80-a6ed0da3f854","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Word embed- dings, analogies, and machine learning: Beyond king - man + woman = queen","work_id":"884ff28a-7e39-45f3-a903-eb6950c0b799","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Toy Models of Superposition","work_id":"43875dbe-bc2d-4ab5-af63-744411533ff7","ref_index":3,"cited_arxiv_id":"2209.10652","is_internal_anchor":true},{"doi":"","year":2019,"title":"How contextual are contextualized word rep- resentations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings","work_id":"d68d8fa1-a3bb-459b-ab72-d52f506d7a78","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2020.conll-1.29","year":2020,"title":"doi: 10.18653/v1/2020.conll-1.29","work_id":"1eb9b7d9-cd8a-45e6-9636-c1dab1d9f4d5","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":30,"snapshot_sha256":"bd1b684c9f9c68682f6f9b55a65b560783157ab45073589cc574a93476b78276","internal_anchors":8},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2311.03658","created_at":"2026-05-20T00:00:14.503451+00:00"},{"alias_kind":"arxiv_version","alias_value":"2311.03658v2","created_at":"2026-05-20T00:00:14.503451+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2311.03658","created_at":"2026-05-20T00:00:14.503451+00:00"},{"alias_kind":"pith_short_12","alias_value":"O5FGSQF52JYQ","created_at":"2026-05-20T00:00:14.503451+00:00"},{"alias_kind":"pith_short_16","alias_value":"O5FGSQF52JYQJG6M","created_at":"2026-05-20T00:00:14.503451+00:00"},{"alias_kind":"pith_short_8","alias_value":"O5FGSQF5","created_at":"2026-05-20T00:00:14.503451+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":45,"internal_anchor_count":45,"sample":[{"citing_arxiv_id":"2605.23040","citing_title":"Steered Generation via Gradient-Based Optimization on Sparse Query Features","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23556","citing_title":"Is Dimensionality a Barrier for Retrieval Models?","ref_index":149,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21770","citing_title":"Manifold-Guided Attention Steering","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22532","citing_title":"Relational Linear Properties in Language Models: An Empirical Investigation","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20202","citing_title":"Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20745","citing_title":"The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20693","citing_title":"Interpretable Discriminative Text Representations via Agreement and Label Disentanglement","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15687","citing_title":"ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03258","citing_title":"The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2510.18184","citing_title":"ActivationReasoning: Logical Reasoning in Latent Activation Spaces","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12809","citing_title":"Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces","ref_index":248,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12813","citing_title":"REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02608","citing_title":"Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2406.11717","citing_title":"Refusal in Language Models Is Mediated by a Single Direction","ref_index":169,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12412","citing_title":"Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space","ref_index":86,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27169","citing_title":"Semantic Structure of Feature Space in Large Language Models","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09011","citing_title":"A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging Phases","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07922","citing_title":"Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09195","citing_title":"The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18901","citing_title":"Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03258","citing_title":"The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2312.06681","citing_title":"Steering Llama 2 via Contrastive Activation Addition","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05653","citing_title":"Negative Before Positive: Asymmetric Valence Processing in Large Language Models","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05715","citing_title":"Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes","ref_index":53,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05443","citing_title":"SLAM: Structural Linguistic Activation Marking for Language Models","ref_index":19,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU","json":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU.json","graph_json":"https://pith.science/api/pith-number/O5FGSQF52JYQJG6M4EJUGWR4RU/graph.json","events_json":"https://pith.science/api/pith-number/O5FGSQF52JYQJG6M4EJUGWR4RU/events.json","paper":"https://pith.science/paper/O5FGSQF5"},"agent_actions":{"view_html":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU","download_json":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU.json","view_paper":"https://pith.science/paper/O5FGSQF5","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2311.03658&json=true","fetch_graph":"https://pith.science/api/pith-number/O5FGSQF52JYQJG6M4EJUGWR4RU/graph.json","fetch_events":"https://pith.science/api/pith-number/O5FGSQF52JYQJG6M4EJUGWR4RU/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU/action/timestamp_anchor","attest_storage":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU/action/storage_attestation","attest_author":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU/action/author_attestation","sign_citation":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU/action/citation_signature","submit_replication":"https://pith.science/pith/O5FGSQF52JYQJG6M4EJUGWR4RU/action/replication_record"}},"created_at":"2026-05-20T00:00:14.503451+00:00","updated_at":"2026-05-20T00:00:14.503451+00:00"}