Investigating Concept Alignment Using Implausible Category Members

Brenden M. Lake; Sunayana Rane; Thomas L. Griffiths

arXiv: 2605.21683 · v1 · pith:NMPFU372new · submitted 2026-05-20 · cs.AI

Pith reviewed 2026-05-22 09:08 UTC · model grok-4.3

classification cs.AI

keywords concept alignment · AI safety · category membership · implausible examples · cognitive psychology · large language models · concept boundaries · Rosch Mervis

The pith

AI models assign implausible objects to categories differently from humans, such as treating words as vehicles or vegetables as fruit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests AI concept understanding by asking models to judge implausible members of everyday categories, such as whether an olive is a vehicle. This avoids relying on training-data patterns that plausible examples are likely to recall, and instead probes the boundaries humans take for granted. The authors draw objects from a classic psychological study and compare model answers to human judgments on both matching and mismatched categories. The work reports clear differences for several concepts and links those differences to potential safety problems in deployed systems.

Core claim

By presenting models and humans with the same objects assigned to both correct and mismatched superordinate categories drawn from Rosch and Mervis, the study finds that current AI systems place certain implausible items into categories in ways that diverge from human patterns, including words into vehicles or clothing, vegetable exemplars into fruit, and non-weapon items into weapons.
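As a rough illustration of the probe design described above, the sketch below crosses every object with every category and flags whether each pairing is the object's home category. The exemplar lists, the `build_probes` helper, and the question template are assumptions made for illustration; the paper's actual stimuli and prompts are not reproduced here.

```python
from itertools import product

# Hypothetical toy data: the category names appear in the text above, but
# these exemplar lists are illustrative assumptions, not the paper's stimuli.
EXEMPLARS = {
    "vehicle": ["car", "train"],
    "fruit": ["olive", "watermelon"],
    "vegetable": ["carrot"],
}


def article(noun):
    """Pick 'a' or 'an' with a simple first-letter rule."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def build_probes(exemplars):
    """Pair every object with every category and mark matched pairings."""
    probes = []
    for (home, items), target in product(exemplars.items(), exemplars):
        for item in items:
            probes.append({
                "object": item,
                "category": target,
                "matched": home == target,  # True for within-category probes
                "question": f"Is {article(item)} {item} {article(target)} {target}?",
            })
    return probes


probes = build_probes(EXEMPLARS)
implausible = [p["question"] for p in probes if not p["matched"]]
```

Here the mismatched pairings in `implausible` play the role of the cross-category questions, while the `matched` entries correspond to the within-category ones.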

What carries the argument

Implausible category members used as probes to map concept boundaries, contrasted with human assignments on within-category and cross-category tasks.

If this is right

Misaligned concept boundaries can produce unsafe or unexpected behavior in downstream applications.
Probing with implausible examples provides a practical way to detect gaps before deployment.
Alignment efforts must address not only typical cases but also the edges of categories.
Human-like concept understanding requires more than pattern matching on common examples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be applied to test whether additional training or architectural changes reduce specific mismatches.
Similar probes might reveal alignment issues in other modalities such as vision-language models.
The observed differences suggest limits to how well current systems capture the graded structure of human categories.

Load-bearing premise

Assignments given to implausible members reflect genuine concept-level knowledge rather than training artifacts or prompt effects.
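One way to test for prompt effects is to ask the same membership question under several phrasings and measure how often the answers agree. The sketch below is a minimal version of that idea; the templates, the `consistency` helper, and the `stub_model` stand-in for a real model call are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical paraphrases of one yes/no membership question.
TEMPLATES = [
    "Is {obj} {cat}? Answer yes or no.",
    "Would you say that {obj} is {cat}? Answer yes or no.",
    "Answer yes or no: does {obj} count as {cat}?",
]


def normalize(reply):
    """Map a free-text reply to True, False, or None if it is unparseable."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!") if words else ""
    return {"yes": True, "no": False}.get(first)


def consistency(ask, obj, cat, templates=TEMPLATES):
    """Return the most common answer and the share of phrasings that gave it."""
    answers = [normalize(ask(t.format(obj=obj, cat=cat))) for t in templates]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / len(answers)


def stub_model(prompt):
    # Stand-in for a real model call (assumption): it answers "Yes." only
    # to the first phrasing, which mimics a prompt-sensitive response.
    return "Yes." if prompt.startswith("Is ") else "No."


answer, share = consistency(stub_model, "an olive", "a vehicle")
```

A share well below 1.0 for a given question would suggest that the assignment depends on wording rather than on a stable concept boundary.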

What would settle it

A controlled experiment in which the same models produce human-like assignment patterns across the full set of implausible within-category and cross-category items.

Figures

Figures reproduced from arXiv: 2605.21683 by Brenden M. Lake, Sunayana Rane, Thomas L. Griffiths.

**Figure 2.** Top 28 questions producing the highest collective human-AI disagreement, ranked by …

**Figure 3.** The idiosyncratic responses reveal interesting differences between models that may reflect variation in training data. GPT-4o is willing to consider a watermelon to be a vegetable and a train clothing. In fact, watermelons were declared to be a vegetable by the Oklahoma state legislature in order to be named the official state vegetable, and a train can be a long attachment to a dress. Why the model consider…

**Figure 3.** Examples of questions for which individual models produced idiosyncratic responses. In …

Original abstract

Developing AI systems with a human-like understanding of everyday concepts is a key step towards developing safe, reliable systems whose behavior makes sense to humans. When probing concept understanding, asking questions about plausible category members (e.g., "Is a car a vehicle?") is likely to recall patterns in the model's vast training data. We pursue an alternative strategy, characterizing the boundaries of conceptual categories by asking about implausible category members (e.g., "Is an olive a vehicle?") to probe the kind of concept-level knowledge we take for granted in fellow humans. We characterize concept boundaries for a set of fundamental concepts by studying AI systems' assignments of objects to superordinate categories from a classic psychological study by Rosch and Mervis, as well as their assignments of the same objects to mismatched superordinate categories. We compare these assignments to those made by human participants on the full range of within-category and cross-category assignment tasks. Our results reveal a range of concepts for which which models differ in meaningful and surprising ways from humans, including treating "words" as belonging to categories like "vehicles" and "clothing," identifying several "vegetable" category members as "fruit," and assigning exemplars from non-weapon categories to the "weapons" category. We also demonstrate how these instances of concept misalignment translate into problematic downstream behavior with implications for AI safety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes probing AI models' conceptual category boundaries using implausible category members (e.g., 'Is an olive a vehicle?') drawn from Rosch and Mervis's classic psychological study, rather than plausible members that may reflect training data. It compares model assignments of objects to superordinate categories (and mismatched categories) against human judgments, reports specific misalignments such as models treating 'words' as vehicles/clothing, vegetables as fruit, and non-weapon exemplars as weapons, and discusses downstream implications for AI safety.

Significance. If the empirical findings are robust, the work offers a psychologically grounded method for identifying concept-level differences between current AI systems and humans that could inform safer and more interpretable AI. The implausible-member strategy is a clear strength for avoiding direct recall of training patterns, and the direct human-model comparison provides concrete examples of misalignment with potential safety relevance.

major comments (2)

[Methods / Experimental Setup] The manuscript provides no description of the specific models tested, prompt templates, temperature or sampling settings, or controls for response bias and prompt sensitivity. This is load-bearing for the central claim because, without such details, the reported misalignments (e.g., words assigned to vehicles) cannot be distinguished from training-data artifacts or default response tendencies under the chosen query format.
[Results] Results are presented via selected qualitative examples without statistical tests, inter-rater reliability measures, quantitative agreement scores with human data across all categories, or error analysis. This undermines the claim of 'meaningful and surprising' differences because it leaves open whether the observed patterns are systematic or attributable to a small number of prompt-dependent cases.

minor comments (2)

[Abstract] Typographical error in the abstract: 'for which which models' contains a duplicated word.
[Introduction] The abstract and introduction refer to 'human participants on the full range of within-category and cross-category assignment tasks' but do not clarify whether new human data were collected or whether the comparison relies on the original Rosch & Mervis norms; this should be stated explicitly for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and have revised the manuscript to provide greater transparency and rigor where the points raised are valid.

Referee: [Methods / Experimental Setup] The manuscript provides no description of the specific models tested, prompt templates, temperature or sampling settings, or controls for response bias and prompt sensitivity. This is load-bearing for the central claim because, without such details, the reported misalignments (e.g., words assigned to vehicles) cannot be distinguished from training-data artifacts or default response tendencies under the chosen query format.

Authors: We agree that the original manuscript omitted key methodological details. The revised version now includes a dedicated Methods section that specifies the exact models and versions tested, the full prompt templates, temperature and sampling parameters, and the controls used to assess prompt sensitivity and response bias (including multiple prompt phrasings and consistency checks). These additions directly address the concern that observed misalignments could be artifacts of the query format. revision: yes
Referee: [Results] Results are presented via selected qualitative examples without statistical tests, inter-rater reliability measures, quantitative agreement scores with human data across all categories, or error analysis. This undermines the claim of 'meaningful and surprising' differences because it leaves open whether the observed patterns are systematic or attributable to a small number of prompt-dependent cases.

Authors: We accept that the initial presentation was primarily qualitative. The revised manuscript adds quantitative agreement metrics (e.g., category-level accuracy and correlation with human judgments across the full stimulus set), reports inter-rater reliability for the human data, includes basic statistical comparisons where sample sizes permit, and provides a systematic error analysis to show that the reported misalignments are not limited to isolated prompt-dependent cases. These changes strengthen the evidence that the differences are systematic. revision: yes
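The kind of quantitative comparison described in the response above could be sketched as follows. The yes-rates are invented numbers, and the function names and the 0.5 majority threshold are assumptions; nothing here reproduces the paper's analysis.

```python
from math import sqrt

# Hypothetical yes-rates (fraction of "yes" answers) per question.
human = {"car/vehicle": 1.0, "olive/vehicle": 0.0,
         "olive/fruit": 0.6, "word/clothing": 0.0}
model = {"car/vehicle": 1.0, "olive/vehicle": 0.0,
         "olive/fruit": 0.1, "word/clothing": 0.7}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def rank_by_disagreement(human, model):
    """Sort questions by absolute gap in yes-rate, largest gap first."""
    gaps = {q: abs(human[q] - model[q]) for q in human}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)


def majority_agreement(human, model, threshold=0.5):
    """Share of questions where both majority answers fall on the same side."""
    same = sum((human[q] > threshold) == (model[q] > threshold) for q in human)
    return same / len(human)


questions = list(human)
r = pearson([human[q] for q in questions], [model[q] for q in questions])
ranking = rank_by_disagreement(human, model)
agreement = majority_agreement(human, model)
```

Ranking by gap surfaces the questions that drive disagreement, while the correlation and majority agreement summarize alignment across the whole stimulus set.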

Circularity Check

0 steps flagged

No circularity in empirical comparison of model and human category judgments

The paper is a purely empirical study that elicits category membership judgments from language models on plausible and implausible exemplars drawn from Rosch & Mervis (1975) and directly compares those judgments to new human data collected under the same protocol. No equations, parameters, or derivations appear; the central results are raw assignment frequencies and qualitative differences between models and humans. The cited Rosch & Mervis work is an independent, decades-old external reference rather than a self-citation, and no fitted inputs are relabeled as predictions. The derivation chain is therefore self-contained and does not reduce to its own outputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the work is a direct empirical comparison without mathematical modeling or new postulated constructs.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

We characterize concept boundaries for a set of fundamental concepts by studying AI systems' assignments of objects to superordinate categories from a classic psychological study by Rosch and Mervis, as well as their assignments of the same objects to mismatched superordinate categories.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Our results reveal a range of concepts for which models differ in meaningful and surprising ways from humans, including treating 'words' as belonging to categories like 'vehicles' and 'clothing,' identifying several 'vegetable' category members as 'fruit,' and assigning exemplars from non-weapon categories to the 'weapons' category.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 5 internal anchors

[1] Josh Achiam et al. “GPT-4 technical report”. In: arXiv preprint arXiv:2303.08774 (2023)
[2] David Alvarez Melis and Tommi Jaakkola. “Towards robust interpretability with self-explaining neural networks”. In: Advances in Neural Information Processing Systems 31 (2018)
[3] Ruairidh M Battleday, Joshua C Peterson, and Thomas L Griffiths. “Capturing human categorization of natural images by combining deep networks and cognitive models”. In: Nature Communications 11.1 (2020), p. 5418
[4] Marcel Binz and Eric Schulz. “Using cognitive psychology to understand GPT-3”. In: Proceedings of the National Academy of Sciences 120.6 (2023), e2218523120
[5] Jerome S Bruner, Jacqueline J Goodnow, and George Austin. A study of thinking. New York: John Wiley & Sons, 1956
[6] Chaofan Chen et al. “This looks like that: deep learning for interpretable image recognition”. In: Advances in Neural Information Processing Systems 32 (2019)
[7] Zhi Chen, Yijie Bei, and Cynthia Rudin. “Concept whitening for interpretable image recognition”. In: Nature Machine Intelligence 2.12 (2020), pp. 772–782
[8] Ishita Dasgupta, Erin Grant, and Thomas L. Griffiths. “Distinguishing rule and exemplar-based generalization in learning systems”. In: International Conference on Machine Learning. 2022, pp. 4816–4830
[9] Fabrizio Dell’Acqua et al. “Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality”. In: Harvard Business School Technology & Operations Management Unit Working Paper 24-013 (2023)
[10] Finale Doshi-Velez and Been Kim. “Towards a rigorous science of interpretable machine learning”. In: arXiv preprint arXiv:1702.08608 (2017)
[11] Gemini Team et al. “Gemini: a family of highly capable multimodal models”. In: arXiv preprint arXiv:2312.11805 (2023)
[12] Marton Havasi, Sonali Parbhoo, and Finale Doshi-Velez. “Addressing leakage in concept bottleneck models”. In: Advances in Neural Information Processing Systems 35 (2022), pp. 23386–23397
[13] Peter Henderson et al. “Self-destructing models: Increasing the costs of harmful dual uses of foundation models”. In: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. 2023, pp. 287–296
[14] Jennifer Hu and Michael C Frank. “Auxiliary task demands mask the capabilities of smaller language models”. In: arXiv preprint arXiv:2404.02418 (2024)
[15] Clark L Hull. “Quantitative aspects of evolution of concepts: An experimental study.” In: Psychological monographs 28 (1920)
[16] Been Kim et al. “Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav)”. In: International Conference on Machine Learning. 2018, pp. 2668–2677
[17] Pang Wei Koh et al. “Concept bottleneck models”. In: International Conference on Machine Learning. 2020, pp. 5338–5348
[18] Alexander Ku et al. “Levels of Analysis for Large Language Models”. In: arXiv preprint arXiv:2503.13401 (2025)
[19] Brenden M. Lake and Gregory L. Murphy. “Word meaning in minds and machines”. In: Psychological Review 130 (2023), pp. 401–431
[20] Max Losch, Mario Fritz, and Bernt Schiele. “Interpretability beyond classification output: Semantic bottleneck networks”. In: arXiv preprint arXiv:1907.10882 (2019)
[21] Emanuele Marconato, Andrea Passerini, and Stefano Teso. “Glancenets: Interpretable, leak-proof concept-based models”. In: Advances in Neural Information Processing Systems 35 (2022), pp. 21212–21227
[22] R Thomas McCoy et al. “Embers of autoregression: Understanding large language models through the problem they are trained to solve”. In: arXiv preprint arXiv:2309.13638 (2023)
[23] Gregory L Murphy. Categories we live by: How we classify everyone and everything. MIT Press, 2024
[24] Gregory L Murphy. “What are categories and concepts”. In: The making of human concepts (2010), pp. 11–28
[25] Michael I Posner and Steven W Keele. “On the genesis of abstract ideas.” In: Journal of Experimental Psychology 77.3p1 (1968), p. 353
[26] Sunayana Rane et al. “Concept alignment”. In: arXiv preprint arXiv:2401.08672 (2024)
[27] Sunayana Rane et al. “Concept Alignment as a Prerequisite for Value Alignment”. In: Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 46. 2024
[28] Sunayana Rane et al. “Position: Principles of Animal Cognition to Improve LLM Evaluations”. In: Forty-second International Conference on Machine Learning Position Paper Track. 2025
[29] Eleanor Rosch and Carolyn B Mervis. “Family resemblances: Studies in the internal structure of categories”. In: Cognitive Psychology 7.4 (1975), pp. 573–605
[30] Eleanor Rosch et al. “Basic objects in natural categories”. In: Cognitive Psychology 8.3 (1976), pp. 382–439
[31] Vladimir M Sloutsky and Wei Deng. “Categories, concepts, and conceptual development”. In: Language, cognition and neuroscience 34.10 (2019), pp. 1284–1297
[32] Ilia Sucholutsky et al. “Getting aligned on representational alignment”. In: Transactions on Machine Learning Research (2025)
[33] Hugo Touvron et al. “Llama: Open and efficient foundation language models”. In: arXiv preprint arXiv:2302.13971 (2023)
[34] Keyon Vafa, Ashesh Rambachan, and Sendhil Mullainathan. “Do large language models perform the way people expect? Measuring the human generalization function”. In: International Conference on Machine Learning. 2024, pp. 48919–48937
[35] Iven Van Mechelen et al. “Categories and concepts”. In: Academic Press New York (1993)
