Recognition: no theorem link
'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family
Pith reviewed 2026-05-13 17:31 UTC · model grok-4.3
The pith
BERT's contextual embeddings encode information about Italian NPN constructions across its layers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Contextual vector representations from BERT encode the Italian NPN constructional family; layer-wise probing classifiers, applied systematically across the model's internal layers, show the extent to which constructional form and meaning are reflected in these embeddings.
What carries the argument
Layer-wise probing classifiers that take contextual vector representations from BERT as input to detect and disambiguate NPN constructions.
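A minimal sketch of such a layer-wise probe, with random stand-in data in place of real BERT embeddings (the array shapes, labels, and layer count here are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical setup: embeddings[l] holds one vector per sentence at layer l;
# labels marks NPN construction (1) vs. distractor (0). Real embeddings would
# come from the model's hidden states; random noise stands in here.
rng = np.random.default_rng(0)
n_layers, n_sents, dim = 13, 200, 64
embeddings = rng.normal(size=(n_layers, n_sents, dim))
labels = rng.integers(0, 2, size=n_sents)

# One linear probe per layer: cross-validated accuracy indexes how linearly
# decodable the construction label is from that layer's representations.
layer_acc = []
for layer in range(n_layers):
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, embeddings[layer], labels, cv=5).mean()
    layer_acc.append(acc)

best = int(np.argmax(layer_acc))
print(f"best layer: {best}, accuracy: {layer_acc[best]:.2f}")
```

With random inputs the per-layer accuracies hover around chance, which is exactly the baseline pattern the probing design is meant to rule out.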
If this is right
- Constructional form and meaning can be detected in specific layers of BERT.
- Empirical evidence supports links between constructionist theory and neural language models.
- Similar probing can be extended to other constructions and languages.
- Disambiguation of NPN meanings benefits from contextual embeddings.
Where Pith is reading between the lines
- Models might be improved by explicitly training on constructional patterns if they are not fully captured.
- Human-like processing of constructions could be tested by comparing probe results to psycholinguistic data.
- Layer-specific encoding suggests potential for targeted interventions in model fine-tuning.
Load-bearing premise
That the probing classifiers reliably measure what the model actually encodes, rather than artifacts of the choice of probe architecture or dataset.
What would settle it
Observing that probing accuracy remains at chance level for all layers would show that the embeddings do not encode the NPN construction.
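One hedged way to operationalise "chance level" is a label-permutation test per layer: shuffle the labels many times and check whether the probe's real accuracy exceeds the shuffled distribution. This is a generic sketch with stand-in data, not the paper's procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 32))    # stand-in for one layer's embeddings
y = rng.integers(0, 2, size=120)  # construction vs. distractor labels

# permutation_test_score refits the probe on shuffled labels to estimate the
# chance distribution; a high p-value means this layer's probe is at chance.
score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y,
    cv=5, n_permutations=30, random_state=0,
)
print(f"probe accuracy: {score:.2f}, p-value vs. chance: {pvalue:.2f}")
```

Reporting this per layer would make "remains at chance for all layers" a testable statement rather than an eyeballed one.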
Original abstract
Interpretability research has highlighted the importance of evaluating Pretrained Language Models (PLMs) and in particular contextual embeddings against explicit linguistic theories to determine what linguistic information they encode. This study focuses on the Italian NPN (noun-preposition-noun) constructional family, challenging some of the theoretical and methodological assumptions underlying previous experimental designs and extending this type of research to a lesser-investigated language. Contextual vector representations are extracted from BERT and used as input to layer-wise probing classifiers, systematically evaluating information encoded across the model's internal layers. The results shed light on the extent to which constructional form and meaning are reflected in contextual embeddings, contributing empirical evidence to the dialogue between constructionist theory and neural language modelling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates the Italian NPN (noun-preposition-noun) construction family by extracting contextual embeddings from BERT and related models, then training layer-wise probing classifiers to assess the encoding of constructional form and meaning across layers. It challenges prior experimental assumptions, extends the approach to Italian, and aims to provide empirical evidence linking constructionist theory with neural language modeling.
Significance. If the probing results hold after appropriate controls, the work would supply useful data on how PLMs represent a specific constructional pattern in a lesser-studied language, adding to the body of interpretability studies that test linguistic theories against model representations.
major comments (2)
- [Methods] The central claim that layer-wise probes reveal the extent of constructional encoding rests on the untested assumption that classifier accuracy indexes abstract information rather than lexical co-occurrence or positional biases. No control conditions are described that hold lexical items fixed while disrupting the NPN template (e.g., noun-noun-preposition or preposition-noun-noun orders), which is required to isolate construction-specific features.
- [Abstract] The abstract supplies no quantitative results, performance metrics, error analysis, data splits, or baseline comparisons, so the soundness of the claim that the results 'shed light on' constructional encoding cannot be evaluated from the provided text.
minor comments (1)
- [Title] The phrase 'BERT's family' in the title is imprecise; it should be clarified as 'BERT-family models' or 'models in the BERT family' for consistency with standard terminology.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below and describe the revisions that will be incorporated in the next version of the manuscript.
Point-by-point responses
-
Referee: [Methods] The central claim that layer-wise probes reveal the extent of constructional encoding rests on the untested assumption that classifier accuracy indexes abstract information rather than lexical co-occurrence or positional biases. No control conditions are described that hold lexical items fixed while disrupting the NPN template (e.g., noun-noun-preposition or preposition-noun-noun orders), which is required to isolate construction-specific features.
Authors: We agree that the absence of such controls is a limitation of the current design. In the revised manuscript we will introduce control conditions that hold the lexical items fixed while disrupting the canonical NPN order (specifically NNP and PNN permutations). Probing accuracies on these controls will be reported alongside the main results to demonstrate that the classifiers are sensitive to constructional structure rather than lexical or positional cues alone. revision: yes
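The proposed control can be sketched as a simple permutation of the surface template. The example item and helper below are illustrative assumptions, not the authors' materials:

```python
# Build lexically matched controls by permuting the NPN template.
# The example item is illustrative; the paper's data come from CORIS contexts.
def make_controls(noun: str, prep: str) -> dict:
    """Return the canonical NPN string and its NNP/PNN permutations,
    holding the lexical items fixed so only word order changes."""
    return {
        "NPN": f"{noun} {prep} {noun}",  # e.g. "faccia a faccia"
        "NNP": f"{noun} {noun} {prep}",  # template disrupted, same lexemes
        "PNN": f"{prep} {noun} {noun}",
    }

controls = make_controls("faccia", "a")
print(controls)
```

A probe that tracks the construction itself, rather than lexical co-occurrence, should score near chance on the NNP/PNN items.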
-
Referee: [Abstract] The abstract supplies no quantitative results, performance metrics, error analysis, data splits, or baseline comparisons, so the soundness of the claim that the results 'shed light on' constructional encoding cannot be evaluated from the provided text.
Authors: We accept that the abstract is currently too high-level. The revised abstract will include key quantitative details: the highest layer-wise probing accuracies for both form and meaning, the train/test split sizes, a lexical baseline comparison, and a brief reference to the error analysis performed. These additions will allow readers to evaluate the strength of the claims directly from the abstract. revision: yes
Circularity Check
No circularity: standard probing applied to new data without self-referential derivations
full rationale
The paper extracts contextual embeddings from BERT-family models and trains layer-wise probing classifiers to assess encoding of the Italian NPN construction. No equations, parameter fittings, or derivations appear in the work. Results are presented as empirical measurements from established probing methods on novel Italian data, without any reduction of outputs to inputs by construction, self-defined quantities, or load-bearing self-citations. The central claims rest on classifier accuracy as a direct (if imperfect) index of encoded information, which is an external methodological choice rather than a tautological re-labeling of the authors' own inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family
Introduction. The remarkable empirical performance obtained by Pretrained Language Models (PLMs) across a wide range of tasks has fueled enthusiasm in both computational approaches and theoretical debates about language (Brown et al., 2020). Despite these successes, PLMs remain largely opaque (Rogers et al., 2020). High predictive accuracy does not automatically...
-
[2]
Formally, the pattern consists of nominal reduplication interrupted by a preposition
The npn Construction. npn expressions challenge traditional grammatical categories and motivate a model capable of capturing phenomena along the lexicon–syntax continuum. Formally, the pattern consists of nominal reduplication interrupted by a preposition. Construction schema: Noun_i Preposition Noun_i. Treating npn expressions as semi-specified Cxns accounts for...
-
[3]
Related work. Recent work has investigated whether LLMs encode constructional knowledge using a variety of experimental designs. One line of research (Tayyar Madabushi et al., 2020; Tayyar Madabushi and Bonial, 2025) examines multiple Cxns organized along a gradient of schematicity, testing whether models generalize across instantiations and whether ...
-
[4]
Research Questions and Methodological Design. As Cxns are assumed to be inherently language-specific, probing constructional knowledge requires moving beyond the English-centric focus that characterises much of the existing literature. Moreover, the npn Cxn occupies an intermediate position on the lexicon–syntax continuum, making it a suitable test case for a...
-
[5]
Methods. 5.1. Data. The dataset used in this study (Gorzoni et al., 2026) is derived from the Italian npn dataset presented in Masini (2024a), extended with full sentential contexts extracted from CORIS. The full dataset contains 3,256 attested instances of the Italian npn constructional pattern instantiated by the prepositions a 'at/to' and su 'on'. Following the an...
-
[6]
rather than GloVe as a static baseline because its subword-based representations are better suited to morphologically rich languages such as Italian, allowing us to control for lexical and inflectional variation. 5.3. Experimental setup. For the identification task, we perform binary classification (Construction vs. Distractor). For the disambiguation task,...
-
[7]
Identification task. The first experiment evaluates whether contextual embeddings extracted from BERT's models encode sufficient information to distinguish npn constructions from distractors, and analyzes how the nature of the distractor patterns affects the probing classifier's behaviour. In Scivetti and Schneider (2025)'s implementation, in fact, the ident...
-
[8]
Disambiguation task. Given the very high performance achieved in the experiment on the identification of the npn Cxn, extending the analysis beyond form, we now turn to examining the semantic dimension of the Cxn. Our setup is a multinomial three-class disambiguation problem: we only focus on the Cxn (1), (2), (3) and Cxn (4) in Table 1, which are associ-...
-
[9]
Conclusion. We presented two probing experiments addressing the identification and semantic disambiguation of Italian npn constructions. To this end, we introduced an extended dataset including both constructional instances and carefully designed distractors, allowing for a controlled evaluation of construction-sensitive encoding. We extended and enriched t...
-
[10]
First, the analysis is restricted to a single constructional family, namely the Italian npn Cxs
Limitations. The present study is subject to several limitations. First, the analysis is restricted to a single constructional family, namely the Italian npn Cxs. Although multiple prepositions (a 'at/to', su 'on', per 'by', dopo 'after') are included, they instantiate closely related constructions within the same constructional network, differing primarily ...
-
[11]
Ethics Statement. Annotators were recruited within an advanced Master's-level course as part of structured educational activities. Participation was entirely voluntary and had no impact on students' evaluation or academic standing. All participants were informed about the objectives of the study and the intended use of the collected data.
-
[12]
Unsupervised Cross-lingual Representation Learning at Scale
Bibliographical References. Ron Artstein and Massimo Poesio. 2008. Inter-Coder Agreement for Computational Linguistics. Computational Linguistics, 34(4):555–596. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146...
-
[13]
Wesley Scivetti and Nathan Schneider
A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8:842–866. Wesley Scivetti and Nathan Schneider. 2025. Construction identification and disambiguation using BERT: A case study of NPN. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 365–376. Assoc...
-
[14]
bert-base-italian-cased (revision 843e404). Harish Tayyar Madabushi and Claire Bonial. 2025. Construction grammar evidence for how LLMs use context-directed extrapolation to solve tasks. In Proceedings of the Second International Workshop on Construction Grammars and NLP, pages 190–201, Düsseldorf, Germany. Association for Computational Linguistics. Harish Tayyar M...