pith. machine review for the scientific record.

arxiv: 2604.03673 · v1 · submitted 2026-04-04 · 💻 cs.CL

Recognition: no theorem link

'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:31 UTC · model grok-4.3

classification 💻 cs.CL
keywords NPN construction · BERT · probing · contextual embeddings · Italian · construction grammar · interpretability · disambiguation

The pith

BERT's contextual embeddings encode information about Italian NPN constructions across its layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether BERT captures the Italian NPN construction, a pattern that repeats a noun around an intervening preposition (as in the title's 'Layer su Layer'). Researchers extract contextual vectors from different layers of the model and use them to train probing classifiers that identify these constructions. This reveals how much of the construction's form and meaning is present in the embeddings. The study applies this to Italian, a less-studied language in such work, and questions some common assumptions in probing experiments. A sympathetic reader would care because it bridges linguistic theory on constructions with how neural models represent language.

Core claim

Contextual vector representations extracted from BERT encode the Italian NPN constructional family; layer-wise probing classifiers, evaluated systematically across the model's internal layers, show the extent to which constructional form and meaning are reflected in these embeddings.

What carries the argument

Layer-wise probing classifiers that take contextual vector representations from BERT as input to detect and disambiguate NPN constructions.
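To make that machinery concrete, here is a minimal sketch of layer-wise probing in this spirit, built on Hugging Face transformers and scikit-learn. The checkpoint name, the toy sentences, and probing the preposition token are illustrative assumptions, not the authors' exact pipeline:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MODEL = "dbmdz/bert-base-italian-cased"  # assumed checkpoint; any BERT-family model works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

# Toy data (1 = NPN instance, 0 = distractor); the real experiments use
# thousands of CORIS-derived sentences and five random splits.
examples = [
    ("L'hanno letto pagina su pagina.", 1),
    ("Hanno esaminato il testo strato su strato.", 1),
    ("Il gatto dorme su una sedia.", 0),
    ("Contiamo su di lui per il progetto.", 0),
]

def prep_vectors(sentence):
    """Return one vector per layer for the token covering the preposition 'su'."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # embedding layer + one state per transformer layer
    tok_idx = enc.char_to_token(sentence.index(" su ") + 1)
    return [h[0, tok_idx].numpy() for h in hidden]

per_layer = list(zip(*[prep_vectors(s) for s, _ in examples]))
labels = np.array([y for _, y in examples])

for layer, feats in enumerate(per_layer):
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          np.stack(feats), labels, cv=2).mean()
    print(f"layer {layer:2d}: accuracy {acc:.2f}")

In the paper's setup, separate [UNK] and PREP target tokens and five random splits, as described in the figure captions below, replace the toy choices here.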

If this is right

  • Constructional form and meaning can be detected in specific layers of BERT.
  • Empirical evidence supports links between constructionist theory and neural language models.
  • Similar probing can be extended to other constructions and languages.
  • Disambiguation of NPN meanings benefits from contextual embeddings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Models might be improved by explicitly training on constructional patterns if they are not fully captured.
  • Human-like processing of constructions could be tested by comparing probe results to psycholinguistic data.
  • Layer-specific encoding suggests potential for targeted interventions in model fine-tuning.

Load-bearing premise

That the probing classifiers provide a reliable measure of what the model actually encodes without being affected by the choice of probe or data.
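The control classifiers shown as continuous grey lines in the figures are the standard guard on this premise. Below is a minimal sketch of one common form of that guard, retraining the identical probe on permuted labels and reporting the accuracy gap (selectivity); whether the paper's controls follow exactly this recipe is an assumption:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))   # stand-in for one layer's contextual embeddings
y = rng.integers(0, 2, size=200)  # stand-in gold labels

probe = LogisticRegression(max_iter=1000)
real = cross_val_score(probe, X, y, cv=5).mean()
control = cross_val_score(probe, X, rng.permutation(y), cv=5).mean()  # same probe, permuted labels
print(f"real {real:.2f} vs control {control:.2f}; selectivity {real - control:+.2f}")

A probe that scores high on real labels but near chance on permuted ones is reading structure in the embeddings rather than memorising arbitrary mappings.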

What would settle it

Observing that probing accuracy remains at chance level for all layers would indicate that the embeddings do not encode the NPN construction in any form the probing classifiers can recover.
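To make that criterion concrete, a quick significance check on hypothetical numbers (the paper's own statistics, if any, may differ): does one layer's probe accuracy exceed chance?

from scipy.stats import binomtest

# Hypothetical figures: 271 correct out of 500 binary test items at one layer.
result = binomtest(k=271, n=500, p=0.5, alternative="greater")
print(result.pvalue)  # large p-values at every layer would support 'not linearly encoded'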

Figures

Figures reproduced from arXiv: 2604.03673 by Francesca Masini, Greta Gorzoni, Ludovica Pannitto.

Figure 1. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on construction identification for the SIMPLE configuration. As in the following plots, the accuracy of the five probing classifiers resulting from the five random splits is averaged. Dashed grey line represents the FastText baseline; continuous grey lines refer to control classifiers. Figure (1a) includes decremental training…
Figure 2. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on construction identification for the OTHER and PSEUDO configurations. Dashed grey line represents the FastText baseline. The distribution of misclassifications (see Appendix D) highlights three main patterns, which are consistently observed across all models in the BERT family. In the SIMPLE configuration, all models exhibit…
Figure 3. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on both experiments on English data from Scivetti and Schneider (2025). Dashed grey line represents the FastText baseline; dotted grey line represents the GloVe baseline.
Figure 4. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on the construction disambiguation task. Dashed grey line represents the FastText baseline; dotted grey line represents the morphological FastText baseline. Continuous grey lines refer to control classifiers. Model (4a) includes decremental training configurations; line shading becomes progressively lighter as the number of training instances…
Figure 5. Accuracy across layers for [UNK] and PREP embeddings, together with static baselines. [UNK] and PREP representations support robust generalisation to unseen prepositions, with performance reaching high accuracy in late layers. The task is intrinsically harder, as demonstrated by the drop in performance for both baselines. Nonetheless, results are consistent across different…
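The dashed grey FastText baselines in the plots above are static-vector probes. A minimal sketch of how such a baseline could be assembled, assuming the official fasttext package, the pre-trained cc.it.300.bin Italian vectors, and simple span averaging (all illustrative choices, not necessarily the authors'):

import fasttext
import fasttext.util
import numpy as np
from sklearn.linear_model import LogisticRegression

fasttext.util.download_model("it", if_exists="ignore")  # fetches cc.it.300.bin
ft = fasttext.load_model("cc.it.300.bin")

def span_vector(span: str) -> np.ndarray:
    # Average subword-aware word vectors over the candidate span.
    return np.mean([ft.get_word_vector(t) for t in span.split()], axis=0)

X = np.stack([span_vector(s) for s in ["pagina su pagina", "su una sedia"]])
y = np.array([1, 0])  # toy labels: NPN instance vs. distractor
LogisticRegression(max_iter=1000).fit(X, y)  # same probe as the contextual runs

The subword vectors are what makes FastText preferable to GloVe here: they absorb Italian inflectional variation that whole-word vectors would miss.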
read the original abstract

Interpretability research has highlighted the importance of evaluating Pretrained Language Models (PLMs) and in particular contextual embeddings against explicit linguistic theories to determine what linguistic information they encode. This study focuses on the Italian NPN (noun-preposition-noun) constructional family, challenging some of the theoretical and methodological assumptions underlying previous experimental designs and extending this type of research to a lesser-investigated language. Contextual vector representations are extracted from BERT and used as input to layer-wise probing classifiers, systematically evaluating information encoded across the model's internal layers. The results shed light on the extent to which constructional form and meaning are reflected in contextual embeddings, contributing empirical evidence to the dialogue between constructionist theory and neural language modelling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript investigates the Italian NPN (noun-preposition-noun) construction family by extracting contextual embeddings from BERT and related models, then training layer-wise probing classifiers to assess the encoding of constructional form and meaning across layers. It challenges prior experimental assumptions, extends the approach to Italian, and aims to provide empirical evidence linking constructionist theory with neural language modeling.

Significance. If the probing results hold after appropriate controls, the work would supply useful data on how PLMs represent a specific constructional pattern in a lesser-studied language, adding to the body of interpretability studies that test linguistic theories against model representations.

major comments (2)
  1. [Methods] The central claim that layer-wise probes reveal the extent of constructional encoding rests on the untested assumption that classifier accuracy indexes abstract information rather than lexical co-occurrence or positional biases. No control conditions are described that hold lexical items fixed while disrupting the NPN template (e.g., noun-noun-preposition or preposition-noun-noun orders), which is required to isolate construction-specific features.
  2. [Abstract] The abstract supplies no quantitative results, performance metrics, error analysis, data splits, or baseline comparisons, so the soundness of the claim that the results 'shed light on' constructional encoding cannot be evaluated from the provided text.
minor comments (1)
  1. [Title] The phrase 'BERT's family' in the title is imprecise; it should be clarified as 'BERT-family models' or 'models in the BERT family' for consistency with standard terminology.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and describe the revisions that will be incorporated in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Methods] The central claim that layer-wise probes reveal the extent of constructional encoding rests on the untested assumption that classifier accuracy indexes abstract information rather than lexical co-occurrence or positional biases. No control conditions are described that hold lexical items fixed while disrupting the NPN template (e.g., noun-noun-preposition or preposition-noun-noun orders), which is required to isolate construction-specific features.

    Authors: We agree that the absence of such controls is a limitation of the current design. In the revised manuscript we will introduce control conditions that hold the lexical items fixed while disrupting the canonical NPN order (specifically NNP and PNN permutations; see the sketch after these responses). Probing accuracies on these controls will be reported alongside the main results to demonstrate that the classifiers are sensitive to constructional structure rather than lexical or positional cues alone. revision: yes

  2. Referee: [Abstract] The abstract supplies no quantitative results, performance metrics, error analysis, data splits, or baseline comparisons, so the soundness of the claim that the results 'shed light on' constructional encoding cannot be evaluated from the provided text.

    Authors: We accept that the abstract is currently too high-level. The revised abstract will include key quantitative details: the highest layer-wise probing accuracies for both form and meaning, the train/test split sizes, a lexical baseline comparison, and a brief reference to the error analysis performed. These additions will allow readers to evaluate the strength of the claims directly from the abstract. revision: yes
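A minimal sketch of the controls promised in response 1, holding lexical items fixed while permuting the canonical order; the bare whitespace templating is an illustrative assumption about how such stimuli might be built:

def npn_permutations(noun: str, prep: str) -> dict:
    # Same lexical items in all three strings; only the template order changes.
    return {
        "NPN": f"{noun} {prep} {noun}",  # canonical construction order
        "NNP": f"{noun} {noun} {prep}",  # control: order disrupted
        "PNN": f"{prep} {noun} {noun}",  # control: order disrupted
    }

print(npn_permutations("strato", "su"))
# {'NPN': 'strato su strato', 'NNP': 'strato strato su', 'PNN': 'su strato strato'}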

Circularity Check

0 steps flagged

No circularity: standard probing applied to new data without self-referential derivations

full rationale

The paper extracts contextual embeddings from BERT-family models and trains layer-wise probing classifiers to assess encoding of the Italian NPN construction. No equations, parameter fittings, or derivations appear in the work. Results are presented as empirical measurements from established probing methods on novel Italian data, without any reduction of outputs to inputs by construction, self-defined quantities, or load-bearing self-citations. The central claims rest on classifier accuracy as a direct (if imperfect) index of encoded information, which is an external methodological choice rather than a tautological re-labeling of the authors' own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; standard assumptions of probing research (e.g., that linear classifiers can extract encoded features) are implicit but unstated.

pith-pipeline@v0.9.0 · 5422 in / 1002 out tokens · 39669 ms · 2026-05-13T17:31:17.719294+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 2 internal anchors

  1. [1]

    'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family

    Introduction. The remarkable empirical performance obtained by Pretrained Language Models (PLMs) across a wide range of tasks has fueled enthusiasm in both computational approaches and theoretical debates about language (Brown et al., 2020). Despite these successes, PLMs remain largely opaque (Rogers et al., 2020). High predictive accuracy does not automatically...

  2. [2]

    Formally, the pattern consists of nominal reduplication interrupted by a preposition

    The npn Construction. npn expressions challenge traditional grammatical categories and motivate a model capable of capturing phenomena along the lexicon–syntax continuum. Formally, the pattern consists of nominal reduplication interrupted by a preposition. Construction schema: Noun_i Preposition Noun_i. Treating npn expressions as semi-specified Cxns accounts for...

  3. [3]

    Related work. Recent work has investigated whether LLMs encode constructional knowledge using a variety of experimental designs. One line of research (Tayyar Madabushi et al., 2020; Tayyar Madabushi and Bonial, 2025) examines multiple Cxns organized along a gradient of schematicity, testing whether models generalize across instantiations and whether...

  4. [4]

    Research Questions and Methodological Design. As Cxns are assumed to be inherently language-specific, probing constructional knowledge requires moving beyond the English-centric focus that characterises much of the existing literature. Moreover, the npn Cxn occupies an intermediate position on the lexicon–syntax continuum, making it a suitable test case for a...

  5. [5]

    Data. The dataset used in this study (Gorzoni et al., 2026) is derived from the Italian npn dataset presented in Masini (2024a), extended with full sentential contexts extracted from CORIS

    Methods, 5.1 Data. The dataset used in this study (Gorzoni et al., 2026) is derived from the Italian npn dataset presented in Masini (2024a), extended with full sentential contexts extracted from CORIS. The full dataset contains 3,256 attested instances of the Italian npn constructional pattern instantiated by the prepositions a 'at/to' and su 'on'. Following the an...

  6. [6]

    rather than GloVe as a static baseline because its subword-based representations are better suited to morphologically rich languages such as Italian, allowing us to control for lexical and inflectional variation. 5.3 Experimental setup. For the identification task, we perform binary classification (Construction vs. Distractor). For the disambiguation task,...

  7. [7]

    In Scivetti and Schneider (2025)'s implementation, in fact, the identification task contrasts actual npn instances with surface-isomorphic patterns

    Identification task. The first experiment evaluates whether contextual embeddings extracted from BERT's models encode sufficient information to distinguish npn constructions from distractors, and analyzes how the nature of the distractor patterns affects the probing classifier's behaviour. In Scivetti and Schneider (2025)'s implementation, in fact, the ident...

  8. [8]

    Disambiguation task. Given the very high performance achieved in the experiment about the identification of npn Cxn, extending the analysis beyond form, we now turn to examining the semantic dimension of the Cxn. Our setup is a multinomial three-class disambiguation problem: we only focus on the Cxn (1), (2), (3) and Cxn (4) in Table 1, which are associated...

  9. [9]

    Conclusion. We presented two probing experiments addressing the identification and semantic disambiguation of Italian npn constructions. To this end, we introduced an extended dataset including both constructional instances and carefully designed distractors, allowing for a controlled evaluation of construction-sensitive encoding. We extended and enriched the...

  10. [10]

    First, the analysis is restricted to a single constructional family, namely the Italian npn Cxs

    Limitations. The present study is subject to several limitations. First, the analysis is restricted to a single constructional family, namely the Italian npn Cxs. Although multiple prepositions (a 'at/to', su 'on', per 'by', dopo 'after') are included, they instantiate closely related constructions within the same constructional network, differing primarily...

  11. [11]

    Participation was entirely voluntary and had no impact on students' evaluation or academic standing

    Ethics Statement. Annotators were recruited within an advanced Master's-level course as part of structured educational activities. Participation was entirely voluntary and had no impact on students' evaluation or academic standing. All participants were informed about the objectives of the study and the intended use of the collected data.

  12. [12]

    Unsupervised Cross-lingual Representation Learning at Scale

    Bibliographical References. Ron Artstein and Massimo Poesio. 2008. Inter-coder Agreement for Computational Linguistics. Computational Linguistics, 34(4):555–596. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146...

  13. [13]

    Wesley Scivetti and Nathan Schneider

    A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8:842–866. Wesley Scivetti and Nathan Schneider. 2025. Construction identification and disambiguation using BERT: A case study of NPN. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 365–376. Assoc...

  14. [14]

    bandiere su bandiere giù

    bert-base-italian-cased (revision 843e404). Harish Tayyar Madabushi and Claire Bonial. 2025. Construction grammar evidence for how LLMs use context-directed extrapolation to solve tasks. In Proceedings of the Second International Workshop on Construction Grammars and NLP, pages 190–201, Düsseldorf, Germany. Association for Computational Linguistics. Harish Tayyar M...