pith. machine review for the scientific record.

arxiv: 2604.03673 · v1 · submitted 2026-04-04 · 💻 cs.CL

Recognition: no theorem link

'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:31 UTC · model grok-4.3

classification 💻 cs.CL
keywords NPN construction · BERT · probing · contextual embeddings · Italian · construction grammar · interpretability · disambiguation

The pith

BERT's contextual embeddings encode information about Italian NPN constructions across its layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether BERT captures the Italian NPN construction, a pattern that repeats a noun around an intervening preposition (as in the title's 'Layer su Layer'). Researchers extract contextual vectors from different layers of the model and use them to train probing classifiers that identify these constructions. This reveals how much of the construction's form and meaning is present in the embeddings. The study applies this to Italian, a less-studied language in such work, and questions some common assumptions in probing experiments. A sympathetic reader would care because it bridges linguistic theory on constructions with how neural models represent language.

Core claim

Contextual vector representations extracted from BERT encode the Italian NPN constructional family; layer-wise probing classifiers, evaluated systematically across the model's internal layers, show the extent to which constructional form and meaning are reflected in these embeddings.

What carries the argument

Layer-wise probing classifiers that take contextual vector representations from BERT as input to detect and disambiguate NPN constructions.
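To make that machinery concrete, here is a minimal sketch of layer-wise probing in this spirit, built on Hugging Face transformers and scikit-learn. The checkpoint name, the toy sentences, and probing the preposition token are illustrative assumptions, not the authors' exact pipeline:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MODEL = "dbmdz/bert-base-italian-cased"  # assumed checkpoint; any BERT-family model works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

# Toy data (1 = NPN instance, 0 = distractor); the real experiments use
# thousands of CORIS-derived sentences and five random splits.
examples = [
    ("L'hanno letto pagina su pagina.", 1),
    ("Hanno esaminato il testo strato su strato.", 1),
    ("Il gatto dorme su una sedia.", 0),
    ("Contiamo su di lui per il progetto.", 0),
]

def prep_vectors(sentence):
    """Return one vector per layer for the token covering the preposition 'su'."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # embedding layer + one state per transformer layer
    tok_idx = enc.char_to_token(sentence.index(" su ") + 1)
    return [h[0, tok_idx].numpy() for h in hidden]

per_layer = list(zip(*[prep_vectors(s) for s, _ in examples]))
labels = np.array([y for _, y in examples])

for layer, feats in enumerate(per_layer):
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          np.stack(feats), labels, cv=2).mean()
    print(f"layer {layer:2d}: accuracy {acc:.2f}")

In the paper's setup, separate [UNK] and PREP target tokens and five random splits, as described in the figure captions below, replace the toy choices here.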

If this is right

  • Constructional form and meaning can be detected in specific layers of BERT.
  • Empirical evidence supports links between constructionist theory and neural language models.
  • Similar probing can be extended to other constructions and languages.
  • Disambiguation of NPN meanings benefits from contextual embeddings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Models might be improved by explicitly training on constructional patterns if they are not fully captured.
  • Human-like processing of constructions could be tested by comparing probe results to psycholinguistic data.
  • Layer-specific encoding suggests potential for targeted interventions in model fine-tuning.

Load-bearing premise

That the probing classifiers provide a reliable measure of what the model actually encodes without being affected by the choice of probe or data.
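The control classifiers shown as continuous grey lines in the figures are the standard guard on this premise. Below is a minimal sketch of one common form of that guard, retraining the identical probe on permuted labels and reporting the accuracy gap (selectivity); whether the paper's controls follow exactly this recipe is an assumption:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))   # stand-in for one layer's contextual embeddings
y = rng.integers(0, 2, size=200)  # stand-in gold labels

probe = LogisticRegression(max_iter=1000)
real = cross_val_score(probe, X, y, cv=5).mean()
control = cross_val_score(probe, X, rng.permutation(y), cv=5).mean()  # same probe, permuted labels
print(f"real {real:.2f} vs control {control:.2f}; selectivity {real - control:+.2f}")

A probe that scores high on real labels but near chance on permuted ones is reading structure in the embeddings rather than memorising arbitrary mappings.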

What would settle it

Observing that probing accuracy remains at chance level for all layers would indicate that the embeddings do not encode the NPN construction in any form the probing classifiers can recover.
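To make that criterion concrete, a quick significance check on hypothetical numbers (the paper's own statistics, if any, may differ): does one layer's probe accuracy exceed chance?

from scipy.stats import binomtest

# Hypothetical figures: 271 correct out of 500 binary test items at one layer.
result = binomtest(k=271, n=500, p=0.5, alternative="greater")
print(result.pvalue)  # large p-values at every layer would support 'not linearly encoded'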

Figures

Figures reproduced from arXiv: 2604.03673 by Francesca Masini, Greta Gorzoni, Ludovica Pannitto.

Figure 1. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on construction identification for the SIMPLE configuration. As in the following plots, the accuracy of the five probing classifiers resulting from the five random splits is averaged. Dashed grey line represents the FastText baseline; continuous grey lines refer to control classifiers. Figure (1a) includes decremental training…
Figure 2. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on construction identification for the OTHER and PSEUDO configurations. Dashed grey line represents the FastText baseline. The distribution of misclassifications (see Appendix D) highlights three main patterns, which are consistently observed across all models in the BERT family. In the SIMPLE configuration, all models exhibit…
Figure 3. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on both experiments on English data from Scivetti and Schneider (2025). Dashed grey line represents the FastText baseline; dotted grey line represents the GloVe baseline.
Figure 4. Accuracy of [UNK] (red lines, square dots) and PREP (orange lines, triangular dots) on the construction disambiguation task. Dashed grey line represents the FastText baseline; dotted grey line represents the morphological FastText baseline. Continuous grey lines refer to control classifiers. Model (4a) includes decremental training configurations; line shading becomes progressively lighter as the number of training instances…
Figure 5. Accuracy across layers for [UNK] and PREP embeddings, together with static baselines. [UNK] and PREP representations support robust generalisation to unseen prepositions, with performance reaching high accuracy in late layers. The task is intrinsically harder, as demonstrated by the drop in performance for both baselines. Nonetheless, results are consistent across different…
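The dashed grey FastText baselines in the plots above are static-vector probes. A minimal sketch of how such a baseline could be assembled, assuming the official fasttext package, the pre-trained cc.it.300.bin Italian vectors, and simple span averaging (all illustrative choices, not necessarily the authors'):

import fasttext
import fasttext.util
import numpy as np
from sklearn.linear_model import LogisticRegression

fasttext.util.download_model("it", if_exists="ignore")  # fetches cc.it.300.bin
ft = fasttext.load_model("cc.it.300.bin")

def span_vector(span: str) -> np.ndarray:
    # Average subword-aware word vectors over the candidate span.
    return np.mean([ft.get_word_vector(t) for t in span.split()], axis=0)

X = np.stack([span_vector(s) for s in ["pagina su pagina", "su una sedia"]])
y = np.array([1, 0])  # toy labels: NPN instance vs. distractor
LogisticRegression(max_iter=1000).fit(X, y)  # same probe as the contextual runs

The subword vectors are what makes FastText preferable to GloVe here: they absorb Italian inflectional variation that whole-word vectors would miss.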
read the original abstract

Interpretability research has highlighted the importance of evaluating Pretrained Language Models (PLMs) and in particular contextual embeddings against explicit linguistic theories to determine what linguistic information they encode. This study focuses on the Italian NPN (noun-preposition-noun) constructional family, challenging some of the theoretical and methodological assumptions underlying previous experimental designs and extending this type of research to a lesser-investigated language. Contextual vector representations are extracted from BERT and used as input to layer-wise probing classifiers, systematically evaluating information encoded across the model's internal layers. The results shed light on the extent to which constructional form and meaning are reflected in contextual embeddings, contributing empirical evidence to the dialogue between constructionist theory and neural language modelling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript investigates the Italian NPN (noun-preposition-noun) construction family by extracting contextual embeddings from BERT and related models, then training layer-wise probing classifiers to assess the encoding of constructional form and meaning across layers. It challenges prior experimental assumptions, extends the approach to Italian, and aims to provide empirical evidence linking constructionist theory with neural language modeling.

Significance. If the probing results hold after appropriate controls, the work would supply useful data on how PLMs represent a specific constructional pattern in a lesser-studied language, adding to the body of interpretability studies that test linguistic theories against model representations.

major comments (2)
  1. [Methods] The central claim that layer-wise probes reveal the extent of constructional encoding rests on the untested assumption that classifier accuracy indexes abstract information rather than lexical co-occurrence or positional biases. No control conditions are described that hold lexical items fixed while disrupting the NPN template (e.g., noun-noun-preposition or preposition-noun-noun orders), which is required to isolate construction-specific features.
  2. [Abstract] The abstract supplies no quantitative results, performance metrics, error analysis, data splits, or baseline comparisons, so the soundness of the claim that the results 'shed light on' constructional encoding cannot be evaluated from the provided text.
minor comments (1)
  1. [Title] The phrase 'BERT's family' in the title is imprecise; it should be clarified as 'BERT-family models' or 'models in the BERT family' for consistency with standard terminology.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and describe the revisions that will be incorporated in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Methods] The central claim that layer-wise probes reveal the extent of constructional encoding rests on the untested assumption that classifier accuracy indexes abstract information rather than lexical co-occurrence or positional biases. No control conditions are described that hold lexical items fixed while disrupting the NPN template (e.g., noun-noun-preposition or preposition-noun-noun orders), which is required to isolate construction-specific features.

    Authors: We agree that the absence of such controls is a limitation of the current design. In the revised manuscript we will introduce control conditions that hold the lexical items fixed while disrupting the canonical NPN order (specifically NNP and PNN permutations; see the sketch after these responses). Probing accuracies on these controls will be reported alongside the main results to demonstrate that the classifiers are sensitive to constructional structure rather than lexical or positional cues alone. revision: yes

  2. Referee: [Abstract] The abstract supplies no quantitative results, performance metrics, error analysis, data splits, or baseline comparisons, so the soundness of the claim that the results 'shed light on' constructional encoding cannot be evaluated from the provided text.

    Authors: We accept that the abstract is currently too high-level. The revised abstract will include key quantitative details: the highest layer-wise probing accuracies for both form and meaning, the train/test split sizes, a lexical baseline comparison, and a brief reference to the error analysis performed. These additions will allow readers to evaluate the strength of the claims directly from the abstract. revision: yes
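A minimal sketch of the controls promised in response 1, holding lexical items fixed while permuting the canonical order; the bare whitespace templating is an illustrative assumption about how such stimuli might be built:

def npn_permutations(noun: str, prep: str) -> dict:
    # Same lexical items in all three strings; only the template order changes.
    return {
        "NPN": f"{noun} {prep} {noun}",  # canonical construction order
        "NNP": f"{noun} {noun} {prep}",  # control: order disrupted
        "PNN": f"{prep} {noun} {noun}",  # control: order disrupted
    }

print(npn_permutations("strato", "su"))
# {'NPN': 'strato su strato', 'NNP': 'strato strato su', 'PNN': 'su strato strato'}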

Circularity Check

0 steps flagged

No circularity: standard probing applied to new data without self-referential derivations

full rationale

The paper extracts contextual embeddings from BERT-family models and trains layer-wise probing classifiers to assess encoding of the Italian NPN construction. No equations, parameter fittings, or derivations appear in the work. Results are presented as empirical measurements from established probing methods on novel Italian data, without any reduction of outputs to inputs by construction, self-defined quantities, or load-bearing self-citations. The central claims rest on classifier accuracy as a direct (if imperfect) index of encoded information, which is an external methodological choice rather than a tautological re-labeling of the authors' own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; standard assumptions of probing research (e.g., that linear classifiers can extract encoded features) are implicit but unstated.

pith-pipeline@v0.9.0 · 5422 in / 1002 out tokens · 39669 ms · 2026-05-13T17:31:17.719294+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 2 internal anchors

  1. [1]

    'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family

    Introduction. The remarkable empirical performance obtained by Pretrained Language Models (PLMs) across a wide range of tasks has fueled enthusiasm in both computational approaches and theoretical debates about language (Brown et al., 2020). Despite these successes, PLMs remain largely opaque (Rogers et al., 2020). High predictive accuracy does not automatically...

  2. [2]

    Formally, the pattern consists of nominal reduplication interrupted by a preposition

    The npn Construction. npn expressions challenge traditional grammatical categories and motivate a model capable of capturing phenomena along the lexicon–syntax continuum. Formally, the pattern consists of nominal reduplication interrupted by a preposition. Construction schema: Noun_i Preposition Noun_i. Treating npn expressions as semi-specified Cxns accounts for...

  3. [3]

    Related work. Recent work has investigated whether LLMs encode constructional knowledge using a variety of experimental designs. One line of research (Tayyar Madabushi et al., 2020; Tayyar Madabushi and Bonial, 2025) examines multiple Cxns organized along a gradient of schematicity, testing whether models generalize across instantiations and whether...

  4. [4]

    Research Questions and Methodological Design. As Cxns are assumed to be inherently language-specific, probing constructional knowledge requires moving beyond the English-centric focus that characterises much of the existing literature. Moreover, the npn Cxn occupies an intermediate position on the lexicon–syntax continuum, making it a suitable test case for a...

  5. [5]

    Data. The dataset used in this study (Gorzoni et al., 2026) is derived from the Italian npn dataset presented in Masini (2024a), extended with full sentential contexts extracted from CORIS

    Methods, 5.1 Data. The dataset used in this study (Gorzoni et al., 2026) is derived from the Italian npn dataset presented in Masini (2024a), extended with full sentential contexts extracted from CORIS. The full dataset contains 3,256 attested instances of the Italian npn constructional pattern instantiated by the prepositions a 'at/to' and su 'on'. Following the an...

  6. [6]

    rather than GloVe as a static baseline because its subword-based representations are better suited to morphologically rich languages such as Italian, allowing us to control for lexical and inflectional variation. 5.3 Experimental setup. For the identification task, we perform binary classification (Construction vs. Distractor). For the disambiguation task,...

  7. [7]

    In Scivetti and Schneider (2025)'s implementation, in fact, the identification task contrasts actual npn instances with surface-isomorphic patterns

    Identification task. The first experiment evaluates whether contextual embeddings extracted from BERT's models encode sufficient information to distinguish npn constructions from distractors, and analyzes how the nature of the distractor patterns affects the probing classifier's behaviour. In Scivetti and Schneider (2025)'s implementation, in fact, the ident...

  8. [8]

    Disambiguation task. Given the very high performance achieved in the experiment about the identification of npn Cxn, extending the analysis beyond form, we now turn to examining the semantic dimension of the Cxn. Our setup is a multinomial three-class disambiguation problem: we only focus on the Cxn (1), (2), (3) and Cxn (4) in Table 1, which are associated...

  9. [9]

    Conclusion. We presented two probing experiments addressing the identification and semantic disambiguation of Italian npn constructions. To this end, we introduced an extended dataset including both constructional instances and carefully designed distractors, allowing for a controlled evaluation of construction-sensitive encoding. We extended and enriched the...

  10. [10]

    First, the analysis is restricted to a single constructional family, namely the Italian npn Cxs

    Limitations. The present study is subject to several limitations. First, the analysis is restricted to a single constructional family, namely the Italian npn Cxs. Although multiple prepositions (a 'at/to', su 'on', per 'by', dopo 'after') are included, they instantiate closely related constructions within the same constructional network, differing primarily...

  11. [11]

    Participation was entirely voluntary and had no impact on students' evaluation or academic standing

    Ethics Statement. Annotators were recruited within an advanced Master's-level course as part of structured educational activities. Participation was entirely voluntary and had no impact on students' evaluation or academic standing. All participants were informed about the objectives of the study and the intended use of the collected data.

  12. [12]

    Unsupervised Cross-lingual Representation Learning at Scale

    Bibliographical References. Ron Artstein and Massimo Poesio. 2008. Inter-coder Agreement for Computational Linguistics. Computational Linguistics, 34(4):555–596. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146...

  13. [13]

    Wesley Scivetti and Nathan Schneider

    A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8:842–866. Wesley Scivetti and Nathan Schneider. 2025. Construction identification and disambiguation using BERT: A case study of NPN. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 365–376. Assoc...

  14. [14]

    bandiere su bandiere giù

    bert-base-italian-cased (revision 843e404). Harish Tayyar Madabushi and Claire Bonial. 2025. Construction grammar evidence for how LLMs use context-directed extrapolation to solve tasks. In Proceedings of the Second International Workshop on Construction Grammars and NLP, pages 190–201, Düsseldorf, Germany. Association for Computational Linguistics. Harish Tayyar M...