The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse
Pith reviewed 2026-06-26 17:38 UTC · model grok-4.3
The pith
A nine-dimension schema raises AI register classification accuracy on Nigerian discourse from 33.3 percent to 73.3 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Zero-shot register classification accuracy sits at 33.3 percent and rises to 73.3 percent when models receive the MIF schema in context. Model capability and cultural competence are decoupled: GPT-5 and Gemini 2.5 Pro score lower than Gemini 2.5 Flash on the Meaning Intelligence Score (MIS), and neither larger model benefits from schema-informed prompting.
What carries the argument
The Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema that separates surface sentiment from true communicative intent across register, irony, coded subtext, risk tier, and related dimensions.
If this is right
- Schema-informed prompting can close much of the register gap for models operating on Nigerian discourse.
- General model scale does not guarantee better handling of pragmatic or cultural intent in this domain.
- The released framework, guidelines, and calibration set enable direct reproducibility checks and further annotation work.
Where Pith is reading between the lines
- Comparable dimension-based schemas could be tested on other context-sensitive discourse domains where literal and intended meanings diverge.
- The observed decoupling suggests that targeted cultural or pragmatic fine-tuning may be more effective than scaling alone for register-sensitive tasks.
Load-bearing premise
The 30-item calibration dataset is representative enough of Nigerian public discourse to support the claims about the Register Gap and model decoupling.
What would settle it
A new test collection of at least 100 utterances drawn from Nigerian public discourse, scored by multiple annotators, would show whether the 40-point accuracy gain from in-context MIF prompting still appears when the models are re-evaluated.
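Scoring by multiple annotators would make chance-corrected agreement measurable, which the single-annotator calibration set cannot provide. The following is a minimal sketch of Cohen's kappa for two annotators, assuming register labels are stored as plain strings. The function name, the label names, and the six example labels are hypothetical and are not taken from the paper's calibration set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical register labels (illustrative only, not data from the paper).
annotator_1 = ["pidgin", "pidgin", "nigerian_english",
               "standard_english", "code_mixed", "pidgin"]
annotator_2 = ["pidgin", "code_mixed", "nigerian_english",
               "standard_english", "code_mixed", "pidgin"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute; the two-annotator case is shown only because it is the simplest.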
Original abstract
We introduce the Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema for Nigerian public discourse that separates surface sentiment from true communicative intent. Existing benchmarks for Nigerian languages, including NaijaSenti and AfriSenti, treat sentiment classification as a three-way polarity task. We argue that the dominant failure mode of AI systems on Nigerian discourse is not translation failure but context failure: the same utterance carries opposite pragmatic force depending on speaker, audience, and situation. The MIF operationalises this insight across nine scored dimensions: register, surface sentiment, true intent, irony, coded subtext, risk tier, annotator confidence, speaker emotion, and recommended communications action. We construct a 30-item calibration dataset spanning Standard English, Nigerian English, Nigerian Pidgin, and code-mixed registers, and evaluate three frontier language models (Gemini 2.5 Flash, GPT-5, and Gemini 2.5 Pro) under zero-shot and schema-informed prompting conditions. Two headline findings emerge. First, the Register Gap: zero-shot register classification accuracy is 33.3%, rising to 73.3% (+40 points) when the model receives the MIF schema in-context. Second, model capability and cultural competence are decoupled: GPT-5 (MIS 67.8) and Gemini 2.5 Pro (MIS 65.4) score lower than Flash (MIS 78.6), and neither benefits from schema-informed prompting. We release the framework specification, annotation guidelines, and calibration set to support reproducibility.
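To make the nine dimensions concrete, here is a minimal sketch of how one annotated item might be represented and how register accuracy could be computed. The nine field names follow the dimensions listed in the abstract; the class name, the value types, the register label strings, and the example values are assumptions, since the MIF specification itself is not reproduced here.

```python
from dataclasses import dataclass, fields

# The four registers named in the abstract; the string labels are assumed.
REGISTERS = ("standard_english", "nigerian_english",
             "nigerian_pidgin", "code_mixed")


@dataclass(frozen=True)
class MIFAnnotation:
    """One annotated item; field types are illustrative guesses."""
    register: str
    surface_sentiment: str
    true_intent: str
    irony: bool
    coded_subtext: str
    risk_tier: int
    annotator_confidence: float
    speaker_emotion: str
    recommended_action: str


def register_accuracy(gold, predicted):
    """Fraction of items whose predicted register matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical item where surface sentiment and true intent diverge.
example = MIFAnnotation(
    register="nigerian_pidgin",
    surface_sentiment="positive",
    true_intent="criticism",
    irony=True,
    coded_subtext="none",
    risk_tier=2,
    annotator_confidence=0.8,
    speaker_emotion="frustration",
    recommended_action="monitor",
)

# The reported counts: 10/30 correct zero-shot, 22/30 with the schema.
zero_shot = 10 / 30
schema_informed = 22 / 30
gap_points = round(100 * (schema_informed - zero_shot), 1)
print(f"{100 * zero_shot:.1f}% -> {100 * schema_informed:.1f}% "
      f"(+{gap_points} points)")
```

Keeping surface sentiment and true intent as separate fields is what lets a single record express the divergence the framework is built around.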
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Meaning Intelligence Framework (MIF), a nine-dimension schema for annotating Nigerian public discourse that distinguishes surface sentiment from pragmatic intent across registers including Standard English, Nigerian English, Pidgin, and code-mixed forms. It constructs a 30-item calibration dataset and evaluates three LLMs (Gemini 2.5 Flash, GPT-5, Gemini 2.5 Pro) in zero-shot versus schema-informed conditions, reporting a Register Gap with register classification accuracy rising from 33.3% to 73.3% (+40 points) when MIF is provided in-context, plus a decoupling result in which Flash achieves a higher Meaning Intelligence Score (78.6) than GPT-5 (67.8) and Gemini 2.5 Pro (65.4), neither of which benefits from the schema. The framework specification, guidelines, and calibration set are released.
Significance. If the central claims hold after validation, the work would identify a practically important failure mode in frontier LLMs—context and register sensitivity rather than translation per se—in Nigerian discourse, and supply a reusable nine-dimension schema plus open calibration materials that could support more targeted evaluation and fine-tuning in low-resource cultural NLP. The explicit release of the dataset and guidelines is a clear strength that enables external checks and extensions.
major comments (3)
- [Abstract] Abstract (Register Gap claim): the reported improvement from 10/30 to 22/30 correct register classifications is presented without binomial confidence intervals, stratification by register or language, or any description of how the ground-truth intent labels were established or validated.
- [Abstract] Abstract (decoupling claim): the finding that Gemini 2.5 Flash (MIS 78.6) outperforms GPT-5 and Gemini 2.5 Pro is computed on the identical 30-item set; any sampling instability or annotation noise therefore propagates directly to both headline results.
- [Abstract] Abstract (evaluation protocol): no inter-annotator agreement is reported for the 30-item set, and the manuscript supplies no sampling frame or external validation that would establish the set as representative of Nigerian public discourse.
minor comments (1)
- [Abstract] The abstract references NaijaSenti and AfriSenti as existing benchmarks but does not supply citations for them.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on the abstract and evaluation protocol. We address each point below and will revise the manuscript to incorporate additional statistical reporting, clarifications on annotation, and explicit discussion of limitations.
Point-by-point responses
- Referee: [Abstract] Abstract (Register Gap claim): the reported improvement from 10/30 to 22/30 correct register classifications is presented without binomial confidence intervals, stratification by register or language, or any description of how the ground-truth intent labels were established or validated.
  Authors: We will add binomial confidence intervals for the 33.3% and 73.3% figures using the Clopper-Pearson method. The ground-truth labels were established by the lead author (a native Nigerian English speaker with training in pragmatics) via iterative contextual analysis of speaker, audience, and situation for each of the 30 items; we will expand the methods section to describe this process explicitly. A supplementary table will provide stratification by register. These additions will be made in the revision.
  Revision: yes
- Referee: [Abstract] Abstract (decoupling claim): the finding that Gemini 2.5 Flash (MIS 78.6) outperforms GPT-5 and Gemini 2.5 Pro is computed on the identical 30-item set; any sampling instability or annotation noise therefore propagates directly to both headline results.
  Authors: We agree that both headline results derive from the same 30-item calibration set and therefore share any sampling or annotation effects. We will revise the abstract, results, and discussion sections to state this limitation explicitly and to frame the decoupling observation as preliminary, recommending independent validation sets in future work.
  Revision: yes
- Referee: [Abstract] Abstract (evaluation protocol): no inter-annotator agreement is reported for the 30-item set, and the manuscript supplies no sampling frame or external validation that would establish the set as representative of Nigerian public discourse.
  Authors: The 30-item collection is explicitly positioned as a calibration set constructed purposively to cover the four registers rather than as a statistically representative sample; no formal sampling frame was used. Annotation was performed by a single expert annotator, so inter-annotator agreement metrics do not apply. We will add a dedicated limitations paragraph clarifying these points and noting the absence of external validation beyond author expertise.
  Revision: partial
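The Clopper-Pearson intervals promised in the first response can be sketched with the standard library alone. The version below finds each bound by bisection on the exact binomial tail probabilities instead of calling a beta quantile function; the function names and the 95% level are assumptions, and only the counts 10/30 and 22/30 come from the report.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f, iters=100):
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# Counts reported for register classification on the 30-item set.
for label, correct in (("zero-shot", 10), ("schema-informed", 22)):
    low, high = clopper_pearson(correct, 30)
    print(f"{label}: {correct}/30, 95% CI [{low:.3f}, {high:.3f}]")
```

At n = 30 each interval spans several tens of percentage points, which is the practical force of the referee's first comment. Because the same 30 items are scored in both conditions, a paired test on per-item outcomes would be a more direct check of the gain than comparing the two intervals.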
Circularity Check
No circularity in derivation chain
The paper constructs a new nine-dimension MIF schema and a 30-item calibration set, then reports direct accuracy counts (10/30 zero-shot, 22/30 schema-informed) on register classification. These are empirical measurements on the authors' own labeled items rather than any fitted parameter renamed as a prediction, self-definitional loop, or load-bearing self-citation. No equations, uniqueness theorems, or ansatzes are invoked that reduce the headline Register Gap or model-decoupling claims to the inputs by construction. The evaluation is self-contained against the stated benchmark; external validity concerns exist but do not constitute circularity under the specified criteria.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The dominant failure mode of AI systems on Nigerian discourse is context failure rather than translation failure.
invented entities (1)
- Meaning Intelligence Framework (MIF): no independent evidence
Reference graph
Works this paper leans on
- [1] Adelani, D. I., et al. (2021). MasakhaNER: Named entity recognition for African languages. Transactions of the Association for Computational Linguistics, 9, 1116–1131.
- [8] Oyewusi, W., et al. (2021). Semantic enrichment of Nigerian Pidgin English for contextual sentiment classification. In Proceedings of the AfricaNLP Workshop.
- [11] Yu, H., Alabi, J. O., et al. (2025). INJONGO: A multicultural intent detection and slot-filling dataset for 16 African languages. arXiv preprint arXiv:2502.09814.
Supplementary materials: The MIF Master Specification v2.0, Annotation Guidelines v1.0, and the 30-item public calibration set (with gold labels) are available as companion documents. The privat...