The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse

Celestine Achi

arXiv: 2606.20255 · v2 · pith:6245XPQ6new · submitted 2026-06-18 · cs.CL · cs.AI

Pith reviewed 2026-06-26 17:38 UTC · model grok-4.3

keywords Meaning Intelligence Framework · Nigerian public discourse · register classification · pragmatic intent · context failure · AI evaluation · sentiment analysis · code-mixed language

The pith

A nine-dimension schema raises AI register classification accuracy on Nigerian discourse from 33.3 percent to 73.3 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Meaning Intelligence Framework, a nine-dimension schema that scores register, surface sentiment, true intent, irony, coded subtext, risk tier, annotator confidence, speaker emotion, and recommended action to capture pragmatic force in Nigerian public discourse. It argues that AI failures on this material stem from missing context rather than translation issues, and that the same utterance can carry opposite meanings depending on speaker, audience, and situation. Tests on three frontier models show a 40-point jump in register accuracy once the schema is supplied in prompts, yet larger models do not outperform smaller ones and gain nothing from the schema. The work supplies a 30-item calibration set across Standard English, Nigerian English, Pidgin, and code-mixed registers.

Core claim

Zero-shot register classification accuracy sits at 33.3 percent and rises to 73.3 percent when models receive the MIF schema in context. Model capability and cultural competence are decoupled: GPT-5 and Gemini 2.5 Pro score lower than Gemini 2.5 Flash on the meaning intelligence score, and neither larger model benefits from schema-informed prompting.
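The two prompting conditions can be pictured with a minimal sketch. The prompt wording, the function name build_prompt, and the idea of listing the dimensions as bullet points are all assumptions made for illustration; the paper's actual prompts are not reproduced here. The register labels and dimension names are taken from the abstract.

```python
# Register labels and dimension names as listed in the abstract.
REGISTERS = ["Standard English", "Nigerian English", "Nigerian Pidgin", "code-mixed"]
MIF_DIMENSIONS = [
    "register", "surface sentiment", "true intent", "irony", "coded subtext",
    "risk tier", "annotator confidence", "speaker emotion", "recommended action",
]


def build_prompt(utterance: str, with_schema: bool) -> str:
    """Build a hypothetical classification prompt.

    with_schema=False approximates the zero-shot condition;
    with_schema=True prepends the nine MIF dimensions (schema-informed).
    The exact wording is an assumption, not the paper's prompt.
    """
    lines = []
    if with_schema:
        lines.append("Consider the utterance along these dimensions:")
        lines.extend(f"- {name}" for name in MIF_DIMENSIONS)
        lines.append("")
    lines.append("Classify the register as one of: " + ", ".join(REGISTERS) + ".")
    lines.append(f"Utterance: {utterance}")
    return "\n".join(lines)


zero_shot = build_prompt("<utterance text>", with_schema=False)
schema_informed = build_prompt("<utterance text>", with_schema=True)
```

The only difference between the two strings is the schema preamble, which is the variable the reported comparison turns on.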

What carries the argument

The Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema that separates surface sentiment from true communicative intent across register, irony, coded subtext, risk tier, and related dimensions.
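As a rough illustration, one annotation record under the schema might be modelled as below. The nine field names follow the dimensions listed in the abstract; the class name MIFAnnotation and the value types (plain strings, a float for confidence) are assumptions, since the scales defined in the MIF specification are not given here.

```python
from dataclasses import dataclass, fields


# Field names mirror the nine dimensions named in the abstract.
# The types are illustrative assumptions, not the specification's scales.
@dataclass
class MIFAnnotation:
    register: str
    surface_sentiment: str
    true_intent: str
    irony: str
    coded_subtext: str
    risk_tier: str
    annotator_confidence: float
    speaker_emotion: str
    recommended_action: str


def dimension_names() -> list[str]:
    """Return the nine dimension names in declaration order."""
    return [f.name for f in fields(MIFAnnotation)]
```

Keeping surface_sentiment and true_intent as separate fields is the point of the design: a record can hold both without forcing them to agree.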

If this is right

Schema-informed prompting can close much of the register gap for models operating on Nigerian discourse.
General model scale does not guarantee better handling of pragmatic or cultural intent in this domain.
The released framework, guidelines, and calibration set enable direct reproducibility checks and further annotation work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Comparable dimension-based schemas could be tested on other context-sensitive discourse domains where literal and intended meanings diverge.
The observed decoupling suggests that targeted cultural or pragmatic fine-tuning may be more effective than scaling alone for register-sensitive tasks.

Load-bearing premise

The 30-item calibration dataset is representative enough of Nigerian public discourse to support the claims about the Register Gap and model decoupling.

What would settle it

A new test collection of at least 100 utterances drawn from Nigerian public discourse, scored by multiple annotators, would show whether the 40-point accuracy gain from in-context MIF prompting still appears when the models are re-evaluated.
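With more than one annotator, agreement on the labels could be checked before any model is scored. The sketch below computes Cohen's kappa for two annotators; this statistic is a suggestion here, not something the paper reports, and the label lists in the usage lines are toy placeholders rather than data from the calibration set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy placeholder labels (hypothetical, not from the paper).
toy_a = ["Pidgin", "Pidgin", "Standard", "Standard"]
toy_b = ["Pidgin", "Standard", "Standard", "Standard"]
kappa = cohen_kappa(toy_a, toy_b)
```

For more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.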

Original abstract

We introduce the Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema for Nigerian public discourse that separates surface sentiment from true communicative intent. Existing benchmarks for Nigerian languages, including NaijaSenti and AfriSenti, treat sentiment classification as a three-way polarity task. We argue that the dominant failure mode of AI systems on Nigerian discourse is not translation failure but context failure: the same utterance carries opposite pragmatic force depending on speaker, audience, and situation. The MIF operationalises this insight across nine scored dimensions: register, surface sentiment, true intent, irony, coded subtext, risk tier, annotator confidence, speaker emotion, and recommended communications action. We construct a 30-item calibration dataset spanning Standard English, Nigerian English, Nigerian Pidgin, and code-mixed registers, and evaluate three frontier language models (Gemini 2.5 Flash, GPT-5, and Gemini 2.5 Pro) under zero-shot and schema-informed prompting conditions. Two headline findings emerge. First, the Register Gap: zero-shot register classification accuracy is 33.3%, rising to 73.3% (+40 points) when the model receives the MIF schema in-context. Second, model capability and cultural competence are decoupled: GPT-5 (MIS 67.8) and Gemini 2.5 Pro (MIS 65.4) score lower than Flash (MIS 78.6), and neither benefits from schema-informed prompting. We release the framework specification, annotation guidelines, and calibration set to support reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The MIF targets a real gap in Nigerian discourse pragmatics, but its headline claims rest on an unvalidated 30-item set with no agreement statistics or error bars.

The paper introduces a nine-dimension schema called the Meaning Intelligence Framework to separate surface sentiment from true intent in Nigerian registers, and it reports a 40-point jump in register classification accuracy when models get the schema in context. It also claims that Gemini 2.5 Flash outperforms larger models on this task.

What is new is the concrete application of multi-aspect annotation to Standard English, Nigerian English, Pidgin, and code-mixed data, plus the public release of the framework, guidelines, and the 30-item calibration set. The observation that existing sentiment benchmarks miss pragmatic force depending on speaker and situation is a fair point.

The work does a reasonable job spelling out why context failure matters more than translation failure for these registers. Releasing the materials is the most useful part for anyone who wants to test or extend the idea.

The main weakness is the evaluation. All numbers come from the same 30 items, with no inter-annotator agreement, no confidence intervals, and no description of how the true labels were fixed or how the items were sampled. The +40 point Register Gap and the claim that model size and cultural competence are decoupled both sit on this single small set, so modest changes in selection or annotation noise could erase the differences.

This is for people working on discourse annotation or applied NLP for African languages who need something beyond three-way polarity labels. A reader could pull the released set and try the dimensions even if the current numbers are treated as exploratory.

It deserves peer review because the underlying problem is legitimate and the framework could be strengthened with larger data and basic statistical checks. I would send it out rather than desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces the Meaning Intelligence Framework (MIF), a nine-dimension schema for annotating Nigerian public discourse that distinguishes surface sentiment from pragmatic intent across registers including Standard English, Nigerian English, Pidgin, and code-mixed forms. It constructs a 30-item calibration dataset and evaluates three LLMs (Gemini 2.5 Flash, GPT-5, Gemini 2.5 Pro) in zero-shot versus schema-informed conditions, reporting a Register Gap with register classification accuracy rising from 33.3% to 73.3% (+40 points) when MIF is provided in-context, plus a decoupling result in which Flash achieves a higher Meaning Intelligence Score (78.6) than the larger models (GPT-5 at 67.8 and Gemini 2.5 Pro at 65.4), neither of which benefits from the schema. The framework specification, guidelines, and calibration set are released.

Significance. If the central claims hold after validation, the work would identify a practically important failure mode in frontier LLMs—context and register sensitivity rather than translation per se—in Nigerian discourse, and supply a reusable nine-dimension schema plus open calibration materials that could support more targeted evaluation and fine-tuning in low-resource cultural NLP. The explicit release of the dataset and guidelines is a clear strength that enables external checks and extensions.

major comments (3)

[Abstract] Abstract (Register Gap claim): the reported improvement from 10/30 to 22/30 correct register classifications is presented without binomial confidence intervals, stratification by register or language, or any description of how the ground-truth intent labels were established or validated.
[Abstract] Abstract (decoupling claim): the finding that Gemini 2.5 Flash (MIS 78.6) outperforms GPT-5 and Gemini 2.5 Pro is computed on the identical 30-item set; any sampling instability or annotation noise therefore propagates directly to both headline results.
[Abstract] Abstract (evaluation protocol): no inter-annotator agreement is reported for the 30-item set, and the manuscript supplies no sampling frame or external validation that would establish the set as representative of Nigerian public discourse.
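The interval the first comment asks for can be sketched with the standard library alone. The code below computes an exact (Clopper-Pearson) binomial interval by bisection on the binomial CDF, using the 10/30 and 22/30 counts quoted in the comments; the function names are illustrative and this is not the authors' analysis code.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


zero_shot_ci = clopper_pearson(10, 30)
schema_ci = clopper_pearson(22, 30)
```

Both conditions are scored on the same 30 items, so two separate intervals ignore the pairing; a paired comparison would need the per-item outcomes, which are not given in the abstract.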

minor comments (1)

[Abstract] The abstract references NaijaSenti and AfriSenti as existing benchmarks but does not supply citations for them.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on the abstract and evaluation protocol. We address each point below and will revise the manuscript to incorporate additional statistical reporting, clarifications on annotation, and explicit discussion of limitations.

Referee: [Abstract] Abstract (Register Gap claim): the reported improvement from 10/30 to 22/30 correct register classifications is presented without binomial confidence intervals, stratification by register or language, or any description of how the ground-truth intent labels were established or validated.

Authors: We will add binomial confidence intervals for the 33.3% and 73.3% figures using the Clopper-Pearson method. The ground-truth labels were established by the lead author (a native Nigerian English speaker with training in pragmatics) via iterative contextual analysis of speaker, audience, and situation for each of the 30 items; we will expand the methods section to describe this process explicitly. A supplementary table will provide stratification by register. These additions will be made in the revision.
revision: yes

Referee: [Abstract] Abstract (decoupling claim): the finding that Gemini 2.5 Flash (MIS 78.6) outperforms GPT-5 and Gemini 2.5 Pro is computed on the identical 30-item set; any sampling instability or annotation noise therefore propagates directly to both headline results.

Authors: We agree that both headline results derive from the same 30-item calibration set and therefore share any sampling or annotation effects. We will revise the abstract, results, and discussion sections to state this limitation explicitly and to frame the decoupling observation as preliminary, recommending independent validation sets in future work.
revision: yes

Referee: [Abstract] Abstract (evaluation protocol): no inter-annotator agreement is reported for the 30-item set, and the manuscript supplies no sampling frame or external validation that would establish the set as representative of Nigerian public discourse.

Authors: The 30-item collection is explicitly positioned as a calibration set constructed purposively to cover the four registers rather than as a statistically representative sample; no formal sampling frame was used. Annotation was performed by a single expert annotator, so inter-annotator agreement metrics do not apply. We will add a dedicated limitations paragraph clarifying these points and noting the absence of external validation beyond author expertise.
revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper constructs a new nine-dimension MIF schema and a 30-item calibration set, then reports direct accuracy counts (10/30 zero-shot, 22/30 schema-informed) on register classification. These are empirical measurements on the authors' own labeled items rather than any fitted parameter renamed as a prediction, self-definitional loop, or load-bearing self-citation. No equations, uniqueness theorems, or ansatzes are invoked that reduce the headline Register Gap or model-decoupling claims to the inputs by construction. The evaluation is self-contained against the stated benchmark; external validity concerns exist but do not constitute circularity under the specified criteria.
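The headline figures follow directly from the raw counts, as the short check below shows. The helper names are illustrative; the counts 10/30 and 22/30 are the ones quoted in this review.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100 * correct / total


def gain_points(before: int, after: int, total: int) -> float:
    """Percentage-point gain between two counts over the same item set."""
    return 100 * (after - before) / total


zero_shot = round(accuracy_pct(10, 30), 1)   # 33.3
schema = round(accuracy_pct(22, 30), 1)      # 73.3
gain = gain_points(10, 22, 30)               # 40.0
```

The gain corresponds to 12 of the 30 items moving from incorrect to correct in net terms, which is why a handful of relabelled items would shift the figure noticeably.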

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that context failure is the dominant error mode and that a 30-item set can demonstrate general improvement; no free parameters or invented physical entities are introduced.

axioms (1)

domain assumption · The dominant failure mode of AI systems on Nigerian discourse is context failure rather than translation failure.
Explicitly stated in the abstract as the motivation for separating surface sentiment from true intent.

invented entities (1)

Meaning Intelligence Framework (MIF) · no independent evidence
purpose: Operationalises nine scored dimensions to capture pragmatic force in Nigerian discourse.
Newly defined schema introduced in the paper; no independent evidence outside this work is provided in the abstract.

pith-pipeline@v0.9.1-grok · 5804 in / 1275 out tokens · 23385 ms · 2026-06-26T17:38:53.267454+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 9 canonical work pages

[1] Adelani, D. I., et al. (2021). MasakhaNER: Named entity recognition for African languages. Transactions of the Association for Computational Linguistics, 9, 1116–1131.

[2] Adelani, D. I., et al. (2023). AfroBench: How good are large language models on African languages? arXiv preprint arXiv:2311.07978.

[3] Iskandardinata, M., Christian, W., & Suhartono, D. (2025). Context-aware pragmatic metacognitive prompting for sarcasm detection. arXiv preprint arXiv:2511.21066.

[4] Lee, J., et al. (2024). Pragmatic metacognitive prompting improves LLM performance on sarcasm detection. arXiv preprint arXiv:2412.04509.

[5] Muhammad, S. H., et al. (2022). NaijaSenti: A Nigerian Twitter sentiment corpus for multilingual sentiment analysis. arXiv preprint arXiv:2201.08277.

[6] Muhammad, S. H., et al. (2023). AfriSenti: A Twitter sentiment analysis benchmark for African languages. arXiv preprint arXiv:2302.08956.

[7] Ochieng, M., et al. (2025). Reasoning beyond labels: Measuring LLM sentiment in low-resource, culturally nuanced contexts. arXiv preprint arXiv:2508.04199.

[8] Oyewusi, W., et al. (2021). Semantic enrichment of Nigerian Pidgin English for contextual sentiment classification. In Proceedings of the AfricaNLP Workshop.

[9] Saeed, M., Bourgonje, P., & Demberg, V. (2024). Implicit discourse relation classification for Nigerian Pidgin. arXiv preprint arXiv:2406.18776.

[10] Shode, I., et al. (2023). NollySenti: Leveraging transfer learning and machine translation for Nigerian movie sentiment classification. arXiv preprint arXiv:2305.10971.

[11] Yu, H., Alabi, J. O., et al. (2025). INJONGO: A multicultural intent detection and slot-filling dataset for 16 African languages. arXiv preprint arXiv:2502.09814.

Supplementary materials: The MIF Master Specification v2.0, Annotation Guidelines v1.0, and the 30-item public calibration set (with gold labels) are available as companion documents. The privat...
