How Proposal Novelty, Topical Diversity, and Theory-Practice Balance Shape Scholarly Outcomes in Funded Education Research

Jiaming Zhang; Yang Ding; Yunfeng Gao; Yuxuan Xiao

arxiv: 2606.01127 · v1 · pith:WVABUAMAnew · submitted 2026-05-31 · 💻 cs.DL · cs.CY

Pith reviewed 2026-06-28 16:05 UTC · model grok-4.3

classification 💻 cs.DL cs.CY

keywords: NSF education funding, proposal novelty, topical diversity, theory-practice balance, scholarly outcomes, publication output, citation performance

The pith

Balanced proposals that integrate theoretical and practical aims show the most favorable profile of publication gains with fewer citation drawbacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper links 8,715 NSF education awards from 1990 to 2020 with 84,519 publications by principal investigators across four divisions. It measures proposal novelty as semantic distance from earlier funded work in the same division, topical diversity as breadth across latent themes, and intellectual orientation as theoretical, practical, or balanced. Funding is associated with higher publication counts but frequently pairs with lower citation performance and journal visibility, especially in later decades. Novelty shows limited, uneven ties to outcomes and diversity produces mixed, division-specific patterns, whereas balanced proposals combine positive publication associations with fewer negative citation patterns. A sympathetic reader would care because education research funding is expected to advance both scholarly knowledge and practical applications, so identifying proposal features associated with better post-award results could inform allocation decisions.

Core claim

NSF education funding is consistently associated with higher publication output across divisions, but this increase is not accompanied by stronger citation performance or higher journal-level visibility, with citation and CiteScore estimates often negative, particularly in later decades. Proposal novelty shows limited and uneven associations with post-award outcomes, whereas topical diversity is more clearly related to publication growth in some divisions but weaker citation-based performance in others. Balanced proposals that integrate theoretical and practical aims display the most favourable overall profile, combining positive publication associations with fewer negative citation-based patterns.

What carries the argument

Three text-derived proposal features—novelty as semantic distance from prior funded projects, topical diversity as breadth across latent research themes, and intellectual orientation as theoretical, practical, or balanced—treated as predictors of publication output, citation performance, and journal visibility.
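
To make the first two features concrete, here is a minimal sketch under stated assumptions. The source says only that novelty is a semantic distance from prior funded projects in the same division and that diversity is breadth across latent themes. The specific choices below (one minus mean cosine similarity, and normalised Shannon entropy over topic weights) are illustrative assumptions, and the function names `novelty` and `topical_diversity` are hypothetical, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty(embedding, prior_embeddings):
    """Assumed novelty score: 1 - mean cosine similarity to earlier proposals
    funded in the same division. Returns None when there is no prior work."""
    if not prior_embeddings:
        return None
    sims = [cosine_similarity(embedding, p) for p in prior_embeddings]
    return 1.0 - sum(sims) / len(sims)


def topical_diversity(topic_weights):
    """Assumed diversity score: Shannon entropy of a proposal's topic weights,
    normalised to [0, 1] by the maximum entropy for that number of topics."""
    total = sum(topic_weights)
    if total <= 0 or len(topic_weights) < 2:
        return 0.0
    probs = [w / total for w in topic_weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(topic_weights))
```

Under these assumptions, a proposal identical in direction to all earlier ones scores a novelty near 0, and a proposal spread evenly over all themes scores a diversity near 1. Other distance or breadth measures would be equally consistent with the description in the abstract.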

If this is right

NSF education funding raises publication output across the examined divisions.
Citation performance and CiteScore estimates are often negative, especially in later decades.
Proposal novelty has limited and uneven associations with post-award scholarly outcomes.
Topical diversity relates to publication growth in some divisions but weaker citation performance in others.
Balanced theory-practice proposals combine positive publication associations with fewer negative citation-based patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Funders could experiment with explicit signals that reward balanced theory-practice framing when scoring education proposals.
The pattern may generalize to other domains where public funding expects both knowledge production and practical application.
Division-specific differences imply that uniform evaluation criteria across programs could overlook context-dependent effects.
Longer-term tracking of the same awards might reveal whether the observed citation shortfalls persist or reverse after additional years.

Load-bearing premise

Semantic distance from prior projects and breadth across latent themes accurately capture novelty and diversity, and these features can be treated as predictors of outcomes rather than being confounded by unmeasured applicant or institutional factors.

What would settle it

A study that adds controls for principal investigator prior record or uses alternative text measures and finds no remaining link between balanced orientation and the favorable publication-citation profile would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.01127 by Jiaming Zhang, Yang Ding, Yunfeng Gao, Yuxuan Xiao.

**Figure 2.** Estimated results of education research funding on the academic performance of funded education scholars across decades

**Figure 3.** Estimated effects of NSF education research funding for projects in the top and bottom 30% of

Original abstract

Education research occupies a distinctive position in public science because it is expected to advance scholarly knowledge while also informing learning, teaching, participation, and workforce development. This study examines how the intellectual characteristics of NSF-funded education proposals are associated with the subsequent academic performance of funded scholars. Linking 8,715 NSF education awards from 1990 to 2020 with 84,519 publications by principal investigators, the analysis focuses on four major NSF education divisions that collectively span undergraduate and graduate levels, formal and informal learning environments, and inclusive educational initiatives. Proposal novelty is measured as semantic distance from prior funded projects within the same division, topical diversity as breadth across latent research themes, and intellectual orientation as theoretical, practical, or balanced. The results show that NSF education funding is consistently associated with higher publication output across divisions. However, this increase is not accompanied by stronger citation performance or higher journal-level visibility; citation and CiteScore estimates are often negative, particularly in later decades. Proposal novelty shows limited and uneven associations with post-award outcomes, whereas topical diversity is more clearly related to publication growth in some divisions but weaker citation-based performance in others. Balanced proposals that integrate theoretical and practical aims display the most favourable overall profile, combining positive publication associations with fewer negative citation-based patterns. These findings highlight the importance of evaluating education research funding through multiple academic outcomes and division-specific research contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper links a large NSF education grant dataset to PI publications and reports that balanced theory-practice proposals associate with higher output but not better citations, though unmeasured PI and institutional quality likely drive much of the pattern.

The main point is that this paper builds a linked dataset of 8715 NSF education awards from 1990-2020 and 84519 PI publications, then measures proposal novelty via semantic distance, topical diversity via latent themes, and orientation as theoretical, practical, or balanced. It finds funding raises publication counts across four divisions but shows flat or negative citation and CiteScore patterns, with balanced proposals showing the least negative impact profile and novelty having limited ties to outcomes.

What works is the scale and the division-specific breakdowns. Applying established text methods to this previously unexamined NSF education corpus produces concrete descriptive associations that were not available before.

The soft spot is selection. The analysis conditions only on funded awards and does not report controls for pre-award PI productivity, citation history, or institution fixed effects. Without those, the favorable pattern for balanced proposals could simply reflect that stronger PIs or departments both write more balanced proposals and produce more output afterward. The abstract supplies no model details or robustness checks on this point, so the associations stay hard to interpret beyond raw correlations.

This is for people who track science funding policy or education research evaluation. A reader focused on grant outcomes in one agency would find the data volume useful, but anyone needing causal or predictive claims would need more.

It deserves peer review because the dataset is new and the questions are relevant to funders, but any referee would have to require explicit checks on applicant quality and alternative specifications.

Referee Report

2 major / 2 minor

Summary. The paper links 8,715 NSF education awards (1990–2020) across four divisions to 84,519 PI publications and examines associations between three text-derived proposal features—semantic novelty (distance from prior funded projects), topical diversity (breadth across latent themes), and intellectual orientation (theoretical/practical/balanced)—and post-award outcomes (publication counts, citations, CiteScore). It reports that funding raises publication output but is often linked to weaker citation performance, with novelty showing limited associations, diversity showing mixed patterns, and balanced proposals displaying the most favorable overall profile.

Significance. If the reported associations prove robust, the work would provide division-specific evidence that proposal intellectual orientation matters for scholarly productivity in education research and could inform how funders evaluate balance between theory and practice. The large linked dataset and multi-outcome design are strengths, but the observational nature and absence of pre-award controls limit the ability to attribute outcomes to proposal features rather than applicant or institutional selection.

major comments (2)

[Abstract] Abstract and methods description: the reported associations between proposal features and outcomes do not mention inclusion of pre-award PI productivity, citation history, or institution fixed effects as covariates. Without these, the favorable profile for balanced proposals cannot be distinguished from selection of stronger applicants or departments, directly undermining interpretation of the central claim.
[Abstract] Results on citation and CiteScore outcomes: the paper reports often-negative estimates for novelty and diversity but supplies no information on model specifications, robustness checks, or alternative text measures. This leaves open whether the patterns are sensitive to analytic choices or to unmeasured confounders noted in the skeptic assessment.

minor comments (2)

[Abstract] Clarify how the four NSF divisions are defined and whether division-specific models include interaction terms or separate estimations.
[Abstract] The time span 1990–2020 spans major changes in publication practices; note whether decade fixed effects or period-specific models are used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on controls, model transparency, and interpretation. We address each point below and will revise the manuscript to strengthen clarity while remaining faithful to the observational design and available data.

Referee: [Abstract] Abstract and methods description: the reported associations between proposal features and outcomes do not mention inclusion of pre-award PI productivity, citation history, or institution fixed effects as covariates. Without these, the favorable profile for balanced proposals cannot be distinguished from selection of stronger applicants or departments, directly undermining interpretation of the central claim.

Authors: We agree that pre-award controls are important for interpreting selection versus treatment effects. The manuscript does not currently include them. In revision we will (1) explicitly state in the abstract and methods that the models lack pre-award PI productivity and institution fixed effects, (2) add a limitations paragraph discussing how this constrains causal claims, and (3) include supplementary analyses that add available pre-award publication counts for the subset of PIs where such data can be reliably linked. Full institution fixed effects across all divisions and decades are not feasible with the current linkage, but we will test division-level clustering and discuss residual selection concerns.

Revision: partial

Referee: [Abstract] Results on citation and CiteScore outcomes: the paper reports often-negative estimates for novelty and diversity but supplies no information on model specifications, robustness checks, or alternative text measures. This leaves open whether the patterns are sensitive to analytic choices or to unmeasured confounders noted in the skeptic assessment.

Authors: The full methods section already specifies the regression families (negative binomial for publication counts, linear models for log-citations and CiteScore) and the construction of the three text features. However, we accept that the abstract and main results tables do not foreground these details or robustness checks. We will revise the abstract to reference the model specifications and add an explicit robustness subsection reporting (a) alternative embeddings for novelty, (b) different topic-model specifications for diversity, and (c) sensitivity to sample restrictions. We will also expand the discussion of unmeasured confounders in the limitations section.

Revision: yes

Circularity Check

0 steps flagged

No circularity; observational study with external data sources

This is an observational empirical study linking external NSF grant records (1990-2020) to PI publication data. Novelty (semantic distance), topical diversity (latent theme breadth), and intellectual orientation (theoretical/practical/balanced) are computed from proposal text and treated as predictors of post-award counts, citations, and CiteScore. No equations, fitted parameters, or self-citations reduce any reported association to a quantity defined by the study's own inputs. The design is self-contained against external benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described in the provided text.

Reference graph

Works this paper leans on

23 extracted references

[1]

The linkage strategy was designed to minimise errors from name-based matching

Purpose and Overview This appendix documents the procedure used to link NSF education awards to publication and author records in Web of Science (WoS) and Microsoft Academic Graph (MAG). The linkage strategy was designed to minimise errors from name-based matching. Instead of directly matching NSF principal investigators to bibliometric authors through n...
[2]

DUE -1234567

These records included award numbers, titles, abstracts, start and end years, funding amounts, award mechanisms, division identifiers, institutional affiliations, and principal investigator information. The sample was restricted to awards administered through DGE, DRL, DUE, and HRD. These divisions were retained because they represent the core education ...
[3]

DOI strings were cleaned by converting to lowercase, removing URL prefixes, trimming whitespace, and standardising punctuation

DOI-Based Linkage from WoS to MAG The second linkage used DOI identifiers to connect WoS publications to MAG records. DOI strings were cleaned by converting to lowercase, removing URL prefixes, trimming whitespace, and standardising punctuation. WoS records without valid DOI information were retained for audit purposes but could not be used for the WoS-M...
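
The DOI cleaning steps listed above can be sketched as follows. This is an illustrative reading, not the authors' code: the helper name `clean_doi` is hypothetical, the set of URL prefixes handled is an assumption, and "standardising punctuation" is interpreted here narrowly as stripping trailing punctuation.

```python
import re

# Assumed prefixes: resolver URLs and a leading "doi:" label.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Loose shape check: "10.", a registrant code, a slash, then a non-empty suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_doi(raw):
    """Normalise a DOI string; return None when no usable DOI remains."""
    if raw is None:
        return None
    doi = raw.strip().lower()          # trim whitespace, convert to lowercase
    doi = _DOI_PREFIX.sub("", doi)     # remove URL or "doi:" prefix
    doi = doi.strip().rstrip(".,;")    # assumed punctuation rule: drop trailing marks
    return doi if _DOI_SHAPE.match(doi) else None
```

Records for which `clean_doi` returns None correspond to the case the text describes as retained for audit but unusable for DOI-based matching.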
[4]

The numbers are reported at the award or PI-linkage stage indicated by each row

Sample Exclusion and Retention Table S1 reports the sample cleaning and matching process. The numbers are reported at the award or PI-linkage stage indicated by each row. The final analytical sample contains 8,715 NSF education awards linked to principal investigators with usable proposal text and bibliometric histories. Table S1. Sample exclusion and re...
[5]

Table S2 reports publication-level retention from the initial WoS match to the final MAG publication portfolio

Publication-Level Retention The award-level linkage generated a publication-level file used to identify MAG author profiles and construct annual outcomes. Table S2 reports publication-level retention from the initial WoS match to the final MAG publication portfolio. Table S2. Publication-level linkage and retention. No. Step Excluded Remaining 1 WoS p...
[6]

A random sample of 150 NSF-WoS award-number matches was manually inspected

Quality Assurance and Manual Verification Several validation checks were conducted to assess linkage reliability. A random sample of 150 NSF-WoS award-number matches was manually inspected. In 148 cases, the WoS funding acknowledgement clearly referred to the same NSF award number and project. Two cases contained multiple NSF awards in the same acknowledg...
[7]

Overview This study examines education-related awards made by the NSF between 1990 and 2020. During this period, most NSF education research and capacity-building activities were administered through the Directorate for Education and Human Resources (EHR), which was later reorganised as the Directorate for STEM Education. The analysis focuses on four m...
[8]

Results This note, in conjunction with the topic distribution figures, further elucidates the heterogeneity in knowledge structures across different National Science Foundation funding divisions. The left panel of the figure displays the average topical diversity score of individual proposals within each division, while the right panel quantifies the tota...
[9]

Purpose of the Classification This appendix describes the procedure used to classify NSF education research proposals into three intellectual orientations: theoretical, practical, and balanced. The classification was conducted to identify whether each proposal primarily emphasised conceptual or methodological development, practical educational interventi...
[10]

Classification Categories Theoretical orientation refers to proposals whose primary contribution is conceptual, explanatory, or methodological. These proposals typically aim to develop theory, refine constructs, explain learning processes, advance measurement frameworks, or contribute to generalisable understanding of education systems. Practical orie...
[11]

Each record contained an award title, abstract, award year, and division identifier

Input Corpus and Pre-Processing The initial corpus contained 8,715 NSF education awards from DGE, DRL, DUE, and HRD between 1990 and 2020. Each record contained an award title, abstract, award year, and division identifier. Before classification, the texts were processed as follows. First, titles and abstracts were concatenated. Second, boilerplate admini...
[12]

The prompt therefore instructed the model to classify the proposal according to its primary intellectual purpose rather than isolated terms

Prompt Design The prompt was designed to minimise three common classification risks: over-weighting individual keywords, confusing applied empirical research with practical orientation, and treating any mention of theory as evidence of theoretical orientation. The prompt therefore instructed the model to classify the proposal according to its primary ...
[13]

THEORETICAL The proposal primarily develops, tests, extends, or refines concepts, theories, explanatory models, measurement frameworks, or methods for understanding learning, teaching, educational systems, or STEM education processes
[14]

PRACTICAL The proposal primarily designs, implements, evaluates, scales, or improves an educational intervention, curriculum, instructional tool, professional development programme, institutional reform, assessment practice, or student-support programme
[15]

It uses theory to guide intervention or programme design, or uses practical implementation evidence to refine theory, models, or conceptual understanding

BALANCED The proposal explicitly integrates theoretical development and practical implementation. It uses theory to guide intervention or programme design, or uses practical implementation evidence to refine theory, models, or conceptual understanding
[16]

framework

UNCLEAR Use this only if the title and abstract do not provide enough information to classify the proposal reliably. Important rules: - Do not classify a proposal as theoretical merely because it uses words such as “framework”, “model”, or “theory”. - Do not classify a proposal as practical merely because it studies students, teachers, classrooms, or inst...
[17]

Each pilot used a stratified sample of proposals across the four NSF divisions

Pilot Testing and Prompt Revision The prompt was developed through three pilot rounds. Each pilot used a stratified sample of proposals across the four NSF divisions. In Pilot Round 1, 240 proposals were classified. Manual inspection showed that the model over-classified proposals as balanced when abstracts contained both conceptual and applied vocabulary...
[18]

The initial results were (see Table S3): Table S3

Automated Classification Results The first full GPT-4 classification was applied to 8,642 eligible proposals. The initial results were (see Table S3):

Table S3. Classification results.

| Classification | Number of proposals | Percentage |
| --- | --- | --- |
| Theoretical | 2,184 | 25.3% |
| Practical | 3,746 | 43.3% |
| Balanced | 2,318 | 26.8% |
| Unclear | 394 | 4.6% |
| Total | 8,642 | 100.0% |

The distribution was co...
[19]

Classifications with confidence scores of 0.80 or above were retained provisionally

Confidence Thresholding and Review Routing The model returned a confidence score between 0 and 1 for each classification. Classifications with confidence scores of 0.80 or above were retained provisionally. Classifications below 0.80, together with all unclear cases, were routed to additional review. Of the 8,642 classified proposals, 7,681 received conf...
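
The routing rule described above is simple enough to express directly. The sketch below assumes a hypothetical record type `Classification` and a hypothetical function `route`; only the 0.80 threshold and the treatment of unclear cases come from the text.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # threshold stated in the appendix excerpt


@dataclass
class Classification:
    award_id: str      # hypothetical identifier field
    label: str         # "theoretical", "practical", "balanced", or "unclear"
    confidence: float  # model-reported score between 0 and 1


def route(c, threshold=CONFIDENCE_THRESHOLD):
    """Return "retain" for provisionally accepted labels, "review" otherwise.

    Unclear labels go to review regardless of confidence; everything else
    is retained only when confidence is at or above the threshold.
    """
    if c.label == "unclear" or c.confidence < threshold:
        return "review"
    return "retain"
```

Note that the comparison is inclusive at 0.80, matching the wording "0.80 or above".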
[20]

This represents 98.8 percent of the 8,715 original NSF award records and 99.7 percent of the 8,642 records eligible for LLM classification

Final Orientation Sample After automated classification, second-pass review, and human adjudication, 8,614 proposals received final orientation labels, as shown in Table S7. This represents 98.8 percent of the 8,715 original NSF award records and 99.7 percent of the 8,642 records eligible for LLM classification. The final orientation distribution was: Tab...
[21]

The sample was stratified by division and orientation to ensure that smaller categories were represented

Validation Sample To assess classification quality, a validation sample of 600 labelled proposals was drawn from the final classified corpus. The sample was stratified by division and orientation to ensure that smaller categories were represented. Two expert reviewers independently coded the validation sample without seeing the GPT-4 labels. Agreement bet...
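
The excerpt is truncated before it names the agreement statistic that was used. As one common option for two independent coders, and purely as an assumption, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name `cohens_kappa` is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the same function could compare either reviewer against the GPT-4 labels.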
[22]

First, classification confidence was similar across divisions

Bias and Robustness Checks Several checks were conducted to assess whether the classification was sensitive to division, text length, or confidence threshold. First, classification confidence was similar across divisions. Mean confidence was 0.887 for DGE, 0.892 for DRL, 0.901 for DUE, and 0.884 for HRD. This suggests that no single division was systemati...
[23]

pseudo-award

Use of the Orientation Variable in the Empirical Analysis The final orientation label was merged back into the award-level dataset and then linked to the scholar-year analytical panel through the principal investigator’s first observed NSF education award within the relevant division. For scholars with multiple awards, the orientation of the first obse...
