How Proposal Novelty, Topical Diversity, and Theory-Practice Balance Shape Scholarly Outcomes in Funded Education Research
Pith reviewed 2026-06-28 16:05 UTC · model grok-4.3
The pith
Balanced proposals that integrate theoretical and practical aims show the most favorable profile of publication gains with fewer citation drawbacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NSF education funding is consistently associated with higher publication output across divisions, but this increase is not accompanied by stronger citation performance or higher journal-level visibility, with citation and CiteScore estimates often negative, particularly in later decades. Proposal novelty shows limited and uneven associations with post-award outcomes, whereas topical diversity is more clearly related to publication growth in some divisions but weaker citation-based performance in others. Balanced proposals that integrate theoretical and practical aims display the most favourable overall profile, combining positive publication associations with fewer negative citation-based patterns.
What carries the argument
Three text-derived proposal features—novelty as semantic distance from prior funded projects, topical diversity as breadth across latent research themes, and intellectual orientation as theoretical, practical, or balanced—treated as predictors of publication output, citation performance, and journal visibility.
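The novelty measure described above can be illustrated with a minimal sketch. It assumes each proposal has already been embedded as a dense vector, and it takes novelty as one minus the maximum cosine similarity to strictly earlier awards in the same division. The embedding model, the use of the maximum rather than a mean, and the dictionary keys (`id`, `division`, `year`, `embedding`) are all assumptions for illustration. None of them is the paper's documented procedure.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_scores(awards: List[Dict]) -> Dict[str, Optional[float]]:
    """Novelty = 1 - max cosine similarity to strictly earlier awards in the same division.

    Awards with no earlier award in their division have no reference set and get None.
    """
    scores: Dict[str, Optional[float]] = {}
    for award in awards:
        prior = [
            other["embedding"]
            for other in awards
            if other["division"] == award["division"] and other["year"] < award["year"]
        ]
        if not prior:
            scores[award["id"]] = None
            continue
        closest = max(cosine_similarity(award["embedding"], p) for p in prior)
        scores[award["id"]] = 1.0 - closest
    return scores


# Hypothetical toy embeddings; real ones would come from a text-embedding model.
awards = [
    {"id": "A1", "division": "DUE", "year": 1995, "embedding": [1.0, 0.0]},
    {"id": "A2", "division": "DUE", "year": 1998, "embedding": [1.0, 0.0]},
    {"id": "A3", "division": "DUE", "year": 2001, "embedding": [0.0, 1.0]},
    {"id": "B1", "division": "DRL", "year": 2001, "embedding": [0.0, 1.0]},
]
scores = novelty_scores(awards)
```

In this toy example, A2 repeats an earlier DUE project and scores 0. A3 is orthogonal to all earlier DUE work and scores 1. A1 and B1 have no prior awards in their division, so they get None. The quadratic loop is fine for illustration, but at 8,715 awards a vectorised similarity matrix would be the practical choice.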
If this is right
- NSF education funding is associated with higher publication output across the examined divisions.
- Citation performance and CiteScore estimates are often negative, especially in later decades.
- Proposal novelty has limited and uneven associations with post-award scholarly outcomes.
- Topical diversity relates to publication growth in some divisions but weaker citation performance in others.
- Balanced theory-practice proposals combine positive publication associations with fewer negative citation-based patterns.
Where Pith is reading between the lines
- Funders could experiment with explicit signals that reward balanced theory-practice framing when scoring education proposals.
- The pattern may generalize to other domains where public funding expects both knowledge production and practical application.
- Division-specific differences imply that uniform evaluation criteria across programs could overlook context-dependent effects.
- Longer-term tracking of the same awards might reveal whether the observed citation shortfalls persist or reverse after additional years.
Load-bearing premise
Semantic distance from prior projects and breadth across latent themes accurately capture novelty and diversity, and these features can be treated as predictors of outcomes rather than being confounded by unmeasured applicant or institutional factors.
What would settle it
A study that adds controls for principal investigator prior record or uses alternative text measures and finds no remaining link between balanced orientation and the favorable publication-citation profile would falsify the central claim.
Original abstract
Education research occupies a distinctive position in public science because it is expected to advance scholarly knowledge while also informing learning, teaching, participation, and workforce development. This study examines how the intellectual characteristics of NSF-funded education proposals are associated with the subsequent academic performance of funded scholars. Linking 8,715 NSF education awards from 1990 to 2020 with 84,519 publications by principal investigators, the analysis focuses on four major NSF education divisions that collectively span undergraduate and graduate levels, formal and informal learning environments, and inclusive educational initiatives. Proposal novelty is measured as semantic distance from prior funded projects within the same division, topical diversity as breadth across latent research themes, and intellectual orientation as theoretical, practical, or balanced. The results show that NSF education funding is consistently associated with higher publication output across divisions. However, this increase is not accompanied by stronger citation performance or higher journal-level visibility; citation and CiteScore estimates are often negative, particularly in later decades. Proposal novelty shows limited and uneven associations with post-award outcomes, whereas topical diversity is more clearly related to publication growth in some divisions but weaker citation-based performance in others. Balanced proposals that integrate theoretical and practical aims display the most favourable overall profile, combining positive publication associations with fewer negative citation-based patterns. These findings highlight the importance of evaluating education research funding through multiple academic outcomes and division-specific research contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper links 8,715 NSF education awards (1990–2020) across four divisions to 84,519 PI publications. It examines associations between three text-derived proposal features and post-award outcomes (publication counts, citations, CiteScore). The features are semantic novelty (distance from prior funded projects), topical diversity (breadth across latent themes), and intellectual orientation (theoretical, practical, or balanced). It reports that funding is associated with higher publication output but often with weaker citation performance. Novelty shows limited associations, diversity shows mixed patterns, and balanced proposals display the most favorable overall profile.
Significance. If the reported associations prove robust, the work would provide division-specific evidence that proposal intellectual orientation matters for scholarly productivity in education research and could inform how funders evaluate balance between theory and practice. The large linked dataset and multi-outcome design are strengths, but the observational nature and absence of pre-award controls limit the ability to attribute outcomes to proposal features rather than applicant or institutional selection.
major comments (2)
- [Abstract] Abstract and methods description: the reported associations between proposal features and outcomes do not mention inclusion of pre-award PI productivity, citation history, or institution fixed effects as covariates. Without these, the favorable profile for balanced proposals cannot be distinguished from selection of stronger applicants or departments, directly undermining interpretation of the central claim.
- [Abstract] Results on citation and CiteScore outcomes: the paper reports often-negative estimates for novelty and diversity but supplies no information on model specifications, robustness checks, or alternative text measures. This leaves open whether the patterns are sensitive to analytic choices or to unmeasured confounders noted in the skeptic assessment.
minor comments (2)
- [Abstract] Clarify how the four NSF divisions are defined and whether division-specific models include interaction terms or separate estimations.
- [Abstract] The time span 1990–2020 spans major changes in publication practices; note whether decade fixed effects or period-specific models are used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on controls, model transparency, and interpretation. We address each point below and will revise the manuscript to strengthen clarity while remaining faithful to the observational design and available data.
Point-by-point responses
Referee: [Abstract] Abstract and methods description: the reported associations between proposal features and outcomes do not mention inclusion of pre-award PI productivity, citation history, or institution fixed effects as covariates. Without these, the favorable profile for balanced proposals cannot be distinguished from selection of stronger applicants or departments, directly undermining interpretation of the central claim.
Authors: We agree that pre-award controls are important for interpreting selection versus treatment effects. The manuscript does not currently include them. In revision we will (1) explicitly state in the abstract and methods that the models lack pre-award PI productivity and institution fixed effects, (2) add a limitations paragraph discussing how this constrains causal claims, and (3) include supplementary analyses that add available pre-award publication counts for the subset of PIs where such data can be reliably linked. Full institution fixed effects across all divisions and decades are not feasible with the current linkage, but we will test division-level clustering and discuss residual selection concerns. revision: partial
Referee: [Abstract] Results on citation and CiteScore outcomes: the paper reports often-negative estimates for novelty and diversity but supplies no information on model specifications, robustness checks, or alternative text measures. This leaves open whether the patterns are sensitive to analytic choices or to unmeasured confounders noted in the skeptic assessment.
Authors: The full methods section already specifies the regression families (negative binomial for publication counts, linear models for log-citations and CiteScore) and the construction of the three text features. However, we accept that the abstract and main results tables do not foreground these details or robustness checks. We will revise the abstract to reference the model specifications, add an explicit robustness subsection reporting (a) alternative embeddings for novelty, (b) different topic-model specifications for diversity, and (c) sensitivity to sample restrictions. We will also expand the discussion of unmeasured confounders in the limitations section. revision: yes
Circularity Check
No circularity; observational study with external data sources
full rationale
This is an observational empirical study linking external NSF grant records (1990-2020) to PI publication data. Novelty (semantic distance), topical diversity (latent theme breadth), and intellectual orientation (theoretical/practical/balanced) are computed from proposal text and treated as predictors of post-award counts, citations, and CiteScore. No equations, fitted parameters, or self-citations reduce any reported association to a quantity defined by the study's own inputs. The design is self-contained against external benchmarks with no load-bearing self-referential steps.
Reference graph
Works this paper leans on
[1]
The linkage strategy was designed to minimise errors from name-based matching
Purpose and Overview. This appendix documents the procedure used to link NSF education awards to publication and author records in Web of Science (WoS) and Microsoft Academic Graph (MAG). The linkage strategy was designed to minimise errors from name-based matching. Instead of directly matching NSF principal investigators to bibliometric authors through n...
[2]
DUE-1234567
These records included award numbers, titles, abstracts, start and end years, funding amounts, award mechanisms, division identifiers, institutional affiliations, and principal investigator information. The sample was restricted to awards administered through DGE, DRL, DUE, and HRD. These divisions were retained because they represent the core education ...
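The records carry award numbers and division identifiers. That makes one name-free linkage route possible: extract award numbers from funding-acknowledgement text. The sketch below rests on several assumptions. It treats NSF award numbers as seven digits, optionally prefixed by one of the four division codes (as in the DUE-1234567 example). It also accepts bare numbers when introduced by words such as "award" or "grant". The patterns and the function name `extract_award_numbers` are hypothetical, and real acknowledgements are messier.

```python
import re
from typing import List

# Assumption: division-prefixed award numbers such as "DUE-1234567" or "DRL 7654321".
PREFIXED = re.compile(r"\b(?:DGE|DRL|DUE|HRD)\s*-?\s*(\d{7})\b")
# Assumption: bare seven-digit numbers count only when introduced by "award" or "grant".
INTRODUCED = re.compile(
    r"\b(?:award|grant)s?\s*(?:no\.?|number|#)?\s*(\d{7})\b", re.IGNORECASE
)


def extract_award_numbers(acknowledgement: str) -> List[str]:
    """Return the sorted, de-duplicated seven-digit award numbers found in the text."""
    found = set(PREFIXED.findall(acknowledgement))
    found.update(INTRODUCED.findall(acknowledgement))
    return sorted(found)


text = "Supported by NSF grants DUE-1234567 and DRL 7654321, and award no. 1111111."
numbers = extract_award_numbers(text)
```

The trailing `\b` keeps an eight-digit string from matching as a seven-digit award number. Requiring an introducing word for bare numbers trades some recall for fewer false matches on unrelated digits.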
[3]
DOI strings were cleaned by converting to lowercase, removing URL prefixes, trimming whitespace, and standardising punctuation
DOI-Based Linkage from WoS to MAG. The second linkage used DOI identifiers to connect WoS publications to MAG records. DOI strings were cleaned by converting to lowercase, removing URL prefixes, trimming whitespace, and standardising punctuation. WoS records without valid DOI information were retained for audit purposes but could not be used for the WoS-M...
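The DOI cleaning steps listed above can be sketched as a small normalisation function. Several details are assumptions, because the appendix does not spell them out:

- the specific URL prefixes handled;
- the mapping of Unicode dashes and the trimming of trailing punctuation as the "standardising punctuation" step;
- the rule that a usable DOI must start with `10.`.

```python
import re
from typing import Optional

# Assumed URL and label prefixes; the appendix only says "URL prefixes" were removed.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Assumed punctuation standardisation: map Unicode hyphens and dashes to ASCII "-".
_DASHES = str.maketrans({"\u2010": "-", "\u2011": "-", "\u2013": "-", "\u2014": "-"})


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Lowercase, strip whitespace and URL prefixes, and standardise punctuation.

    Returns None when the result does not look like a DOI (assumed to start with "10.").
    """
    if raw is None:
        return None
    doi = raw.strip().lower()
    doi = _PREFIX.sub("", doi)
    doi = doi.translate(_DASHES)
    doi = doi.strip().rstrip(".,;")  # drop trailing sentence punctuation
    return doi if doi.startswith("10.") else None


cleaned = normalize_doi(" https://doi.org/10.1000/ABC.123 ")
```

Lowercasing is safe for matching because DOI names are case-insensitive. Returning None rather than raising lets invalid records be kept for audit, as the appendix describes, while excluding them from the join.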
[4]
The numbers are reported at the award or PI-linkage stage indicated by each row
Sample Exclusion and Retention. Table S1 reports the sample cleaning and matching process. The numbers are reported at the award or PI-linkage stage indicated by each row. The final analytical sample contains 8,715 NSF education awards linked to principal investigators with usable proposal text and bibliometric histories. Table S1. Sample exclusion and re...
[5]
Table S2 reports publication-level retention from the initial WoS match to the final MAG publication portfolio
Publication-Level Retention. The award-level linkage generated a publication-level file used to identify MAG author profiles and construct annual outcomes. Table S2 reports publication-level retention from the initial WoS match to the final MAG publication portfolio. Table S2. Publication-level linkage and retention. No. Step Excluded Remaining 1 WoS p...
[6]
A random sample of 150 NSF-WoS award-number matches was manually inspected
Quality Assurance and Manual Verification. Several validation checks were conducted to assess linkage reliability. A random sample of 150 NSF-WoS award-number matches was manually inspected. In 148 cases, the WoS funding acknowledgement clearly referred to the same NSF award number and project. Two cases contained multiple NSF awards in the same acknowledg...
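The manual check corresponds to an agreement rate of 148/150, about 98.7 percent. The excerpt does not say whether the authors reported an interval. As a hypothetical illustration, a Wilson score interval expresses the sampling uncertainty around that rate for a sample of 150:

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


rate = 148 / 150
low, high = wilson_interval(148, 150)
print(f"agreement {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The Wilson form is used here rather than the simple normal approximation because the latter behaves poorly when a proportion is this close to 1.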
[7]
Overview. This study examines education-related awards made by the NSF between 1990 and 2020. During this period, most NSF education research and capacity-building activities were administered through the Directorate for Education and Human Resources (EHR), which was later reorganised as the Directorate for STEM Education. The analysis focuses on four m...
[8]
Results. This note, in conjunction with the topic distribution figures, further elucidates the heterogeneity in knowledge structures across different National Science Foundation funding divisions. The left panel of the figure displays the average topical diversity score of individual proposals within each division, while the right panel quantifies the tota...
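The per-proposal diversity score and its division average described in this note can be sketched as follows. The sketch assumes each proposal has a vector of latent topic weights, for example from a topic model. It then measures diversity as Shannon entropy normalised by the log of the number of topics. Both the entropy definition and the normalisation are assumptions; the excerpt does not state the exact formula.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def topical_diversity(topic_weights: Sequence[float]) -> float:
    """Shannon entropy of the topic distribution, normalised to [0, 1] by log(K)."""
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive number")
    k = len(topic_weights)
    if k < 2:
        return 0.0
    probs = [w / total for w in topic_weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(k)


def mean_diversity_by_division(
    proposals: Iterable[Tuple[str, Sequence[float]]]
) -> Dict[str, float]:
    """Average per-proposal diversity within each division."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for division, weights in proposals:
        sums[division] += topical_diversity(weights)
        counts[division] += 1
    return {division: sums[division] / counts[division] for division in sums}


# Hypothetical four-topic weights for three proposals.
proposals = [
    ("DUE", [0.25, 0.25, 0.25, 0.25]),  # spread evenly: diversity 1.0
    ("DUE", [1.0, 0.0, 0.0, 0.0]),      # single topic: diversity 0.0
    ("DRL", [0.5, 0.5, 0.0, 0.0]),      # two topics: diversity 0.5
]
averages = mean_diversity_by_division(proposals)
```

Normalising by log(K) makes scores comparable across topic models with different numbers of topics. It does not, however, capture how semantically far apart the topics are.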
[9]
Purpose of the Classification. This appendix describes the procedure used to classify NSF education research proposals into three intellectual orientations: theoretical, practical, and balanced. The classification was conducted to identify whether each proposal primarily emphasised conceptual or methodological development, practical educational interventi...
[10]
Classification Categories. Theoretical orientation refers to proposals whose primary contribution is conceptual, explanatory, or methodological. These proposals typically aim to develop theory, refine constructs, explain learning processes, advance measurement frameworks, or contribute to generalisable understanding of education systems. Practical orie...
[11]
Each record contained an award title, abstract, award year, and division identifier
Input Corpus and Pre-Processing. The initial corpus contained 8,715 NSF education awards from DGE, DRL, DUE, and HRD between 1990 and 2020. Each record contained an award title, abstract, award year, and division identifier. Before classification, the texts were processed as follows. First, titles and abstracts were concatenated. Second, boilerplate admini...
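The first pre-processing steps can be sketched as below. The excerpt is truncated before listing exactly which administrative text was removed. The boilerplate pattern here assumes that text is the standard closing sentence of recent NSF award abstracts ("This award reflects NSF's statutory mission ..."). The HTML-tag stripping is a further hypothetical step.

```python
import re

# Assumed boilerplate: the standard closing sentence of recent NSF award abstracts.
BOILERPLATE = [
    re.compile(
        r"This award reflects NSF['\u2019]s statutory mission.*?review criteria\.?",
        re.IGNORECASE | re.DOTALL,
    ),
]
HTML_TAG = re.compile(r"<[^>]+>")  # hypothetical extra step: drop tags such as <br/>


def preprocess(title: str, abstract: str) -> str:
    """Concatenate title and abstract, remove boilerplate, and normalise whitespace."""
    text = f"{title.strip().rstrip('.')}. {abstract.strip()}"
    for pattern in BOILERPLATE:
        text = pattern.sub(" ", text)
    text = HTML_TAG.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = preprocess(
    "Improving Calculus Instruction",
    "We study active learning.<br/>This award reflects NSF's statutory mission and has "
    "been deemed worthy of support through evaluation using the Foundation's "
    "intellectual merit and broader impacts review criteria.",
)
```

Removing such fixed sentences matters for both classification and novelty scoring. Identical boilerplate would otherwise inflate text similarity between unrelated awards.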
[12]
The prompt therefore instructed the model to classify the proposal according to its primary intellectual purpose rather than isolated terms
Prompt Design. The prompt was designed to minimise three common classification risks: over-weighting individual keywords, confusing applied empirical research with practical orientation, and treating any mention of theory as evidence of theoretical orientation. The prompt therefore instructed the model to classify the proposal according to its primary ...
[13]
THEORETICAL The proposal primarily develops, tests, extends, or refines concepts, theories, explanatory models, measurement frameworks, or methods for understanding learning, teaching, educational systems, or STEM education processes
[14]
PRACTICAL The proposal primarily designs, implements, evaluates, scales, or improves an educational intervention, curriculum, instructional tool, professional development programme, institutional reform, assessment practice, or student-support programme
[15]
It uses theory to guide intervention or programme design, or uses practical implementation evidence to refine theory, models, or conceptual understanding
BALANCED The proposal explicitly integrates theoretical development and practical implementation. It uses theory to guide intervention or programme design, or uses practical implementation evidence to refine theory, models, or conceptual understanding
[16]
framework
UNCLEAR Use this only if the title and abstract do not provide enough information to classify the proposal reliably. Important rules: - Do not classify a proposal as theoretical merely because it uses words such as “framework”, “model”, or “theory”. - Do not classify a proposal as practical merely because it studies students, teachers, classrooms, or inst...
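The categories above are fixed, and the model also returned a confidence score between 0 and 1. A pipeline would therefore need to validate each response before use. The sketch below assumes, hypothetically, that the model was asked to answer in JSON with `label` and `confidence` fields. The response format and the `parse_classification` helper are illustrative, not taken from the paper.

```python
import json
from dataclasses import dataclass

VALID_LABELS = {"THEORETICAL", "PRACTICAL", "BALANCED", "UNCLEAR"}


@dataclass
class Classification:
    label: str
    confidence: float


def parse_classification(raw: str) -> Classification:
    """Parse a (hypothetical) JSON reply such as {"label": "BALANCED", "confidence": 0.91}."""
    data = json.loads(raw)
    label = str(data["label"]).strip().upper()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Classification(label=label, confidence=confidence)


result = parse_classification('{"label": "balanced", "confidence": 0.91}')
```

The parser raises ValueError for unknown labels and out-of-range confidences. Malformed JSON raises `json.JSONDecodeError`, a subclass of ValueError. Such replies can then be sent back for re-classification rather than silently accepted; a missing field raises KeyError instead.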
[17]
Each pilot used a stratified sample of proposals across the four NSF divisions
Pilot Testing and Prompt Revision. The prompt was developed through three pilot rounds. Each pilot used a stratified sample of proposals across the four NSF divisions. In Pilot Round 1, 240 proposals were classified. Manual inspection showed that the model over-classified proposals as balanced when abstracts contained both conceptual and applied vocabulary...
[18]
The initial results were (see Table S3): Table S3
Automated Classification Results. The first full GPT-4 classification was applied to 8,642 eligible proposals. The initial results were (see Table S3):
Table S3. Classification results.
Classification    Number of proposals    Percentage
Theoretical       2,184                  25.3%
Practical         3,746                  43.3%
Balanced          2,318                  26.8%
Unclear           394                    4.6%
Total             8,642                  100.0%
The distribution was co...
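The percentages in Table S3 follow directly from the counts. This short check recomputes them and confirms that the four categories sum to the 8,642 eligible proposals:

```python
# Counts from Table S3.
counts = {"Theoretical": 2184, "Practical": 3746, "Balanced": 2318, "Unclear": 394}
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, shares)
```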
[19]
Classifications with confidence scores of 0.80 or above were retained provisionally
Confidence Thresholding and Review Routing. The model returned a confidence score between 0 and 1 for each classification. Classifications with confidence scores of 0.80 or above were retained provisionally. Classifications below 0.80, together with all unclear cases, were routed to additional review. Of the 8,642 classified proposals, 7,681 received conf...
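The routing rule described above can be written as a small function:

- retain a classification whose confidence is 0.80 or above;
- send any classification below 0.80 to review;
- always send UNCLEAR labels to review.

The function and its names are a hypothetical sketch of the stated rule, not the authors' code.

```python
from collections import Counter
from typing import Iterable, Tuple

THRESHOLD = 0.80  # cut-off stated in the appendix


def route(label: str, confidence: float, threshold: float = THRESHOLD) -> str:
    """Return 'retain' or 'review' according to the stated routing rule."""
    if label == "UNCLEAR" or confidence < threshold:
        return "review"
    return "retain"


def routing_summary(items: Iterable[Tuple[str, float]]) -> Counter:
    """Count how many (label, confidence) pairs are retained versus reviewed."""
    return Counter(route(label, conf) for label, conf in items)


summary = routing_summary(
    [("BALANCED", 0.80), ("PRACTICAL", 0.79), ("UNCLEAR", 0.95), ("THEORETICAL", 0.93)]
)
```

The UNCLEAR check comes first, so an unclear label is reviewed even when the model reports high confidence in it.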
[20]
This represents 98.8 percent of the 8,715 original NSF award records and 99.7 percent of the 8,642 records eligible for LLM classification
Final Orientation Sample. After automated classification, second-pass review, and human adjudication, 8,614 proposals received final orientation labels, as shown in Table S7. This represents 98.8 percent of the 8,715 original NSF award records and 99.7 percent of the 8,642 records eligible for LLM classification. The final orientation distribution was: Tab...
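The two retention percentages can be checked from the reported counts. The same arithmetic shows that 28 eligible proposals did not receive a final label:

```python
original_awards = 8715  # NSF award records in the initial corpus
eligible = 8642         # records eligible for LLM classification
final_labels = 8614     # proposals with final orientation labels

share_of_original = final_labels / original_awards
share_of_eligible = final_labels / eligible
print(f"{share_of_original:.1%} of original, {share_of_eligible:.1%} of eligible")
```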
[21]
The sample was stratified by division and orientation to ensure that smaller categories were represented
Validation Sample. To assess classification quality, a validation sample of 600 labelled proposals was drawn from the final classified corpus. The sample was stratified by division and orientation to ensure that smaller categories were represented. Two expert reviewers independently coded the validation sample without seeing the GPT-4 labels. Agreement bet...
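The excerpt is cut off before naming the agreement statistic, so the following is only an assumption. Cohen's kappa is a common choice for two independent coders assigning categorical labels. It corrects raw agreement for the agreement expected by chance. A minimal pure-Python version:

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal share, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        # Both coders used one identical category throughout; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four proposals.
reviewer_1 = ["THEORETICAL", "PRACTICAL", "BALANCED", "PRACTICAL"]
reviewer_2 = ["THEORETICAL", "PRACTICAL", "PRACTICAL", "PRACTICAL"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this toy example, raw agreement is 0.75 but chance agreement is 0.4375, so kappa falls to 5/9. This is why kappa is usually preferred over raw agreement when one category (here, practical) dominates.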
[22]
First, classification confidence was similar across divisions
Bias and Robustness Checks. Several checks were conducted to assess whether the classification was sensitive to division, text length, or confidence threshold. First, classification confidence was similar across divisions. Mean confidence was 0.887 for DGE, 0.892 for DRL, 0.901 for DUE, and 0.884 for HRD. This suggests that no single division was systemati...
[23]
pseudo-award
Use of the Orientation Variable in the Empirical Analysis. The final orientation label was merged back into the award-level dataset and then linked to the scholar-year analytical panel through the principal investigator’s first observed NSF education award within the relevant division. For scholars with multiple awards, the orientation of the first obse...
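The linking rule assigns each scholar the orientation of their first observed award within a division. It can be sketched as follows. The excerpt is truncated before describing how ties or multiple awards were handled. The tuple layout and the tie-break on award ID when two awards share a year are therefore hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (pi_id, division, award_year, award_id, orientation); layout is hypothetical.
Award = Tuple[str, str, int, str, str]


def first_award_orientation(awards: Iterable[Award]) -> Dict[Tuple[str, str], str]:
    """Map each (PI, division) pair to the orientation of its earliest award.

    Ties on award year are broken by the smaller award ID (an assumption).
    """
    earliest: Dict[Tuple[str, str], Tuple[int, str, str]] = {}
    for pi_id, division, year, award_id, orientation in awards:
        key = (pi_id, division)
        candidate = (year, award_id, orientation)
        if key not in earliest or candidate[:2] < earliest[key][:2]:
            earliest[key] = candidate
    return {key: value[2] for key, value in earliest.items()}


labels = first_award_orientation([
    ("pi-1", "DUE", 2005, "0500002", "PRACTICAL"),
    ("pi-1", "DUE", 2001, "0100001", "BALANCED"),
    ("pi-1", "DRL", 2010, "1000003", "THEORETICAL"),
])
```

Keying on the (PI, division) pair means the same scholar can carry different orientations in different divisions. This matches the "within the relevant division" wording above.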