Merit or networks? What decides where research is published

Ning Li

arXiv: 2606.03763 · v1 · pith:4ZKVXJNQnew · submitted 2026-06-02 · econ.GN · cs.AI · q-fin.EC

Pith reviewed 2026-06-28 07:51 UTC · model grok-4.3

classification econ.GN · cs.AI · q-fin.EC

keywords economics publishing · idea quality · journal placement · social connections · LLM evaluation · meritocracy · prestige ladder · working papers

The pith

Economics publishing follows a prestige ladder: execution quality sets the floor, idea quality grades the rungs, and connections set a ceiling only at the top journals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures idea quality of economics papers directly from their text using a trained LLM that ignores author names and outcomes. It combines this score with execution quality, a connection index, author ability, and language-model text metrics to model journal placement for 6,208 working papers. Execution quality emerges as the largest overall input and creates a minimum threshold for any placement. Idea quality then differentiates outcomes across mid-level journals. Connections add an independent boost, strongest near the most selective outlets, by both raising idea scores and improving placement odds at any given score. The advantage stays bounded, so ordinary ideas rarely reach the apex even with connections.

Core claim

Using a text-based idea-quality score from an LLM evaluator, the study finds that journal placement in economics follows a sequence along the prestige ladder. Execution quality establishes a meritocratic floor and remains the largest input overall. Text-legible idea quality grades the intermediate rungs. Connections impose a favoritism ceiling that matters most near the apex. Connections operate through two additive channels: connected authors produce higher-scoring papers, and at equal scores their papers still place better. Yet the advantage is bounded, as even the highest-scoring papers face real friction reaching the visible journal ladder. The result nests, rather than chooses between, the meritocracy and network accounts of how science is published.

What carries the argument

The prestige ladder model sequencing execution quality as floor, text-legible idea quality as rungs, and connections as ceiling, estimated via a five-input production function for journal placement.
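
The abstract does not state the functional form of the production function. As a sketch only, one common way to write a five-input model of placement on an ordered journal ladder is an ordered logit. Every symbol below is an assumption introduced for illustration: $T_i$ is the journal tier reached by paper $i$, $E_i$ execution quality, $Q_i$ text-legible idea quality, $C_i$ the connection index, $A_i$ the author-ability index, $L_i$ the off-the-shelf language-model text score, and $\tau_k$ the cutoff for tier $k$.

```latex
\Pr\left(T_i \ge k \mid E_i, Q_i, C_i, A_i, L_i\right)
  = \Lambda\left(\beta_E E_i + \beta_Q Q_i + \beta_C C_i + \beta_A A_i + \beta_L L_i - \tau_k\right),
\qquad
\Lambda(z) = \frac{1}{1 + e^{-z}}
```

With a single $\beta_C$ for all tiers, this form cannot by itself express a ceiling that bites mainly near the apex. Letting the connection coefficient vary by tier (a hypothetical $\beta_{C,k}$) would be one way to represent that pattern.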

If this is right

Higher execution quality raises placement odds more than any other single input.
Idea quality predicts movement across intermediate journal tiers once execution clears the floor.
Connection index increases placement probability at equal idea and execution scores.
The connection advantage is largest for the most selective journals.
Even the highest idea-quality papers encounter barriers to the apex.
Ordinary connected papers still rarely reach top outlets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same text-based scoring method could test whether the floor-rungs-ceiling sequence holds in other fields.
Journals could explore blind idea-quality screens to limit connection effects at the top.
Interventions could target either the idea-generation channel or the review channel of the connection advantage.
Changes in the relative weights of these inputs over time would reveal whether publishing is shifting toward greater or lesser meritocracy.

Load-bearing premise

The discipline-trained LLM evaluator scores idea quality from text without seeing author names or outcomes and provides a valid unbiased measure that can be used ahead of publication fate.

What would settle it

The central claim would fail if the LLM idea-quality scores showed no correlation with journal placement after controlling for execution and connections, or if independent blind expert ratings of the same papers produced different rankings that eliminated the connection effect.

Original abstract

Does scientific publishing reward the quality of ideas or the advantage of connections? The question is universal to prestige-driven science, yet it has resisted decades of study because a paper's quality could not be gauged ahead of its publication fate without using that fate as the yardstick. We break this constraint by measuring a paper's idea quality directly from its text, before publication, using a discipline-trained LLM evaluator that scores the idea without seeing author names or outcomes. Using economics as a case study, we combine this text-legible idea-quality score with an execution-quality rubric, a connection index, an author-ability index, and an off-the-shelf language-model text score to estimate a five-input production function for journal placement across 6,208 economics working papers. The inputs are not rivals but a sequence along the ladder of prestige. Execution sets a meritocratic floor and is the largest input overall. Text-legible idea quality grades the rungs in between. Connections set a favoritism ceiling that bites mainly near the apex, the most selective journals. Connections work through two additive channels: connected authors write papers that score higher, and at equal scores their papers are still more likely to place better. Yet this advantage is bounded. Connections raise the odds of every rung without making the apex the typical outcome for ordinary ideas, and even the highest-scoring papers face real friction reaching the visible journal ladder. The result nests, rather than chooses between, the meritocracy and network accounts of how science is published.
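
The two additive channels described in the abstract can be illustrated with a linear decomposition. The sketch below is not the paper's estimator: it uses synthetic data, a continuous placement index instead of journal tiers, and made-up coefficients. It shows that, in a linear model, the total association between connections and placement splits exactly into a direct part (placement advantage at equal idea score) and an indirect part (connections raise the idea score, which raises placement). All names (`connection`, `idea`, `placement`, `ols_slope`, `ols_two_slopes`) are hypothetical.

```python
import random


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def ols_slope(y, x):
    """Slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)


def ols_two_slopes(y, x1, x2):
    """Slopes on x1 and x2 from regressing y on both, with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Synthetic data only; the coefficients 0.5, 0.8 and 0.3 are invented.
rng = random.Random(0)
n = 2000
connection = [rng.gauss(0, 1) for _ in range(n)]
idea = [0.5 * c + rng.gauss(0, 1) for c in connection]
placement = [0.8 * q + 0.3 * c + rng.gauss(0, 1)
             for q, c in zip(idea, connection)]

# Total association of connections with placement (idea score omitted).
total = ols_slope(placement, connection)
# Direct channel: connection slope holding the idea score fixed.
direct, idea_effect = ols_two_slopes(placement, connection, idea)
# Indirect channel: connections -> idea score -> placement.
indirect = ols_slope(idea, connection) * idea_effect

print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

The identity `total == direct + indirect` holds exactly for least squares, which is why the channels can be called additive in a linear setting. A nonlinear placement model would need a different decomposition.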

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The LLM text score is the real move here but the abstract gives no validation that it is independent of networks or outcomes, so the production-function decomposition stays unconvincing.

The paper's main contribution is using a discipline-trained LLM to score idea quality directly from text before publication, then feeding that into a five-input production function for journal placement in economics. This is a direct attempt to break the long-standing circularity where quality gets inferred from the outcome itself.

It does one thing cleanly: it nests the two stories instead of forcing a choice. Execution is the biggest input and sets a floor, idea quality fills the middle rungs, and connections mainly matter as a ceiling at the very top journals. The claim that connections operate through both higher scores and residual favoritism at equal scores is stated plainly.

The problem is the LLM score itself. The abstract says it scores the idea without seeing names or outcomes, but supplies no training data, no fine-tuning details, no correlation with blinded expert ratings, and no test that the score is not picking up prestige-linked text features from published economics papers. That single unvalidated input carries the whole decomposition. The production function is also estimated on the same sample used to build the inputs, with no robustness checks or out-of-sample evidence mentioned.
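
The out-of-sample evidence the letter asks for could take the form of a simple hold-out check. The sketch below is a minimal illustration on synthetic data with a single input and a continuous placement index; the 70/30 split, the coefficient 0.7, and the names `fit_ols` and `out_of_sample_r2` are all assumptions, not details from the paper. It speaks only to overfitting of the placement model, not to whether the LLM score itself was trained on overlapping papers.

```python
import random


def fit_ols(xs, ys):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def out_of_sample_r2(xs, ys, intercept, slope):
    """R^2 on held-out data, relative to predicting the held-out mean."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data only; the coefficient 0.7 is invented.
rng = random.Random(1)
n = 1000
score = [rng.gauss(0, 1) for _ in range(n)]
placement = [0.7 * s + rng.gauss(0, 1) for s in score]

# Random 70/30 split into an estimation sample and a hold-out sample.
idx = list(range(n))
rng.shuffle(idx)
cut = int(0.7 * n)
train, hold = idx[:cut], idx[cut:]

# Fit on the estimation sample only, then score the hold-out sample.
intercept, slope = fit_ols([score[i] for i in train],
                           [placement[i] for i in train])
r2 = out_of_sample_r2([score[i] for i in hold],
                      [placement[i] for i in hold],
                      intercept, slope)
print(f"hold-out R^2 = {r2:.3f}")
```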

This is worth sending to referees because the question is central to science-of-science work and the method is a genuine attempt to solve the measurement issue. A serious review would focus on whether the LLM can be shown to be exogenous to the outcome and networks; without that, the results stay provisional.

Referee Report

3 major / 2 minor

Summary. The paper claims that a discipline-trained LLM can score the idea quality of economics working papers directly from text, without seeing author names or outcomes. The authors combine this score with execution quality, a connection index, an author-ability index, and an off-the-shelf LM text score, and estimate a five-input production function on 6,208 working papers. The estimates show execution setting a meritocratic floor (the largest input overall), idea quality grading the intermediate rungs, and connections setting a favoritism ceiling that operates mainly at the most selective journals. Connections work through two additive channels: connected authors produce higher-scoring papers and, conditional on score, still place better.

Significance. If the LLM score is shown to be a valid, independent measure of idea quality, the decomposition would provide a novel empirical nesting of meritocratic and network accounts of journal placement, with quantitative estimates of each input's contribution and bounds on the scope of favoritism. The approach could generalize beyond economics if the core measurement innovation holds.

major comments (3)

[Abstract, §3] Abstract and §3 (LLM evaluator): the claim that the discipline-trained LLM provides a valid, pre-publication measure of idea quality independent of networks or publication outcomes is load-bearing for the entire decomposition, yet no training corpus details, fine-tuning procedure, correlation with blinded expert ratings, or out-of-sample predictive checks are reported. Without these, the score may internalize prestige-correlated text features.
[§4] §4 (production function): the five-input function is estimated on the same sample used to construct the inputs (idea quality, execution, connections, etc.), with no mention of hold-out validation, external benchmarks, or robustness to alternative functional forms; this directly raises the circularity risk noted in the stress test and undermines the reported sequence of effect sizes.
[§5] §5 (results on connections): the two-channel claim (higher scores plus residual placement advantage) is central to the 'additive' conclusion, but the manuscript supplies no explicit test separating whether the residual channel reflects favoritism versus unmeasured execution or idea dimensions that the LLM rubric misses.

minor comments (2)

[Table 1, Figure 2] Table 1 and Figure 2: variable definitions and scaling for the connection index and author-ability index should be stated explicitly so readers can assess overlap with the LLM score.
[Data section] The abstract states '6,208 economics working papers' but does not specify the sampling frame or exclusion criteria; this belongs in the data section for replicability.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We are grateful to the referee for highlighting key areas where the manuscript's claims require stronger supporting evidence. Below we respond to each major comment and indicate planned revisions.

Referee: [Abstract, §3] Abstract and §3 (LLM evaluator): the claim that the discipline-trained LLM provides a valid, pre-publication measure of idea quality independent of networks or publication outcomes is load-bearing for the entire decomposition, yet no training corpus details, fine-tuning procedure, correlation with blinded expert ratings, or out-of-sample predictive checks are reported. Without these, the score may internalize prestige-correlated text features.

Authors: The referee correctly notes that details on the LLM training are not fully reported. We will revise §3 to provide the training corpus details and fine-tuning procedure. However, we did not collect a separate set of blinded expert ratings for correlation analysis, limiting our ability to add that specific check. revision: partial
Referee: [§4] §4 (production function): the five-input function is estimated on the same sample used to construct the inputs (idea quality, execution, connections, etc.), with no mention of hold-out validation, external benchmarks, or robustness to alternative functional forms; this directly raises the circularity risk noted in the stress test and undermines the reported sequence of effect sizes.

Authors: We will add hold-out validation by splitting the sample and re-estimating the production function on the training subset to predict on the hold-out, along with checks for alternative functional forms. This will be included in the revised manuscript to address the circularity concern. revision: yes
Referee: [§5] §5 (results on connections): the two-channel claim (higher scores plus residual placement advantage) is central to the 'additive' conclusion, but the manuscript supplies no explicit test separating whether the residual channel reflects favoritism versus unmeasured execution or idea dimensions that the LLM rubric misses.

Authors: To better isolate the residual channel, we will include an additional test in §5 that examines the connection effect within subsamples where the LLM idea quality and execution scores are both high, to see if the placement advantage remains. This provides an indirect test against unmeasured quality dimensions. revision: yes
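
The subsample test proposed in the last response can be sketched as a comparison of placement rates among high-scoring papers. The records, field names, and cutoffs below are hypothetical toy values, not data from the paper, and `connection_gap` is an illustrative helper rather than the authors' procedure.

```python
def placement_rate(papers):
    """Share of papers that reached a top outlet."""
    return sum(p["top_placed"] for p in papers) / len(papers)


def connection_gap(papers, idea_cut, exec_cut):
    """Placement-rate gap (connected minus unconnected) among papers whose
    idea and execution scores both clear the given cutoffs.
    Returns None when either group is empty."""
    high = [p for p in papers
            if p["idea"] >= idea_cut and p["execution"] >= exec_cut]
    connected = [p for p in high if p["connected"]]
    unconnected = [p for p in high if not p["connected"]]
    if not connected or not unconnected:
        return None
    return placement_rate(connected) - placement_rate(unconnected)


# Toy records, invented for illustration only.
papers = [
    {"idea": 0.9, "execution": 0.8, "connected": True, "top_placed": True},
    {"idea": 0.8, "execution": 0.9, "connected": True, "top_placed": True},
    {"idea": 0.9, "execution": 0.9, "connected": False, "top_placed": True},
    {"idea": 0.8, "execution": 0.8, "connected": False, "top_placed": False},
    {"idea": 0.3, "execution": 0.9, "connected": True, "top_placed": False},
    {"idea": 0.9, "execution": 0.2, "connected": False, "top_placed": False},
]

gap = connection_gap(papers, idea_cut=0.75, exec_cut=0.75)
print(gap)
```

A gap that persists in the high-score subsample is consistent with a residual connection channel, but, as the referee notes, it cannot rule out quality dimensions that neither score captures.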

standing simulated objections not resolved

Correlation with blinded expert ratings, as no such validation was performed beyond the training annotations.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper constructs a text-based idea-quality score via a discipline-trained LLM asserted to operate without author names or publication outcomes, then combines this with four other inputs (execution rubric, connection index, author-ability index, off-the-shelf LM score) to estimate a five-input production function explaining journal placement on 6,208 papers. This is a standard regression of observed outcomes on independently constructed features; no equation reduces the LLM score or production-function coefficients to the journal outcome by construction, no self-citation chain is load-bearing, and no fitted parameter is relabeled as an out-of-sample prediction. The central decomposition therefore rests on the asserted independence of the LLM evaluator rather than on any definitional or statistical tautology internal to the reported estimates.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of the LLM as an independent idea-quality measure and the functional form of the estimated production function; no other free parameters, axioms, or invented entities are identifiable from the abstract.

free parameters (1)

production function coefficients
The model estimates the relative contribution of each of the five inputs to journal placement, fitted to the 6,208-paper dataset.

axioms (1)

domain assumption: The LLM evaluator measures idea quality independently of author identity and publication outcome.
This assumption is required to break the prior circularity constraint described in the abstract.

pith-pipeline@v0.9.1-grok · 5794 in / 1322 out tokens · 28017 ms · 2026-06-28T07:51:59.914257+00:00 · methodology

Reference graph

Works this paper leans on

29 extracted references · 6 canonical work pages · 5 internal anchors

[1] Fortunato S, et al. (2018) Science of science. Science 359(6379):eaao0185
[2] Merton RK (1968) The Matthew effect in science. Science 159(3810):56–63
[3] Laband DN, Piette MJ (1994) Favoritism versus search for good papers: Empirical evidence regarding the behavior of journal editors. J Polit Econ 102(1):194–203
[4] Brogaard J, Engelberg J, Parsons CA (2014) Networks and productivity: Causal evidence from editor rotations. J Financ Econ 111(1):251–270
[5] Colussi T (2018) Social ties in academia: A friend is a treasure. Rev Econ Stat 100(1):45–50
[6] Medoff MH (2003) Editorial favoritism in economics? South Econ J 70(2):425–434
[7] Carrell SE, Figlio DN, Lusher L (2024) Clubs and networks in economics reviewing. J Polit Econ 132(9):2999–3024
[8] Card D, DellaVigna S (2013) Nine facts about top journals in economics. J Econ Lit 51(1):144–161
[9] Card D, DellaVigna S (2020) What do editors maximize? Evidence from four economics journals. Rev Econ Stat 102(1):195–217
[10] Wang J, Veugelers R, Stephan P (2017) Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Res Policy 46(8):1416–1436
[11] Ellison G (2002) Evolving standards for academic publishing: A q-r theory. J Polit Econ 110(5):994–1034
[12] Hengel E (2022) Publishing while female: Are women held to higher standards? Evidence from peer review. Econ J 132(648):2951–2991
[13] Tomkins A, Zhang M, Heavlin WD (2017) Reviewer bias in single- versus double-blind peer review. Proc Natl Acad Sci USA 114(48):12708–12713
[14] Gong Z, Li N, Zhou H (2026) LLMs learn scientific taste from institutional traces across the social sciences. arXiv:2603.16659
[15] Zheng L, et al. (2023) Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv:2306.05685
[16] Liang W, et al. (2024) Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI 1(8):AIoa2400196
[17] Pataranutaporn P, Powdthavee N, Achiwaranguprok C, Maes P (2025) Can AI solve the peer review crisis? A large-scale cross-model experiment of LLMs’ performance and biases in evaluating over 1000 economics papers. arXiv:2502.00070
[18] Heckman JJ, Moktan S (2020) Publishing and promotion in economics: The tyranny of the top five. J Econ Lit 58(2):419–470
[19] Cole S, Cole JR, Simon GA (1981) Chance and consensus in peer review. Science 214(4523):881–886
[20] Ductor L, Fafchamps M, Goyal S, van der Leij M (2014) Social networks and research output. Rev Econ Stat 96(5):936–948
[21] Anderson ML (2008) Multiple inference and gender differences in the effects of early intervention. J Am Stat Assoc 103(484):1481–1495
[22] Liu Y, et al. (2023) G-Eval: NLG evaluation using GPT-4 with better human alignment. Proc 2023 Conf Empir Methods Nat Lang Process (EMNLP) 2511–2522
[23] Wei J, et al. (2021) Finetuned language models are zero-shot learners. arXiv:2109.01652
[24] Ouyang L, et al. (2022) Training language models to follow instructions with human feedback. arXiv:2203.02155
[25] Li N (2026) The ideation bottleneck: Decomposing the quality gap between AI-generated and human economics research. arXiv:2604.03338
[26] Hamermesh DS (2018) Citations in economics: Measurement, uses, and impacts. J Econ Lit 56(1):115–156
[27] Pier EL, et al. (2018) Low agreement among reviewers evaluating the same NIH grant applications. Proc Natl Acad Sci USA 115(12):2952–2957
[28] Clauset A, Arbesman S, Larremore DB (2015) Systematic inequality and hierarchy in faculty hiring networks. Sci Adv 1(1):e1400005
[29] Wuchty S, Jones BF, Uzzi B (2007) The increasing dominance of teams in production of knowledge. Science 316(5827):1036–1039
