
arxiv: 2606.03763 · v1 · pith:4ZKVXJNQnew · submitted 2026-06-02 · 💰 econ.GN · cs.AI · q-fin.EC

Merit or networks? What decides where research is published

Pith reviewed 2026-06-28 07:51 UTC · model grok-4.3

classification 💰 econ.GN · cs.AI · q-fin.EC
keywords economics publishing · idea quality · journal placement · social connections · LLM evaluation · meritocracy · prestige ladder · working papers

The pith

Economics publishing follows a prestige ladder: execution quality sets the floor, idea quality grades the rungs, and connections set a ceiling only at the top journals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures idea quality of economics papers directly from their text using a trained LLM that ignores author names and outcomes. It combines this score with execution quality, a connection index, author ability, and language-model text metrics to model journal placement for 6,208 working papers. Execution quality emerges as the largest overall input and creates a minimum threshold for any placement. Idea quality then differentiates outcomes across mid-level journals. Connections add an independent boost, strongest near the most selective outlets, by both raising idea scores and improving placement odds at any given score. The advantage stays bounded, so ordinary ideas rarely reach the apex even with connections.

Core claim

Using a text-based idea quality score from an LLM evaluator, the study finds that journal placement in economics follows a sequence along the prestige ladder. Execution quality establishes a meritocratic floor and remains the largest input overall. Text-legible idea quality grades the intermediate rungs. Connections impose a favoritism ceiling that matters most near the apex. Connections operate through two additive channels: connected authors produce higher-scoring papers, and at equal scores their papers still place better. Yet the advantage is bounded, as even the highest-scoring papers face real friction reaching the visible journal ladder. The result nests rather than chooses between me

What carries the argument

The prestige ladder model sequencing execution quality as floor, text-legible idea quality as rungs, and connections as ceiling, estimated via a five-input production function for journal placement.
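
One way to write down a five-input production function of this kind is sketched below. The abstract does not report the paper's functional form, so the ordered-response structure, the symbols, and the cutpoints $\tau_k$ are all illustrative assumptions rather than the authors' specification. Here $E_i$ is execution quality, $I_i$ is text-legible idea quality, $C_i$ is the connection index, $A_i$ is the author-ability index, and $T_i$ is the off-the-shelf language-model text score for paper $i$.

```latex
% Hypothetical notation: y_i^* is a latent placement propensity for paper i,
% and tier k is observed when y_i^* falls between two assumed cutpoints.
y_i^{*} = \beta_E E_i + \beta_I I_i + \beta_C C_i + \beta_A A_i + \beta_T T_i + \varepsilon_i,
\qquad
\text{tier}_i = k \iff \tau_{k-1} < y_i^{*} \le \tau_k .
```

Under this sketch, the floor, rungs, and ceiling language would correspond to the inputs mattering differently at different cutpoints, which a single set of coefficients cannot capture; a tier-specific version would let each $\beta$ vary with $k$.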

If this is right

  • Higher execution quality raises placement odds more than any other single input.
  • Idea quality predicts movement across intermediate journal tiers once execution clears the floor.
  • Connection index increases placement probability at equal idea and execution scores.
  • The connection advantage is largest for the most selective journals.
  • Even the highest idea-quality papers encounter barriers to the apex.
  • Ordinary connected papers still rarely reach top outlets.
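
The two-channel reading of the connection advantage can be sketched with a linear decomposition. This is a minimal illustration on synthetic data, assuming a linear placement score rather than the paper's actual model; the variable names (`conn`, `idea`, `place`) and every effect size are made up. In ordinary least squares, the total connection coefficient splits exactly into a direct part (better placement at equal idea score) and an indirect part (connections raise the idea score, which raises placement).

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic data only: the effect sizes below are invented for illustration.
rng = np.random.default_rng(0)
n = 5000
conn = rng.normal(size=n)                              # hypothetical connection index
idea = 0.4 * conn + rng.normal(size=n)                 # channel 1: connections -> idea score
place = 0.5 * idea + 0.2 * conn + rng.normal(size=n)   # channel 2: boost at equal score

total = ols(conn, place)[1]                            # placement on connections alone
a = ols(conn, idea)[1]                                 # connections -> idea score
_, direct, b = ols(np.column_stack([conn, idea]), place)
indirect = a * b                                       # part routed through the idea score

print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

The identity `total = direct + indirect` holds in-sample for any linear fit of this form, so it describes the accounting, not the causal interpretation of either channel.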

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same text-based scoring method could test whether the floor-rungs-ceiling sequence holds in other fields.
  • Journals could explore blind idea-quality screens to limit connection effects at the top.
  • Interventions could target either the idea-generation channel or the review channel of the connection advantage.
  • Changes in the relative weights of these inputs over time would reveal whether publishing is shifting toward greater or lesser meritocracy.

Load-bearing premise

The discipline-trained LLM evaluator scores idea quality from text without seeing author names or outcomes, and that score is a valid, unbiased measure available before a paper's publication fate is known.

What would settle it

The claim would fail if the LLM idea-quality scores show no correlation with journal placement after controlling for execution and connections, or if independent blind expert ratings of the same papers produce different rankings that eliminate the connection effect.
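
The first check can be sketched as a partial correlation: remove the linear influence of execution and connections from both the idea score and the placement outcome, then correlate what is left. The example below is a minimal sketch on synthetic data with a continuous placement score, which is a simplifying assumption; the helper names and all coefficients are hypothetical.

```python
import numpy as np

def residualize(v, controls):
    """Residuals of v after a least-squares fit on controls plus an intercept."""
    design = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

def partial_corr(x, y, controls):
    """Correlation between x and y once both are purged of the controls."""
    rx, ry = residualize(x, controls), residualize(y, controls)
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic data only: one world where idea quality matters, one where it does not.
rng = np.random.default_rng(1)
n = 5000
execution, conn, idea = rng.normal(size=(3, n))
controls = np.column_stack([execution, conn])
noise = rng.normal(size=n)
place_signal = 0.6 * execution + 0.3 * idea + 0.2 * conn + noise
place_null = 0.6 * execution + 0.2 * conn + noise

r_signal = partial_corr(idea, place_signal, controls)
r_null = partial_corr(idea, place_null, controls)
print(f"idea matters: {r_signal:.3f}   idea irrelevant: {r_null:.3f}")
```

A partial correlation near zero on the real data would correspond to the first failure condition; the second condition needs blinded expert ratings, which no simulation can stand in for.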

read the original abstract

Does scientific publishing reward the quality of ideas or the advantage of connections? The question is universal to prestige-driven science, yet it has resisted decades of study because a paper's quality could not be gauged ahead of its publication fate without using that fate as the yardstick. We break this constraint by measuring a paper's idea quality directly from its text, before publication, using a discipline-trained LLM evaluator that scores the idea without seeing author names or outcomes. Using economics as a case study, we combine this text-legible idea-quality score with an execution-quality rubric, a connection index, an author-ability index, and an off-the-shelf language-model text score to estimate a five-input production function for journal placement across 6,208 economics working papers. The inputs are not rivals but a sequence along the ladder of prestige. Execution sets a meritocratic floor and is the largest input overall. Text-legible idea quality grades the rungs in between. Connections set a favoritism ceiling that bites mainly near the apex, the most selective journals. Connections work through two additive channels: connected authors write papers that score higher, and at equal scores their papers are still more likely to place better. Yet this advantage is bounded. Connections raise the odds of every rung without making the apex the typical outcome for ordinary ideas, and even the highest-scoring papers face real friction reaching the visible journal ladder. The result nests, rather than chooses between, the meritocracy and network accounts of how science is published.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that a discipline-trained LLM can score the idea quality of economics working papers directly from text, without author names or outcomes. It combines that score with execution quality, a connection index, an author-ability index, and an off-the-shelf LM text score in a five-input production function estimated on 6,208 working papers. The estimates show execution setting a meritocratic floor (the largest input overall), idea quality grading the intermediate rungs, and connections setting a favoritism ceiling that operates mainly at the most selective journals. Connections work through two additive channels: connected authors produce higher-scoring papers and, conditional on score, still place better.

Significance. If the LLM score is shown to be a valid, independent measure of idea quality, the decomposition would provide a novel empirical nesting of meritocratic and network accounts of journal placement, with quantitative estimates of each input's contribution and bounds on the scope of favoritism. The approach could generalize beyond economics if the core measurement innovation holds.

major comments (3)
  1. [Abstract, §3] Abstract and §3 (LLM evaluator): the claim that the discipline-trained LLM provides a valid, pre-publication measure of idea quality independent of networks or publication outcomes is load-bearing for the entire decomposition, yet no training corpus details, fine-tuning procedure, correlation with blinded expert ratings, or out-of-sample predictive checks are reported. Without these, the score may internalize prestige-correlated text features.
  2. [§4] §4 (production function): the five-input function is estimated on the same sample used to construct the inputs (idea quality, execution, connections, etc.), with no mention of hold-out validation, external benchmarks, or robustness to alternative functional forms; this directly raises the circularity risk noted in the stress test and undermines the reported sequence of effect sizes.
  3. [§5] §5 (results on connections): the two-channel claim (higher scores plus residual placement advantage) is central to the 'additive' conclusion, but the manuscript supplies no explicit test separating whether the residual channel reflects favoritism versus unmeasured execution or idea dimensions that the LLM rubric misses.
minor comments (2)
  1. [Table 1, Figure 2] Table 1 and Figure 2: variable definitions and scaling for the connection index and author-ability index should be stated explicitly so readers can assess overlap with the LLM score.
  2. [Data section] The abstract states '6,208 economics working papers' but does not specify the sampling frame or exclusion criteria; this belongs in the data section for replicability.
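
The concern in major comment 3 can be illustrated with a small simulation. In the synthetic data below, connections have no direct effect on placement at all, yet a regression that omits a quality dimension correlated with connections still assigns them a positive residual coefficient. Everything here is assumed for illustration: the linear form, the variable names, and the effect sizes do not come from the paper.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic data only: connections affect placement solely through 'unmeasured'.
rng = np.random.default_rng(2)
n = 5000
conn = rng.normal(size=n)
idea = rng.normal(size=n)
unmeasured = 0.5 * conn + rng.normal(size=n)    # quality the rubric misses
place = 0.5 * idea + 0.5 * unmeasured + rng.normal(size=n)

# Connection coefficient when the missing dimension is left out of the model.
coef_observed = ols(np.column_stack([conn, idea]), place)[1]
# Connection coefficient once the missing dimension is controlled for.
coef_full = ols(np.column_stack([conn, idea, unmeasured]), place)[1]

print(f"omitting it: {coef_observed:.3f}   controlling for it: {coef_full:.3f}")
```

The residual coefficient alone therefore cannot distinguish favoritism from unmeasured quality, which is why the referee asks for an explicit separating test.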

Simulated Author's Rebuttal

3 responses · 1 unresolved

We are grateful to the referee for highlighting key areas where the manuscript's claims require stronger supporting evidence. Below we respond to each major comment and indicate planned revisions.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (LLM evaluator): the claim that the discipline-trained LLM provides a valid, pre-publication measure of idea quality independent of networks or publication outcomes is load-bearing for the entire decomposition, yet no training corpus details, fine-tuning procedure, correlation with blinded expert ratings, or out-of-sample predictive checks are reported. Without these, the score may internalize prestige-correlated text features.

    Authors: The referee correctly notes that details on the LLM training are not fully reported. We will revise §3 to provide the training corpus details and fine-tuning procedure. However, we did not collect a separate set of blinded expert ratings for correlation analysis, limiting our ability to add that specific check. revision: partial

  2. Referee: [§4] §4 (production function): the five-input function is estimated on the same sample used to construct the inputs (idea quality, execution, connections, etc.), with no mention of hold-out validation, external benchmarks, or robustness to alternative functional forms; this directly raises the circularity risk noted in the stress test and undermines the reported sequence of effect sizes.

    Authors: We will add hold-out validation by splitting the sample and re-estimating the production function on the training subset to predict on the hold-out, along with checks for alternative functional forms. This will be included in the revised manuscript to address the circularity concern. revision: yes

  3. Referee: [§5] §5 (results on connections): the two-channel claim (higher scores plus residual placement advantage) is central to the 'additive' conclusion, but the manuscript supplies no explicit test separating whether the residual channel reflects favoritism versus unmeasured execution or idea dimensions that the LLM rubric misses.

    Authors: To better isolate the residual channel, we will include an additional test in §5 that examines the connection effect within subsamples where the LLM idea quality and execution scores are both high, to see if the placement advantage remains. This provides an indirect test against unmeasured quality dimensions. revision: yes
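
The hold-out validation promised in response 2 could look roughly like the sketch below: split the sample, fit the five-input function on one part, and score predictions on the other. This is an assumption-laden illustration, not the authors' procedure. It uses synthetic inputs, a linear fit, an 80/20 split, and made-up weights; only the sample size of 6,208 echoes the paper.

```python
import numpy as np

# Synthetic stand-ins for the five inputs; the weights are invented.
rng = np.random.default_rng(3)
n, k = 6208, 5
X = rng.normal(size=(n, k))
beta_true = np.array([0.6, 0.3, 0.2, 0.1, 0.05])
y = X @ beta_true + rng.normal(size=n)

# Random 80/20 split into an estimation sample and a hold-out sample.
idx = rng.permutation(n)
cut = int(0.8 * n)
train, hold = idx[:cut], idx[cut:]

def add_intercept(M):
    """Prepend a column of ones to a design matrix."""
    return np.column_stack([np.ones(len(M)), M])

# Fit on the estimation sample only, then predict the hold-out papers.
coef, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)
pred = add_intercept(X[hold]) @ coef

# Out-of-sample R^2 relative to the hold-out mean.
ss_res = np.sum((y[hold] - pred) ** 2)
ss_tot = np.sum((y[hold] - y[hold].mean()) ** 2)
r2_holdout = 1.0 - ss_res / ss_tot
print(f"hold-out R^2 = {r2_holdout:.3f}")
```

A split of this kind checks overfitting of the production function itself. It does not address the separate worry that the inputs were constructed on the same papers, which would require building the scores on data disjoint from the hold-out.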

standing simulated objections not resolved
  • Correlation with blinded expert ratings, as no such validation was performed beyond the training annotations.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper constructs a text-based idea-quality score via a discipline-trained LLM asserted to operate without author names or publication outcomes, then combines this with four other inputs (execution rubric, connection index, author-ability index, off-the-shelf LM score) to estimate a five-input production function explaining journal placement on 6,208 papers. This is a standard regression of observed outcomes on independently constructed features; no equation reduces the LLM score or production-function coefficients to the journal outcome by construction, no self-citation chain is load-bearing, and no fitted parameter is relabeled as an out-of-sample prediction. The central decomposition therefore rests on the asserted independence of the LLM evaluator rather than on any definitional or statistical tautology internal to the reported estimates.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of the LLM as an independent idea-quality measure and the functional form of the estimated production function; no other free parameters, axioms, or invented entities are identifiable from the abstract.

free parameters (1)
  • production function coefficients
    The model estimates the relative contribution of each of the five inputs to journal placement, fitted to the 6,208-paper dataset.
axioms (1)
  • domain assumption: The LLM evaluator measures idea quality independently of author identity and publication outcome.
    This assumption is required to break the prior circularity constraint described in the abstract.

pith-pipeline@v0.9.1-grok · 5794 in / 1322 out tokens · 28017 ms · 2026-06-28T07:51:59.914257+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 6 canonical work pages · 5 internal anchors

  1. [1] Fortunato S, et al. (2018) Science of science. Science 359(6379):eaao0185
  2. [2] Merton RK (1968) The Matthew effect in science. Science 159(3810):56–63
  3. [3] Laband DN, Piette MJ (1994) Favoritism versus search for good papers: Empirical evidence regarding the behavior of journal editors. J Polit Econ 102(1):194–203
  4. [4] Brogaard J, Engelberg J, Parsons CA (2014) Networks and productivity: Causal evidence from editor rotations. J Financ Econ 111(1):251–270
  5. [5] Colussi T (2018) Social ties in academia: A friend is a treasure. Rev Econ Stat 100(1):45–50
  6. [6] Medoff MH (2003) Editorial favoritism in economics? South Econ J 70(2):425–434
  7. [7] Carrell SE, Figlio DN, Lusher L (2024) Clubs and networks in economics reviewing. J Polit Econ 132(9):2999–3024
  8. [8] Card D, DellaVigna S (2013) Nine facts about top journals in economics. J Econ Lit 51(1):144–161
  9. [9] Card D, DellaVigna S (2020) What do editors maximize? Evidence from four economics journals. Rev Econ Stat 102(1):195–217
  10. [10] Wang J, Veugelers R, Stephan P (2017) Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Res Policy 46(8):1416–1436
  11. [11] Ellison G (2002) Evolving standards for academic publishing: A q-r theory. J Polit Econ 110(5):994–1034
  12. [12] Hengel E (2022) Publishing while female: Are women held to higher standards? Evidence from peer review. Econ J 132(648):2951–2991
  13. [13] Tomkins A, Zhang M, Heavlin WD (2017) Reviewer bias in single- versus double-blind peer review. Proc Natl Acad Sci USA 114(48):12708–12713
  14. [14] Gong Z, Li N, Zhou H (2026) LLMs learn scientific taste from institutional traces across the social sciences. arXiv:2603.16659
  15. [15] Zheng L, et al. (2023) Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv:2306.05685
  16. [16] Liang W, et al. (2024) Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI 1(8):AIoa2400196
  17. [17] Pataranutaporn P, Powdthavee N, Achiwaranguprok C, Maes P (2025) Can AI solve the peer review crisis? A large-scale cross-model experiment of LLMs’ performance and biases in evaluating over 1000 economics papers. arXiv:2502.00070
  18. [18] Heckman JJ, Moktan S (2020) Publishing and promotion in economics: The tyranny of the top five. J Econ Lit 58(2):419–470
  19. [19] Cole S, Cole JR, Simon GA (1981) Chance and consensus in peer review. Science 214(4523):881–886
  20. [20] Ductor L, Fafchamps M, Goyal S, van der Leij M (2014) Social networks and research output. Rev Econ Stat 96(5):936–948
  21. [21] Anderson ML (2008) Multiple inference and gender differences in the effects of early intervention. J Am Stat Assoc 103(484):1481–1495
  22. [22] Liu Y, et al. (2023) G-Eval: NLG evaluation using GPT-4 with better human alignment. Proc 2023 Conf Empir Methods Nat Lang Process (EMNLP) 2511–2522
  23. [23] Wei J, et al. (2021) Finetuned language models are zero-shot learners. arXiv:2109.01652
  24. [24] Ouyang L, et al. (2022) Training language models to follow instructions with human feedback. arXiv:2203.02155
  25. [25] Li N (2026) The ideation bottleneck: Decomposing the quality gap between AI-generated and human economics research. arXiv:2604.03338
  26. [26] Hamermesh DS (2018) Citations in economics: Measurement, uses, and impacts. J Econ Lit 56(1):115–156
  27. [27] Pier EL, et al. (2018) Low agreement among reviewers evaluating the same NIH grant applications. Proc Natl Acad Sci USA 115(12):2952–2957
  28. [28] Clauset A, Arbesman S, Larremore DB (2015) Systematic inequality and hierarchy in faculty hiring networks. Sci Adv 1(1):e1400005
  29. [29] Wuchty S, Jones BF, Uzzi B (2007) The increasing dominance of teams in production of knowledge. Science 316(5827):1036–1039