pith. machine review for the scientific record.

arxiv: 2604.02678 · v1 · submitted 2026-04-03 · 📊 stat.ME · cs.AI · stat.AP

Recognition: 2 theorem links · Lean Theorem

Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis

Yanxun Xu, Yao Zhao, Zhiyue Zhang

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 18:55 UTC · model grok-4.3

classification 📊 stat.ME · cs.AI · stat.AP
keywords meta-analysis · clinical trials · eligibility criteria · agentic framework · evidence synthesis · LLM · risk ratio · precision medicine

The pith

EligMeta adjusts meta-analysis weights using eligibility criteria alignment between trials and target populations instead of statistical precision alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EligMeta, an agentic framework that turns natural-language queries into reproducible trial selection and then incorporates eligibility alignment into study weighting for meta-analysis. Conventional methods weight studies only by precision and ignore differences in patient populations defined by eligibility criteria. EligMeta uses LLMs to generate interpretable rules and parse metadata while keeping all weighting, filtering, and statistical pooling deterministic for reproducibility. This produces cohort-specific pooled estimates that better match the populations of interest. The approach matters for precision medicine because it quantifies how eligibility differences affect evidence synthesis, as shown by a shift in the olaparib adverse-events risk ratio from 2.18 to 1.97.

Core claim

EligMeta translates natural-language queries into reproducible trial selection by generating interpretable rules with LLMs and performing schema-constrained parsing of trial metadata, then structures eligibility criteria to compute similarity-based study weights reflecting population alignment between target and comparator trials, leading to adjusted pooled estimates in meta-analysis.

What carries the argument

Eligibility-aware weighting that computes similarity-based study weights from structured eligibility criteria to reflect population alignment.
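The page does not reproduce the paper's similarity function or its combination rule. As a minimal sketch of the general idea only, the following assumes a Jaccard overlap between structured criteria sets and a multiplicative blend with precision weights; both choices are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of eligibility-aware weighting. The Jaccard measure and
# the multiplicative blend with precision weights are illustrative assumptions;
# the paper's actual similarity function is not given in this review.

def jaccard(criteria_a: set, criteria_b: set) -> float:
    """Overlap between two structured eligibility-criteria sets."""
    if not criteria_a and not criteria_b:
        return 1.0
    return len(criteria_a & criteria_b) / len(criteria_a | criteria_b)

def eligibility_weights(target: set, trials: dict, precision: dict) -> dict:
    """Blend precision weights with eligibility similarity, then renormalize."""
    raw = {t: precision[t] * jaccard(target, crit) for t, crit in trials.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}

# Invented criteria and weights, purely for illustration.
target = {"age>=18", "ECOG 0-1", "BRCA-mutated"}
trials = {
    "A": {"age>=18", "ECOG 0-1", "BRCA-mutated"},  # fully aligned
    "B": {"age>=18", "ECOG 0-2"},                  # partially aligned
}
w = eligibility_weights(target, trials, precision={"A": 10.0, "B": 10.0})
```

Under equal precision, the fully aligned trial A ends up with the larger normalized weight, which is the qualitative behavior the framework claims.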

If this is right

  • Reduces 4,044 candidate trials to 39 clinically relevant studies in a gastric cancer landscape analysis while recovering all 13 guideline-cited trials.
  • Shifts the pooled risk ratio for olaparib adverse events across four trials from 2.18 (95% CI 1.71-2.79) under conventional Mantel-Haenszel estimation to 1.97 (95% CI 1.76-2.20).
  • Enables scalable, reproducible evidence synthesis that accounts for clinical compatibility in addition to statistical precision.
  • Produces cohort-specific pooled estimates rather than general ones that ignore population differences.
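The conventional baseline named above is standard fixed-effect Mantel-Haenszel estimation. A minimal sketch of the MH pooled risk ratio follows; the 2x2 counts are invented for illustration and are not the paper's olaparib data.

```python
# Standard Mantel-Haenszel pooled risk ratio (fixed-effect). The 2x2 counts
# below are invented for illustration; they are NOT the paper's olaparib data.

def mh_risk_ratio(tables: list) -> float:
    """Each table is (events_trt, n_trt, events_ctl, n_ctl)."""
    num = den = 0.0
    for a, n1, c, n0 in tables:
        big_n = n1 + n0
        num += a * n0 / big_n  # treated events scaled by control arm size
        den += c * n1 / big_n  # control events scaled by treated arm size
    return num / den

tables = [
    (30, 100, 15, 100),  # hypothetical trial 1
    (20, 80, 10, 80),    # hypothetical trial 2
]
rr = mh_risk_ratio(tables)
```

The eligibility-aware variant described in the paper changes the weighting, not this pooling identity, which is what makes the 2.18 to 1.97 shift attributable to alignment rather than to a different estimator.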

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hybrid architecture could be tested on non-oncology meta-analyses to check whether eligibility weighting changes conclusions in other disease areas.
  • Integration with existing guideline-development workflows could allow automatic flagging of trials whose eligibility criteria diverge from the target population.
  • Further validation against larger trial registries would show whether the reduction from thousands of candidates to dozens scales without loss of relevant studies.

Load-bearing premise

LLM-generated rules and schema-constrained parsing correctly identify all clinically relevant trials while similarity-based weights accurately capture population alignment without introducing new bias.

What would settle it

A side-by-side manual expert review of the same queries that selects a different set of trials or applies different weights and produces a statistically different pooled estimate from the EligMeta result.

Figures

Figures reproduced from arXiv: 2604.02678 by Yanxun Xu, Yao Zhao, Zhiyue Zhang.

Figure 1. Overview of the EligMeta framework for clinical trial evidence synthesis.
Figure 2. Trial selection and structuring workflow.
Figure 3. Eligibility-aware meta-analysis workflow.
Figure 4. Stepwise filtering results for the gastric cancer use case.
Figure 5. Comparison of pooled risk ratio estimates for all-grade vomiting.
Original abstract

Clinical evidence synthesis requires identifying relevant trials from large registries and aggregating results that account for population differences. While recent LLM-based approaches have automated components of systematic review, they do not support end-to-end evidence synthesis. Moreover, conventional meta-analysis weights studies by statistical precision without considering clinical compatibility reflected in eligibility criteria. We propose EligMeta, an agentic framework that integrates automated trial discovery with eligibility-aware meta-analysis, translating natural-language queries into reproducible trial selection and incorporating eligibility alignment into study weighting to produce cohort-specific pooled estimates. EligMeta employs a hybrid architecture separating LLM-based reasoning from deterministic execution: LLMs generate interpretable rules from natural-language queries and perform schema-constrained parsing of trial metadata, while all logical operations, weight computations, and statistical pooling are executed deterministically to ensure reproducibility. The framework structures eligibility criteria and computes similarity-based study weights reflecting population alignment between target and comparator trials. In a gastric cancer landscape analysis, EligMeta reduced 4,044 candidate trials to 39 clinically relevant studies through rule-based filtering, recovering all 13 guideline-cited trials. In an olaparib adverse events meta-analysis across four trials, eligibility-aware weighting shifted the pooled risk ratio from 2.18 (95% CI: 1.71-2.79) under conventional Mantel-Haenszel estimation to 1.97 (95% CI: 1.76-2.20), demonstrating quantifiable impact of incorporating eligibility alignment. EligMeta bridges automated trial discovery with eligibility-aware meta-analysis, providing a scalable and reproducible framework for evidence synthesis in precision medicine.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces EligMeta, a hybrid agentic framework for clinical trial meta-analysis that uses LLMs to translate natural-language queries into interpretable eligibility rules and perform schema-constrained parsing of trial metadata, while executing all weighting, similarity computations, and statistical pooling (e.g., Mantel-Haenszel) deterministically. It demonstrates the approach in two cases: filtering 4,044 gastric cancer trials down to 39 relevant studies while recovering all 13 guideline-cited trials, and reweighting an olaparib adverse-events meta-analysis across four trials to shift the pooled risk ratio from 2.18 (95% CI 1.71-2.79) under conventional estimation to 1.97 (95% CI 1.76-2.20) under eligibility-aware weights.

Significance. If the LLM-driven eligibility alignment step can be shown to produce weights that accurately reflect population compatibility without introducing new bias, the framework would offer a scalable, reproducible way to incorporate clinical eligibility criteria into meta-analytic weighting, moving beyond precision-only weights. The gastric-cancer filtering result and the concrete numeric shift in the olaparib example illustrate potential impact for precision-medicine evidence synthesis.

major comments (3)
  1. [olaparib adverse events meta-analysis demonstration] The central demonstration (olaparib pooled RR moving from 2.18 [1.71-2.79] to 1.97 [1.76-2.20]) rests on the claim that LLM-generated rules and schema-constrained parsing produce similarity scores that correctly measure population alignment. No expert review, inter-rater reliability assessment, or comparison of the parsed eligibility criteria against manual gold-standard annotations is reported, so it remains possible that the narrower CI and point-estimate change arise from systematic LLM parsing artifacts rather than true eligibility alignment.
  2. [methods for eligibility similarity computation] The eligibility similarity function and resulting study weights are presented as capturing population compatibility, yet the manuscript provides neither a sensitivity analysis (e.g., varying the similarity threshold or LLM prompt) nor a validation against independent expert-assigned weights. Without such checks, the quantitative impact attributed to eligibility-aware weighting cannot be isolated from potential biases in the LLM rule-generation step.
  3. [framework architecture and reproducibility claims] Reproducibility is asserted via the hybrid architecture (LLM reasoning separated from deterministic execution), but no code, data, or parsed eligibility schemas are released, and no quantification of variability across repeated LLM calls is supplied. This leaves the reported trial-selection and weighting results difficult to verify independently.
minor comments (2)
  1. The abstract and methods would benefit from an explicit equation or pseudocode for the eligibility similarity weight computation, including how the deterministic formula combines parsed criteria.
  2. [gastric cancer landscape analysis] The gastric-cancer landscape analysis reports recovery of all 13 guideline-cited trials but does not state the total number of guideline-cited trials that existed in the registry or provide a confusion matrix for the rule-based filter.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful review and valuable suggestions. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we will revise the manuscript to incorporate additional analyses and materials to address the concerns raised.

Point-by-point responses
  1. Referee: [olaparib adverse events meta-analysis demonstration] The central demonstration (olaparib pooled RR moving from 2.18 [1.71-2.79] to 1.97 [1.76-2.20]) rests on the claim that LLM-generated rules and schema-constrained parsing produce similarity scores that correctly measure population alignment. No expert review, inter-rater reliability assessment, or comparison of the parsed eligibility criteria against manual gold-standard annotations is reported, so it remains possible that the narrower CI and point-estimate change arise from systematic LLM parsing artifacts rather than true eligibility alignment.

    Authors: We agree that expert validation would provide stronger evidence for the accuracy of the eligibility similarity scores. The current manuscript presents the olaparib example as a demonstration of the framework's potential impact rather than a definitive validation study. The hybrid architecture is designed to mitigate parsing artifacts by using schema-constrained extraction, which produces structured, inspectable criteria. In the revised manuscript, we will add a limitations section explicitly discussing the absence of expert review and the possibility of LLM-induced biases. We will also include a small-scale comparison of parsed criteria for the four trials against manual annotations by the authors to illustrate the process. revision: partial

  2. Referee: [methods for eligibility similarity computation] The eligibility similarity function and resulting study weights are presented as capturing population compatibility, yet the manuscript provides neither a sensitivity analysis (e.g., varying the similarity threshold or LLM prompt) nor a validation against independent expert-assigned weights. Without such checks, the quantitative impact attributed to eligibility-aware weighting cannot be isolated from potential biases in the LLM rule-generation step.

    Authors: We acknowledge the lack of sensitivity analyses in the original submission. To address this, we will perform and report sensitivity analyses by varying the similarity threshold (e.g., 0.5, 0.7, 0.9) and different LLM prompts for rule generation, showing the stability of the pooled estimates. For validation against expert-assigned weights, this would require additional resources and is beyond the scope of the current work; however, we will add it as a key direction for future research in the discussion. These additions will help isolate the effect of eligibility-aware weighting. revision: partial
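The threshold sweep the authors commit to could be sketched as follows. The similarity scores, per-trial weights, and the hard-threshold inclusion rule are invented for illustration; the manuscript does not specify how a threshold gates trials, and the weighted mean below stands in for the full pooling step.

```python
# Hypothetical sensitivity sweep over the eligibility-similarity threshold:
# recompute a pooled estimate on the trials that pass each cutoff. All scores,
# weights, and effects are invented for illustration.

def pooled_estimate(weights: list, effects: list) -> float:
    """Weighted mean of per-trial effects (stand-in for full MH pooling)."""
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, effects)) / total

trials = [
    {"similarity": 0.95, "weight": 10.0, "log_rr": 0.70},
    {"similarity": 0.72, "weight": 8.0,  "log_rr": 0.60},
    {"similarity": 0.55, "weight": 12.0, "log_rr": 0.90},
    {"similarity": 0.40, "weight": 6.0,  "log_rr": 1.10},
]

sweep = {}
for threshold in (0.5, 0.7, 0.9):
    kept = [t for t in trials if t["similarity"] >= threshold]
    sweep[threshold] = pooled_estimate([t["weight"] for t in kept],
                                       [t["log_rr"] for t in kept])
```

Reporting how much the pooled estimate moves across such cutoffs is what would let readers judge the stability the rebuttal promises.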

  3. Referee: [framework architecture and reproducibility claims] Reproducibility is asserted via the hybrid architecture (LLM reasoning separated from deterministic execution), but no code, data, or parsed eligibility schemas are released, and no quantification of variability across repeated LLM calls is supplied. This leaves the reported trial-selection and weighting results difficult to verify independently.

    Authors: We take the reproducibility concern seriously. Although the manuscript emphasizes the hybrid design to promote reproducibility, we agree that open release of materials is necessary for independent verification. In the revised version, we will include a link to a public GitHub repository containing the full code, the parsed eligibility schemas for the case studies, and the trial metadata used. Additionally, we will quantify variability by repeating the LLM calls (e.g., 5 runs with temperature 0.1) for the olaparib and gastric cancer cases and report the range of similarity scores and resulting pooled estimates. revision: yes
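The variability quantification proposed above (repeated LLM calls, reporting the range of similarity scores) could be summarized along these lines; the trial labels and per-run scores below are invented placeholders.

```python
# Hypothetical variability summary across repeated LLM parsing runs: collect
# each trial's similarity score from every run and report mean and range.
# Trial labels and scores are invented placeholders.
from statistics import mean

runs = {  # trial -> similarity score from each of 5 repeated runs
    "trial_1": [0.91, 0.93, 0.90, 0.92, 0.91],
    "trial_2": [0.78, 0.80, 0.77, 0.79, 0.81],
}

summary = {trial: {"mean": mean(scores), "range": max(scores) - min(scores)}
           for trial, scores in runs.items()}
```

A tight range across runs would support the reproducibility claim; a wide one would push the variability into the reported confidence intervals.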

Circularity Check

0 steps flagged

Eligibility similarity weights derived from external trial metadata via deterministic formulas; no reduction to fitted parameters or self-citation chains

full rationale

The paper's derivation chain separates LLM-based rule generation and schema-constrained parsing from deterministic weight computation and statistical pooling. Eligibility-aware weights are computed from parsed criteria using similarity measures applied to external trial metadata, then fed into standard Mantel-Haenszel pooling. The reported RR shift (2.18 to 1.97) is produced by these fixed operations rather than any equation that redefines the output in terms of the same fitted inputs. No self-citation is load-bearing for the central claim, and no ansatz or uniqueness theorem is smuggled in. The framework remains self-contained against external benchmarks, warranting only a minor score for possible incidental self-citation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the assumption that LLM rule generation and eligibility similarity computation add value beyond standard methods; no explicit free parameters are named, but the similarity function itself is an implicit modeling choice.

free parameters (1)
  • eligibility similarity function parameters
    The weighting scheme that converts eligibility overlap into study weights is not specified as parameter-free in the abstract.
axioms (1)
  • domain assumption LLM-generated rules from natural-language queries are sufficiently accurate and complete for trial filtering
The entire pipeline depends on this step to reduce 4,044 trials to 39 without missing guideline-cited studies.

pith-pipeline@v0.9.0 · 5593 in / 1273 out tokens · 36170 ms · 2026-05-13T18:55:42.854620+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches — The paper's claim is directly supported by a theorem in the formal canon.
supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses — The paper appears to rely on the theorem as machinery.
contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
