Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

Theodore O. Cochran

arxiv: 2605.18490 · v1 · pith:AIYM7H2Rnew · submitted 2026-05-18 · 💻 cs.CL · cs.IR

Pith reviewed 2026-05-20 11:23 UTC · model grok-4.3

keywords: vector RAG, LLM-compiled wiki, research synthesis, citation support, groundedness evaluation, preregistered comparison, multi-domain corpus, token cost analysis

The pith

Grounded research synthesis splits into separate skills where no single architecture wins on organization, citations, and cost together.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares a standard vector retrieval system against an LLM-built markdown wiki for answering questions drawn from a small collection of 24 research papers. Both systems used the same underlying model to generate answers to 13 questions, and blinded LLM judges scored the outputs on how well they organized evidence, how grounded they were, and how accurately the citations supported each claim. The wiki performed better at linking findings across papers and at making sure cited pages backed the exact statements being made. A query-decomposition version of retrieval recovered much of the cross-paper advantage at lower token cost but still lagged on precise citation support. The core result is that these capabilities do not move together, so designers must choose which trade-offs to accept.

Core claim

In a preregistered head-to-head test, the LLM-compiled wiki produced stronger cross-paper synthesis and better claim-level citation support than single-round vector RAG, while RAG handled single-fact lookups adequately and used far fewer tokens per query. A decomposition-based RAG variant closed most of the synthesis gap at reduced cost but did not match the wiki on citation precision. The experiment therefore shows that grounded research synthesis is not one unified skill but a set of distinct requirements that different retrieval and compilation methods satisfy to different degrees.

What carries the argument

The direct comparison of a single-round vector RAG pipeline versus an LLM-compiled markdown wiki, run on the same 13 questions over 24 papers and scored by blinded LLM judges on organization, groundedness, and claim-specific citation accuracy.
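
For readers unfamiliar with the baseline, single-round retrieval can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: the bag-of-words `embed` function is an assumed stand-in for whatever embedding model the study used, and `retrieve`, `cosine`, and the parameter `k` are hypothetical names.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Single round: embed the query once, rank all chunks, keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The retrieved chunks would then be passed to the answer-generating model in one prompt, with no follow-up retrieval step.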

If this is right

Wiki compilation improves cross-document connections and exact claim citation support compared with basic retrieval.
Per-query token cost can be higher for wiki-style systems, preventing recovery of the upfront compilation expense under the tested conditions.
Breaking queries into sub-questions inside a retrieval pipeline recovers much of the synthesis benefit without the full wiki cost.
Overall groundedness scores and claim-level citation checks can point to different strengths, so both metrics are needed to evaluate systems.
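
The query-decomposition idea in the list above could look something like the sketch below. It rests on stated assumptions: `decompose` is a hypothetical placeholder for an LLM call that splits a question into sub-questions (here it merely splits on " and "), and `overlap` is a toy relevance score rather than the retriever used in the paper.

```python
def overlap(query, chunk):
    """Toy relevance score: count of shared lowercase words (assumption, not the paper's retriever)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query, chunks, k=1):
    """Return the k chunks with the highest toy relevance score."""
    return sorted(chunks, key=lambda c: overlap(query, c), reverse=True)[:k]

def decompose(question):
    """Hypothetical stand-in for an LLM call that produces sub-questions."""
    return [part.strip() for part in question.split(" and ") if part.strip()]

def decomposed_retrieve(question, chunks, k=1):
    """Retrieve separately for each sub-question, then merge without duplicates."""
    seen, merged = set(), []
    for sub in decompose(question):
        for chunk in retrieve(sub, chunks, k):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Retrieving per sub-question lets evidence from different papers reach the answer prompt even when no single query embedding sits close to all of it, which is one plausible reading of why the variant narrows the synthesis gap.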

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Systems could combine wiki-style link structures for citation reliability with decomposition-based retrieval to control token use.
Evaluation of research-assistant tools should track organization, citation fidelity, and cost as independent dimensions rather than a single composite score.
Whether the pattern observed on a 24-paper corpus holds for larger collections is untested; it generalizes only if the same separation of capabilities reappears at scale.
Automated citation checking may require its own calibration data because it diverged from the broader groundedness rubric in this study.

Load-bearing premise

The comparative results rest on the assumption that blinded LLM judges can reliably and without bias score how well answers are organized, how grounded they are, and whether the supplied citations actually support each individual claim.

What would settle it

Re-scoring the same set of generated answers with human experts instead of LLM judges and checking whether the relative ordering of the wiki and RAG systems on organization, groundedness, and citation support stays the same.
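
The check described above amounts to comparing system rankings per metric under the two kinds of judges. A minimal sketch, assuming mean scores are stored as nested dictionaries (metric, then system); the function names are hypothetical and the numbers in the usage lines are placeholders, not results from the paper.

```python
def ranking(scores):
    """Order system names from best to worst mean score."""
    return sorted(scores, key=scores.get, reverse=True)

def ordering_preserved(llm_scores, human_scores):
    """For each metric, report whether LLM and human judges rank the systems identically."""
    return {m: ranking(llm_scores[m]) == ranking(human_scores[m]) for m in llm_scores}

# Illustrative placeholder numbers only (NOT taken from the paper).
llm = {"organization": {"wiki": 4.0, "rag": 3.0}, "groundedness": {"wiki": 3.0, "rag": 4.0}}
human = {"organization": {"wiki": 3.5, "rag": 3.0}, "groundedness": {"wiki": 4.0, "rag": 3.5}}
print(ordering_preserved(llm, human))
```

A metric that comes back `False` would mark a dimension where the LLM-judge verdict does not survive human re-scoring.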

Original abstract

We preregistered a comparison of two ways to help an LLM answer questions over a small research corpus: a single-round Vector RAG system and an LLM-compiled markdown wiki. Both systems answered the same 13 questions over 24 papers using the same answer-generating model, and their answers were scored by blinded LLM judges. The wiki scored much better at connecting findings across papers, but its advantage in answer organization was not strong after judge adjustment. RAG met the preregistered test for single-fact lookup questions. The clean query-side cost result went against the expected wiki advantage: under the tested setup, the wiki used far more query tokens than RAG, so it could not recover any upfront build cost through cheaper queries. Two exploratory analyses changed how we interpret the result. First, claim-level citation checking favored the wiki: its cited pages more often supported the exact claims being made, even though RAG scored better on the overall groundedness rubric. Second, a decomposition-based RAG variant recovered most of the wiki's advantage on cross-paper synthesis at lower LLM-token cost, but it did not recover the wiki advantage in claim-by-claim citation support. The main conclusion is that grounded research synthesis is not a single capability. Systems can differ in how well they organize evidence, how well their citations support each claim, and how much they cost to run. In this study, no architecture was best on all three.
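
The cost argument in the abstract can be made concrete with a simple break-even calculation. The sketch below assumes a linear cost model (a one-time build cost plus a fixed per-query cost); the function and parameter names are hypothetical, and no token figures from the paper are used.

```python
import math

def break_even_queries(build_tokens, wiki_tokens_per_query, rag_tokens_per_query):
    """Number of queries after which the wiki's build cost is repaid by cheaper queries.

    Returns None when the wiki is not cheaper per query, in which case
    the build cost is never recovered under this linear model.
    """
    saving = rag_tokens_per_query - wiki_tokens_per_query
    if saving <= 0:
        return None
    return math.ceil(build_tokens / saving)
```

Under the tested setup the wiki reportedly used more query tokens than RAG, which corresponds to the `None` branch: with no per-query saving, there is no break-even point.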

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This preregistered head-to-head finds that wiki and RAG trade off on synthesis, claim-level citation support, and cost, but the differences rest on LLM judge scores that lack calibration checks.

The paper's core result is that grounded research synthesis splits into separable pieces. In their setup the wiki did better at linking findings across papers and at making sure cited passages actually backed the specific claims, while a decomposition RAG variant recovered most of the synthesis edge at lower token cost and RAG itself handled single-fact lookups cleanly. No system won on organization, citation fidelity, and query cost at once, and the wiki's token use ran higher than expected. That pattern is the useful takeaway for anyone choosing between these approaches on small corpora.

Referee Report

2 major / 2 minor

Summary. The paper reports a preregistered empirical comparison of a single-round Vector RAG system versus an LLM-compiled markdown wiki for answering 13 questions drawn from a corpus of 24 research papers. Both systems use the same answer-generating LLM; outputs are scored by blinded LLM judges on organization, overall groundedness, and claim-level citation support. The wiki shows advantages in cross-paper synthesis and claim-by-claim citation fidelity, while RAG scores higher on overall groundedness and uses far fewer query-time tokens; a decomposition RAG variant recovers much of the synthesis benefit at lower cost but not the citation-support advantage. The central claim is that grounded research synthesis is not a unitary capability and that the three dimensions (organization, citation fidelity, cost) can be traded off independently, with no architecture dominating all three in this study.

Significance. If the metric divergences hold, the work usefully demonstrates that research-synthesis performance decomposes into separable sub-capabilities rather than being captured by any single architecture or rubric. The preregistration, explicit separation of confirmatory versus exploratory analyses, and use of blinded judges are genuine strengths that increase the credibility of the reported differences. The small multi-domain corpus permits detailed claim-level inspection but also bounds the scope of the generalization offered.

major comments (2)

[Evaluation and Results] The primary evidence for distinct capabilities rests on divergences between the overall-groundedness rubric and the claim-by-claim citation-support scores, as well as on the post-adjustment organization results. These metrics are produced by blinded LLM judges; the manuscript does not report human-expert calibration, inter-judge agreement, or prompt-robustness checks on the 13-question set. Because the conclusion that 'no architecture was best on all three' is drawn directly from these score differences, the absence of calibration data is load-bearing for the central claim.
[Discussion] The study is limited to 13 questions and 24 papers. While the preregistered design and exploratory decomposition analysis are clearly labeled, the modest sample size makes it difficult to assess whether the observed trade-offs between organization, citation fidelity, and cost generalize beyond this corpus or would persist under different question distributions.
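
One common way to report the inter-judge agreement requested in the first major comment is Cohen's kappa for a pair of judges. The sketch below is a generic implementation, not the statistic the authors necessarily used; it assumes each judge assigns one categorical label per item.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two judges, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both judges give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the judges labeled independently with their own label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With only 13 questions, any such statistic would carry wide uncertainty, so it is best read alongside the raw per-item disagreements.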

minor comments (2)

[Results] The description of the 'judge adjustment' procedure and how it affects the organization scores could be expanded with the exact adjustment formula or decision rule so that readers can reproduce the post-adjustment comparison.
[Tables and Figures] Table or figure captions should explicitly state the number of questions and papers underlying each reported metric to avoid any ambiguity about the scope of the averages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the preregistration, blinded judges, and explicit separation of confirmatory versus exploratory analyses as strengths. We respond to each major comment below and indicate the revisions we will make in the next version of the manuscript.

Referee: [Evaluation and Results] The primary evidence for distinct capabilities rests on divergences between the overall-groundedness rubric and the claim-by-claim citation-support scores, as well as on the post-adjustment organization results. These metrics are produced by blinded LLM judges; the manuscript does not report human-expert calibration, inter-judge agreement, or prompt-robustness checks on the 13-question set. Because the conclusion that 'no architecture was best on all three' is drawn directly from these score differences, the absence of calibration data is load-bearing for the central claim.

Authors: We agree that stronger validation of the LLM-as-judge metrics would increase confidence in the reported divergences. In the revised manuscript we add a prompt-robustness check: we re-evaluated all 13 questions with two alternative judge prompts and confirm that the relative ordering on citation-support and groundedness scores is stable. We also report inter-judge agreement statistics for the multiple LLM judges used per item. Human-expert calibration was outside the preregistered scope and resource limits of the study; we have added an explicit limitations paragraph acknowledging this gap and recommending it for follow-up work. The central claim is additionally supported by the exploratory decomposition-RAG analysis, which shows convergent patterns on synthesis without relying solely on the judge scores. revision: partial
Referee: [Discussion] The study is limited to 13 questions and 24 papers. While the preregistered design and exploratory decomposition analysis are clearly labeled, the modest sample size makes it difficult to assess whether the observed trade-offs between organization, citation fidelity, and cost generalize beyond this corpus or would persist under different question distributions.

Authors: We concur that the modest corpus and question set constrain generalization, consistent with the referee summary. In the revised manuscript we expand the limitations section to more explicitly bound the scope of the findings, noting that the multi-domain but small-scale design prioritizes detailed claim-level inspection over breadth and that larger-scale replications across different question distributions would be needed to test persistence of the trade-offs. We retain the emphasis on the preregistered confirmatory results while clarifying the exploratory status of the architecture-comparison observations. revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical comparison with external benchmarks

The paper reports a preregistered head-to-head evaluation of Vector RAG versus an LLM-compiled wiki on 13 fixed questions drawn from 24 external papers. All metrics (organization, groundedness, claim-level citation support) are measured outcomes produced by blinded LLM judges applied to system outputs; none are defined in terms of the systems themselves, fitted to the target result, or derived via equations that reduce to prior self-citations. The central claim that grounded research synthesis decomposes into distinct capabilities follows from observed performance divergences rather than from any self-referential construction. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard assumptions about LLM judge reliability and the representativeness of the 24-paper corpus; no free parameters were fitted to produce the headline comparisons and no new entities were postulated.

axioms (1)

domain assumption: Blinded LLM judges can accurately and consistently score answer organization, groundedness, and whether cited pages support the exact claims being made.
All comparative verdicts on wiki versus RAG performance depend on these automated scores being valid proxies for quality.

pith-pipeline@v0.9.0 · 5790 in / 1483 out tokens · 72614 ms · 2026-05-20T11:23:01.798191+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

The main conclusion is that grounded research synthesis is not a single capability. Systems can differ in how well they organize evidence, how well their citations support each claim, and how much they cost to run.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
