From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
Pith reviewed 2026-05-19 11:45 UTC · model grok-4.3
The pith
Compound AI Systems integrate LLMs with retrievers, agents, tools, and orchestrators to surpass standalone model limits in memory, reasoning, and multimodal tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that Compound AI Systems overcome the inherent constraints of standalone large language models by deliberately composing them with retrievers for external knowledge, agents for sequential decision making, tools for action execution, and orchestrators for workflow control, and that a taxonomy based on these roles and orchestration patterns can organize the currently scattered literature and guide future system design.
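The composition described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `stub_llm`, `KeywordRetriever`, and `Orchestrator` are hypothetical names, the model call is a stand-in that only echoes its prompt, and the agent role (sequential decision making) is left out, so the caller picks the tool by hand. It is not taken from the survey or from any real framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only echoes the prompt it was given."""
    return f"[answer based on] {prompt}"


@dataclass
class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they share."""
    documents: List[str]

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        words = set(query.lower().split())
        ranked = sorted(
            self.documents,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class Orchestrator:
    """Fixed workflow: retrieve context, optionally run a tool, then call the model."""
    llm: Callable[[str], str]
    retriever: KeywordRetriever
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, query: str, tool_name: str = "") -> str:
        context = self.retriever.retrieve(query)
        # The tool is chosen by the caller here; an agent would make this choice itself.
        tool_output = self.tools[tool_name](query) if tool_name else ""
        prompt = f"context={context} tool={tool_output!r} question={query}"
        return self.llm(prompt)
```

Swapping `stub_llm` for a real model client, or `KeywordRetriever` for a vector index, would leave the orchestrator unchanged, which is the modularity the core claim relies on.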
What carries the argument
A multi-dimensional taxonomy organized by component roles and orchestration strategies, used to classify and compare systems across the four paradigms of retrieval-augmented generation, LLM agents, multimodal LLMs, and explicit orchestration.
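One way to picture such a taxonomy is as a record per system with one field per dimension. The sketch below assumes a simple encoding: the four component roles and four paradigms come from the text above, while `SystemEntry`, `group_by_paradigm`, and the free-text `orchestration` label are hypothetical, since the survey's actual orchestration categories are not listed on this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List


class Role(Enum):
    RETRIEVER = "retriever"
    AGENT = "agent"
    TOOL = "tool"
    ORCHESTRATOR = "orchestrator"


class Paradigm(Enum):
    RAG = "Retrieval-Augmented Generation"
    LLM_AGENTS = "LLM Agents"
    MLLM = "Multimodal LLMs"
    ORCHESTRATION = "Orchestration"


@dataclass(frozen=True)
class SystemEntry:
    """One classified system; all field values are supplied by whoever does the classifying."""
    name: str
    paradigm: Paradigm
    roles: FrozenSet[Role]
    orchestration: str  # hypothetical free-text label, not the survey's own category set


def group_by_paradigm(entries: List[SystemEntry]) -> Dict[Paradigm, List[str]]:
    """Collect system names under their paradigm, preserving input order."""
    grouped: Dict[Paradigm, List[str]] = {}
    for entry in entries:
        grouped.setdefault(entry.paradigm, []).append(entry.name)
    return grouped
```

Entries that share a paradigm can then be compared field by field, which is the kind of side-by-side comparison the taxonomy is meant to support.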
If this is right
- Representative systems in each paradigm can be directly compared on design trade-offs such as latency versus accuracy.
- Standardized evaluation methods become feasible once systems are placed in the same taxonomy.
- Identified challenges in scalability, interoperability, and coordination point to concrete next research steps.
- Practitioners gain a map for choosing which components to add when building task-specific pipelines.
Where Pith is reading between the lines
- The taxonomy could be tested by applying it to new systems released after the survey to measure how well it accommodates innovation.
- Connections between orchestration strategies and existing workflow engines in software engineering may speed up adoption.
- If coordination challenges remain unsolved, hybrid human-AI oversight layers may become a necessary addition to the described architectures.
Load-bearing premise
The existing collection of Compound AI Systems is too scattered and lacks any shared framework, so a new taxonomy is needed before the field can advance systematically.
What would settle it
A later survey or benchmark study that shows the proposed taxonomy fails to group real systems into coherent categories or that a simpler existing classification already captures all major design choices without gaps.
Original abstract
Compound AI Systems (CAIS) are an emerging paradigm that integrates large language models (LLMs) with external components, including retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented and lacks a unified framework for analysis, taxonomy, and evaluation. In this survey, we define the concept of CAIS, propose a multi-dimensional taxonomy based on component roles and orchestration strategies, and analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and Orchestration. We review representative systems, compare design trade-offs, and summarize evaluation methodologies across these paradigms. Finally, we identify key challenges - including scalability, interoperability, benchmarking, and coordination - and outline promising directions for future research. This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys Compound AI Systems (CAIS), defining them as integrations of LLMs with external components (retrievers, agents, tools, orchestrators) to overcome standalone LLM limitations in memory, reasoning, real-time grounding, and multimodal tasks. It proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four paradigms (RAG, LLM Agents, MLLMs, Orchestration), analyzes representative systems and trade-offs, summarizes evaluation methodologies, and outlines challenges including scalability, interoperability, benchmarking, and coordination.
Significance. If the taxonomy and synthesis hold, the work provides a useful organizational framework for an emerging, fragmented area of system-level AI. It synthesizes literature across paradigms without advancing unsubstantiated mathematical claims and highlights actionable open problems for researchers and practitioners.
minor comments (3)
- The abstract states that the CAIS landscape 'lacks a unified framework' and that the proposed taxonomy addresses this, but the manuscript should include an explicit comparison (e.g., in the introduction or related-work section) to prior surveys on RAG, agents, or multimodal systems to substantiate the novelty of the multi-dimensional taxonomy.
- The four paradigms are listed without an accompanying table or diagram that maps each to the taxonomy dimensions (component roles and orchestration strategies); adding such a summary table would improve clarity and allow readers to see the taxonomy in action.
- Evaluation methodologies are summarized across paradigms, but the manuscript would benefit from a brief discussion of common pitfalls (e.g., contamination in RAG benchmarks or agent trajectory evaluation) to strengthen the practical guidance.
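The contamination pitfall named in the last comment can be made concrete with a small sketch. It uses word n-gram overlap between benchmark items and a retrieval corpus, which is one common heuristic and not a method attributed to the manuscript. The function names, the trigram size, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item: str, corpus_doc: str, n: int = 3) -> float:
    """Fraction of the item's n-grams that also occur in the corpus document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n words: nothing to compare
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)


def flag_contaminated(
    items: List[str], corpus: List[str], threshold: float = 0.5, n: int = 3
) -> List[str]:
    """Return benchmark items whose overlap with any corpus document meets the threshold."""
    return [
        item
        for item in items
        if any(overlap_ratio(item, doc, n) >= threshold for doc in corpus)
    ]
```

A check like this only catches near-verbatim leakage; paraphrased benchmark content would pass it, so it is a lower bound on contamination rather than a clean bill of health.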
Simulated Author's Rebuttal
We thank the referee for the constructive and positive review of our survey on Compound AI Systems. The summary accurately reflects the scope, taxonomy, and contributions of the manuscript. We appreciate the recommendation for minor revision and will incorporate improvements to enhance clarity and completeness.
Circularity Check
No significant circularity
full rationale
This survey paper defines CAIS as integrations of LLMs with external components to address capability gaps and proposes a multi-dimensional taxonomy based on component roles and orchestration strategies. It catalogs four paradigms (RAG, LLM Agents, MLLMs, Orchestration) by reviewing representative systems, trade-offs, and evaluations drawn from external prior literature. No equations, fitted parameters, predictions, or derivations appear; the central organizational claim that the field lacks a unified framework is addressed directly by the survey's synthesis rather than reducing to any self-referential input, self-citation chain, or ansatz. The argument remains self-contained as a literature review without load-bearing steps that collapse by construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We define CAIS as modular and extensible architectures that integrate LLMs with specialized external components... propose a multi-dimensional taxonomy based on component roles and orchestration strategies
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and Orchestration
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Uncertainty Propagation in LLM-Based Systems
  This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insight...
- A Survey of Context Engineering for Large Language Models
  The survey organizes Context Engineering into retrieval, processing, management, and integrated systems like RAG and multi-agent setups while identifying an asymmetry where LLMs handle complex inputs well but struggle...