arxiv: 2601.15170 · v2 · submitted 2026-01-21 · 💻 cs.CV

Recognition: 2 theorem links

Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval

Zhucun Xue, Jiangning Zhang, Juntao Jiang, Jinzhuo Liu, Haoyang He, Teng Hu, Xiaobin Hu, Yong Liu, Shuicheng Yan

Pith reviewed 2026-05-16 12:09 UTC · model grok-4.3

classification 💻 cs.CV

keywords knowledge profiling · literature database · topic clustering · LLM parsing · AI research trends · multimodal reasoning · research evolution · hierarchical retrieval

The pith

A pipeline of topic clustering, LLM parsing and structured retrieval on over 100,000 AI papers profiles research activity and reveals topic shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compiles a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025. It applies topic clustering, LLM-assisted parsing of full texts, and structured retrieval to build a multidimensional representation of the field. This allows tracking of topic lifecycles, methodological transitions, dataset and model usage, and institutional directions. A sympathetic reader would see value in moving beyond metadata to see concrete semantic patterns in how AI research evolves.

Core claim

By compiling a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025 and constructing a multidimensional profiling pipeline that combines topic clustering, LLM-assisted parsing, and structured retrieval, the authors derive a comprehensive representation of research activity that supports the study of topic lifecycles, methodological transitions, dataset and model usage patterns, and institutional research directions, with analysis showing growth in safety, multimodal reasoning, and agent-oriented studies alongside stabilization in areas such as neural machine translation and graph-based methods.

What carries the argument

The multidimensional profiling pipeline, which integrates topic clustering, LLM-assisted parsing of full paper texts, and hierarchical structured retrieval to organize semantic content from the large literature corpus.

Load-bearing premise

LLM-assisted parsing accurately extracts semantic content such as methods, datasets, and research directions from the full text of papers without systematic bias.

What would settle it

A manual annotation of topics, methods, datasets, and directions on a representative sample of papers that shows substantial mismatches with the LLM outputs would falsify the accuracy of the derived representation.
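One way to run that check, sketched under assumptions: treat each extracted field as a set of normalized strings and score the LLM output against the human annotation with micro-averaged precision and recall. The function name, the lowercase normalization, and the toy dataset names are illustrative only; none of them come from the paper.

```python
from typing import Iterable, Set, Tuple


def micro_precision_recall(
    pairs: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (predicted, gold) set pairs.

    Each pair holds the values extracted for one field of one paper
    (predicted) and the values a human annotator listed (gold).
    """
    pairs = list(pairs)
    true_positives = sum(len(pred & gold) for pred, gold in pairs)
    n_predicted = sum(len(pred) for pred, _ in pairs)
    n_gold = sum(len(gold) for _, gold in pairs)
    # Convention used here: an empty denominator counts as a perfect score.
    precision = true_positives / n_predicted if n_predicted else 1.0
    recall = true_positives / n_gold if n_gold else 1.0
    return precision, recall


# Hypothetical two-paper sample for the "datasets" field.
sample = [
    ({"imagenet", "coco"}, {"imagenet", "coco", "ade20k"}),  # one dataset missed
    ({"squad", "glue"}, {"squad"}),                          # one spurious dataset
]
precision, recall = micro_precision_recall(sample)
```

A low recall would point to omissions, while a low precision would point to spurious or hallucinated entries, so the two numbers separate the failure modes the premise is exposed to.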

Figures

Figures reproduced from arXiv: 2601.15170 by Haoyang He, Jiangning Zhang, Jinzhuo Liu, Juntao Jiang, Shuicheng Yan, Teng Hu, Xiaobin Hu, Yong Liu, Zhucun Xue.

**Figure 2.** LLM-Driven Multi-Dimensional Knowledge Profiling Framework with Large-Scale Literature Database and Hierarchical Retrieval.

**Figure 3.** Four-quadrant topic lifecycle evolution (quadrants I-IV: Emerging, Booming, Mature, Declining). The inset (bottom left) shows ...

**Figure 4.** Compute Resource Analysis. Left: Temporal evolution of compute demand; Right: Boxplot of the top 10 topics with highest ...

**Figure 5.** Dataset and Benchmark Trends (2020-2025). The bar ...

**Figure 7.** Trends in Representative Dataset Usage (2020-2025).

**Figure 9.** (a) Heatmap of topic-institution for top pairs; (b) Heatmap ...

**Figure 10.** Radar chart of ChatGPT-5 performance vs. ChatGPT-5 ...

read the original abstract

The rapid expansion of research across machine learning, vision, and language has produced a volume of publications that is increasingly difficult to synthesize. Traditional bibliometric tools rely mainly on metadata and offer limited visibility into the semantic content of papers, making it hard to track how research themes evolve over time or how different areas influence one another. To obtain a clearer picture of recent developments, we compile a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025 and construct a multidimensional profiling pipeline to organize and analyze their textual content. By combining topic clustering, LLM-assisted parsing, and structured retrieval, we derive a comprehensive representation of research activity that supports the study of topic lifecycles, methodological transitions, dataset and model usage patterns, and institutional research directions. Our analysis highlights several notable shifts, including the growth of safety, multimodal reasoning, and agent-oriented studies, as well as the gradual stabilization of areas such as neural machine translation and graph-based methods. These findings provide an evidence-based view of how AI research is evolving and offer a resource for understanding broader trends and identifying emerging directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper assembles a large recent AI corpus and applies LLM parsing plus clustering to surface trend shifts, but the lack of any validation on the parsing step leaves the specific findings on shaky ground.

The paper pulls together over 100,000 papers from 22 major conferences between 2020 and 2025 into one corpus and runs a pipeline of topic clustering, LLM-assisted parsing, and structured retrieval to track how research themes, methods, datasets, and institutional focus have moved. What stands out is the concrete set of observations: growth in safety, multimodal reasoning, and agent work, with some stabilization in neural machine translation and graph methods. The multidimensional view they build does give a practical way to look at topic lifecycles and transitions that standard metadata tools miss. The corpus itself is a useful resource for anyone wanting a quantitative snapshot of recent activity. The pipeline uses standard techniques, which keeps the work straightforward and reproducible in principle. The main weakness is that the entire analysis rests on the LLM parsing step extracting methods, datasets, and directions accurately and without bias from full text, yet the paper provides no human validation, accuracy numbers, error analysis, or checks for hallucination or recency effects. Without those, the reported shifts and patterns cannot be taken at face value. Post-processing decisions are also left undescribed, so it is hard to judge how sensitive the results are to choices. This work is aimed at readers who follow high-level field trends for planning or funding purposes rather than those seeking new technical mechanisms. It is a solid bibliometric exercise that would benefit from referee input on the validation gap before publication.

Referee Report

2 major / 2 minor

Summary. The paper compiles a corpus of over 100,000 papers from 22 major AI conferences (2020–2025) and introduces a multidimensional profiling pipeline that combines topic clustering, LLM-assisted parsing of full-text content, and hierarchical structured retrieval. It claims this enables analysis of topic lifecycles, methodological transitions, dataset and model usage patterns, and institutional directions, with reported findings including growth in safety, multimodal reasoning, and agent-oriented work alongside stabilization in neural machine translation and graph-based methods.

Significance. If the LLM parsing step proves accurate and unbiased, the work would provide a large-scale, semantically grounded resource for tracking AI research evolution that goes beyond metadata-based bibliometrics. The scale of the corpus and the integration of clustering with retrieval are notable strengths that could support reproducible trend studies, but the absence of validation on the extraction accuracy directly limits the reliability of the claimed shifts and patterns.

major comments (2)

[Abstract / pipeline description] The central claims on methodological transitions, dataset usage patterns, and topic lifecycles rest entirely on LLM-assisted parsing of full-text papers, yet no validation protocol, human-annotation benchmark, inter-annotator agreement, error-rate analysis, or prompting details are provided to establish accuracy or rule out systematic bias (e.g., recency or hallucination effects).
[Analysis section] No error bars, sensitivity analysis, or quantification of how post-processing choices in the retrieval and clustering steps affect the reported trend estimates (e.g., safety growth or stabilization of NMT), leaving the magnitude and robustness of the highlighted shifts unassessed.
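To make the second comment concrete, here is a minimal sketch of one common way to attach error bars to a trend estimate: a percentile bootstrap over the papers of a single year. It assumes each paper carries exactly one topic label; the function name, the label strings, and the resample count are illustrative choices, not details from the paper.

```python
import random
from typing import List, Tuple


def bootstrap_share_ci(
    labels: List[str],
    topic: str,
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap interval for the share of papers labeled `topic`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    shares = []
    for _ in range(n_resamples):
        # Resample papers with replacement and recompute the topic share.
        resample = [labels[rng.randrange(n)] for _ in range(n)]
        shares.append(sum(1 for lab in resample if lab == topic) / n)
    shares.sort()
    lower_idx = int(n_resamples * alpha / 2)
    upper_idx = min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))
    return shares[lower_idx], shares[upper_idx]


# Hypothetical year with 100 papers, 30 of them labeled "safety".
year_labels = ["safety"] * 30 + ["other"] * 70
low, high = bootstrap_share_ci(year_labels, "safety")
```

Repeating this per year gives an interval around each point of a trend line. Note that it captures sampling noise only; errors in the topic labels themselves would need the separate validation raised in the first comment.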

minor comments (2)

[Abstract] The abstract refers to 'structured retrieval' without specifying the retrieval mechanism, indexing structure, or how it interacts with the topic clustering output.
[Methods] Notation for the multidimensional profile (e.g., how fields such as methods, datasets, and directions are represented) is introduced without an explicit schema or example output format.
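For illustration, an explicit schema of the kind this comment asks for might look like the sketch below. The class name, every field name, and the example values are hypothetical; they are modeled only on the fields this report mentions (topics, methods, datasets, models, directions) and are not the paper's actual format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PaperProfile:
    """Hypothetical per-paper record; not the schema used by the authors."""

    paper_id: str
    conference: str
    year: int
    topics: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    datasets: List[str] = field(default_factory=list)
    models: List[str] = field(default_factory=list)
    directions: List[str] = field(default_factory=list)


# Invented example values, shown only to fix the output format.
example = PaperProfile(
    paper_id="example-0001",
    conference="CVPR",
    year=2024,
    topics=["multimodal reasoning"],
    methods=["contrastive pre-training"],
)
record = json.dumps(asdict(example), indent=2)
```

Fields with nothing extracted stay as empty lists rather than being dropped, so every record has the same keys and downstream aggregation does not need to special-case missing data.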

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key areas where additional evidence would strengthen the reliability of our claims. We agree that validation of the LLM parsing and robustness checks on the analysis are necessary and will revise the manuscript to address both points directly.

Referee: [Abstract / pipeline description] The central claims on methodological transitions, dataset usage patterns, and topic lifecycles rest entirely on LLM-assisted parsing of full-text papers, yet no validation protocol, human-annotation benchmark, inter-annotator agreement, error-rate analysis, or prompting details are provided to establish accuracy or rule out systematic bias (e.g., recency or hallucination effects).

Authors: We acknowledge that the submitted manuscript does not include a validation protocol for the LLM-assisted parsing step. In the revision we will add a dedicated subsection reporting a human-annotation benchmark performed on a stratified random sample of 1,000 papers. The benchmark will include inter-annotator agreement (Fleiss' kappa), precision and recall for extracted fields (topics, datasets, models, methods), and a targeted analysis of potential systematic biases such as recency effects and hallucination rates. We will also release the exact prompts and annotation guidelines used. Revision: yes.

Referee: [Analysis section] No error bars, sensitivity analysis, or quantification of how post-processing choices in the retrieval and clustering steps affect the reported trend estimates (e.g., safety growth or stabilization of NMT), leaving the magnitude and robustness of the highlighted shifts unassessed.

Authors: We agree that the current analysis lacks quantitative assessment of robustness. In the revised version we will add bootstrap-derived error bars to all trend figures and conduct sensitivity analyses that vary the number of clusters, retrieval similarity thresholds, and post-processing filters. We will report the resulting range of trend estimates for the highlighted patterns (safety growth, multimodal reasoning, agent studies, and NMT/graph stabilization) so that readers can evaluate the stability of the observed shifts. Revision: yes.
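The inter-annotator statistic named in the first response, Fleiss' kappa, can be computed from a table of rating counts. The sketch below is a plain standard-library implementation of the textbook formula; the function name and the toy rating tables are assumptions made for illustration and say nothing about how the authors would run their benchmark.

```python
import math
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same number of
    raters, and there must be at least two raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Observed agreement: mean over items of the pairwise agreement rate.
    p_observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Expected agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        return math.nan  # all ratings in one category: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy tables: 2 items, 3 raters, 2 categories.
perfect = fleiss_kappa([[3, 0], [0, 3]])
split = fleiss_kappa([[2, 1], [1, 2]])
```

A value of 1 means the annotators always agree, values near 0 mean agreement no better than chance, and negative values (as in the second toy table) mean agreement below chance.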

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline on external corpus

The paper describes an empirical analysis that compiles a corpus of >100k papers from public conferences (2020-2025) and applies topic clustering, LLM-assisted parsing, and structured retrieval. No equations, fitted parameters, or mathematical derivations appear in the provided text. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes that reduce the central claims back to the authors' prior definitions. The pipeline operates on external data with standard tools; reported trends (safety growth, multimodal shifts) are outputs of the applied methods rather than inputs renamed as predictions. This matches the default expectation of a self-contained descriptive study with no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The pipeline depends on the assumption that LLM parsing faithfully captures research content and that the chosen conferences represent the broader field; no explicit free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: LLM-assisted parsing accurately extracts semantic elements such as methods, datasets, and directions without introducing model-specific artifacts. Invoked when the abstract states that LLM-assisted parsing is used to organize textual content.
domain assumption: The selected 22 conferences form a representative sample of recent AI research. Implicit in the construction of the unified corpus.

pith-pipeline@v0.9.0 · 5519 in / 1311 out tokens · 35365 ms · 2026-05-16T12:09:23.928126+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

By combining topic clustering, LLM-assisted parsing, and structured retrieval, we derive a comprehensive representation of research activity...

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

use text encoder... UMAP... HDBSCAN for clustering, resulting in more than 300 topic categories

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 11 internal anchors

[43] Ivan Zupic and Tomaž Čater. Bibliometric methods in management and organization. Organizational Research Methods, 18(3):429–472, 2015. Appendix Overview. The appendix presents the following sections to strengthen the main manuscript: We provide more details in the Supplementary Materials for a better understanding of our work. Sec. A. ResearchD...

[44] **Keywords**: Extract 3-10 keywords directly from the paper text. Generate a short explanation for each keyword using LLM summarization.

[45] **abstract_summary**: Provide a compressed version of the paper's abstract. Summarize, do not copy verbatim.

[46] **Methods / Architecture / Loss / Training**: Summarize core method(s), model architecture, loss function, and training setup.

[47] **Datasets / Metrics / Mapping**: List datasets, corresponding metrics, and mapping between datasets and metrics. Use empty lists/dictionaries if missing.

[48] **Problem Statement**: Summarize the research problem in 1-2 sentences.

[49] **Contributions**: Summarize the core contributions.

[50] **Novelty Type**: Infer innovation type: Algorithm / Model; Theory / Analysis; Benchmark / Dataset; Application / System; Methodological Improvement.

[51] **Experiments / Results Summary**: Summarize experiments and main results. Include ablation studies if present.

[52] **Limitations / Future Work**: Summarize limitations and future research directions.

[53] **Trend Insight**: Provide trend level insights based on the paper.

[54] **Field Positioning**: Infer the paper's position in its research field: Foundational Work; Methodological Innovation; Benchmark / Dataset Contribution; Application Validation; Trend Extension.

[55] **Institution**: Extract all author affiliations from the provided

[56] **Gpu_info**: Obtain the gpu resources used in the article, <total_gpu> * <GPU_MODEL> * <training_time>. LLM Intent Recognition Prompts. The core of our retrieval system's ability to effectively handle complex, multi-dimensional queries lies in an LLM-based intent recognition module. The goal of this module is to convert users' natural language queries int...

[57] **Conference**: Must the range of the following fixed options: [AAAI, ACL, COLM, COLT, CoRL, CVPR, ECCV, EMNLP, ICCV, ICLR, ICML, IJCAI, INTERSPEECH, IWSLT, MLSYS, NAACL, NDSS, NeurIPS, OSDI, UAI, USENIX-Fast, USENIX-Sec]. If the query does not mention a conference, leave as [].

[58] **Year**: Extract the range of year contained in the query.

[59] **Authors**: If the query explicitly mentions author names, extract them into a list. Otherwise, leave empty.

[60] **Paper Name**: Extract all explicitly mentioned paper titles. Output as a list. Use partial matching if necessary.

[61] **Keywords**: Extract only technical terms explicitly mentioned in query. Use `keywords_explanation` to briefly explain each.

[62] **Abstract Summary**: One concise sentence describing technical focus from query.

[63] **methods**: Output the **complete method or approach description**, including its application domain if present. Must describe both the **technique** and its **application scope**, not just a keyword. If multiple distinct methods are present, include all as a list. Do NOT shorten into keywords.

[64] **Architecture / Loss Function / Datasets / Metrics**: Extract explicitly if mentioned; do not hallucinate.

[65] **Vector Search Plan**: Decide which fields to include in the search (`abstract_summary`, `methods`, `keywords`, `datasets`, `metrics`, `architecture`, `loss_function`, etc.). Only include fields that have non-empty content in the JSON output. Assign a `weight` for each field between 0 and 1 based on **query emphasis**. The weights **must be ...

[66] Enhanced Academic Rigor and Traceability: ResearchDB successfully retrieved seminal papers, allowing the final answer to cite up to nine top-tier conference papers (e.g., [1], [3], [19]), accurately attributing key methods like Retrieval Augmentation

[67] Clearer Theoretical Framing: The retrieved content enabled high-level theoretical decoupling of the problem (e.g., separating gesture drivers into semantic, rhythmic, and stylistic factors), resulting in a logically coherent survey argument. This qualitative evidence confirms that ResearchDB's multidimensional matching enhances the output's Academic Rob...

[68] How to achieve real-time, high-fidelity speech-to-gesture generation using decoupled diffusion models. Making it truly real-time. Compute budget example (per 160 ms stride): Feature extraction: 3–5 ms CPU. Encoders: 2–4 ms GPU. Stage 1 diffusion (8 fps, 4–6 steps): 4–8 ms. Stage 2 diffusion (30 fps, 4–6 steps): 7–12 ms. IK/retarget/smoothing: 1–3 ms CPU or GPU. Total: ~20–30 ms added on top of the lookahead, well under 100 ms on a gaming GPU. On mobile, red...
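The Vector Search Plan fragment above assigns each searchable field a weight between 0 and 1. As a rough sketch of how such weights could be combined into one relevance score, the code below takes a weighted mean of per-field cosine similarities. The scoring rule, the normalization by the sum of the weights, the function names, and the toy vectors are all assumptions; the extracted prompt is truncated before it states its own constraint on the weights, and this sketch does not try to reproduce it.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_field_score(
    query: Dict[str, Vector],
    paper: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted mean of per-field cosine similarity.

    Only fields with a positive weight that are present on both sides are
    scored; dividing by the sum of the weights used is this sketch's own
    normalization choice.
    """
    total, weight_sum = 0.0, 0.0
    for name, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {name!r} must lie in [0, 1]")
        if weight == 0.0 or name not in query or name not in paper:
            continue  # skip empty or unweighted fields
        total += weight * cosine(query[name], paper[name])
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Toy two-dimensional embeddings for two fields (invented values).
query_vecs = {"methods": [1.0, 0.0], "datasets": [0.0, 1.0]}
paper_vecs = {"methods": [1.0, 0.0], "datasets": [1.0, 0.0]}
score = weighted_field_score(
    query_vecs, paper_vecs, {"methods": 0.75, "datasets": 0.25}
)
```

In this toy case the methods field matches exactly and the datasets field is orthogonal, so the score equals the weight share placed on methods.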