pith. machine review for the scientific record.

arxiv: 2601.15170 · v2 · submitted 2026-01-21 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval

Pith reviewed 2026-05-16 12:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords: knowledge profiling, literature database, topic clustering, LLM parsing, AI research trends, multimodal reasoning, research evolution, hierarchical retrieval

The pith

A pipeline of topic clustering, LLM parsing and structured retrieval on over 100,000 AI papers profiles research activity and reveals topic shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compiles a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025. It applies topic clustering, LLM-assisted parsing of full texts, and structured retrieval to build a multidimensional representation of the field. This allows tracking of topic lifecycles, methodological transitions, dataset and model usage, and institutional directions. A sympathetic reader would value the move beyond metadata toward concrete semantic patterns in how AI research evolves.

Core claim

By compiling a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025 and constructing a multidimensional profiling pipeline that combines topic clustering, LLM-assisted parsing, and structured retrieval, the authors derive a comprehensive representation of research activity that supports the study of topic lifecycles, methodological transitions, dataset and model usage patterns, and institutional research directions, with analysis showing growth in safety, multimodal reasoning, and agent-oriented studies alongside stabilization in areas such as neural machine translation and graph-based methods.

What carries the argument

The multidimensional profiling pipeline, which integrates topic clustering, LLM-assisted parsing of full paper texts, and hierarchical structured retrieval to organize semantic content from the large literature corpus.

Load-bearing premise

LLM-assisted parsing accurately extracts semantic content such as methods, datasets, and research directions from the full text of papers without systematic bias.

What would settle it

A manual annotation of topics, methods, datasets, and directions on a representative sample of papers that shows substantial mismatches with the LLM outputs would falsify the accuracy of the derived representation.
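The falsification test above amounts to scoring LLM extractions against human labels field by field. A minimal sketch follows, assuming each extracted field (for example, datasets) is compared as a set of case-folded strings and counts are micro-averaged over the annotated sample. The function names, the normalization rule, and the example data are illustrative assumptions, not the paper's protocol.

```python
def normalize(items):
    """Case-fold and trim so "ImageNet " and "imagenet" count as the same item."""
    return {s.strip().lower() for s in items if s.strip()}


def micro_prf(llm_outputs, gold_outputs):
    """Micro-averaged precision, recall and F1 over paired per-paper item lists."""
    if len(llm_outputs) != len(gold_outputs):
        raise ValueError("need one gold annotation per LLM output")
    tp = fp = fn = 0
    for predicted, gold in zip(llm_outputs, gold_outputs):
        p, g = normalize(predicted), normalize(gold)
        tp += len(p & g)  # items both the LLM and the annotator listed
        fp += len(p - g)  # LLM items the annotator did not list (e.g. hallucinations)
        fn += len(g - p)  # annotated items the LLM missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical "datasets" field for three sampled papers.
llm = [["ImageNet", "COCO"], ["SQuAD"], ["MMLU", "GSM8K"]]
gold = [["imagenet", "COCO"], ["SQuAD", "Natural Questions"], ["MMLU"]]
precision, recall, f1 = micro_prf(llm, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # → P=0.80 R=0.80 F1=0.80
```

Reporting false positives separately from false negatives matters here: a high false-positive rate would point to hallucinated datasets or methods, which is the systematic bias the load-bearing premise rules out.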

Figures

Figures reproduced from arXiv: 2601.15170 by Haoyang He, Jiangning Zhang, Jinzhuo Liu, Juntao Jiang, Shuicheng Yan, Teng Hu, Xiaobin Hu, Yong Liu, Zhucun Xue.

Figure 1. Number of papers published and topic statistics across 22 …
Figure 2. LLM-Driven Multi-Dimensional Knowledge Profiling Framework with Large-Scale Literature Database and Hierarchical Retrieval.
Figure 3. Four-quadrant topic lifecycle evolution (quadrants I-IV: Emerging, Booming, Mature, Declining). The inset (bottom left) shows …
Figure 4. Compute Resource Analysis. Left: Temporal evolution of compute demand; Right: Boxplot of the top 10 topics with highest …
Figure 5. Dataset and Benchmark Trends (2020-2025). The bar …
Figure 7. Trends in Representative Dataset Usage (2020-2025).
Figure 9. (a) Heatmap of topic-institution for top pairs; (b) Heatmap …
Figure 10. Radar chart of ChatGPT-5 performance vs. ChatGPT-5 …
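Figure 3 sorts topics into four lifecycle stages, but the surviving caption does not state the axes. The sketch below is one hypothetical way to produce such a split from yearly paper counts. It assumes "volume" is a topic's share of all papers in the latest year and "momentum" is the year-over-year change in that share. The 2% threshold, the function names, and the placeholder topics are assumptions, and the stage names are deliberately not mapped to the paper's quadrant numbers I-IV.

```python
def lifecycle_stage(share_prev, share_last, volume_threshold):
    """Map (volume, momentum) to one of the four stage names used in Figure 3."""
    momentum = share_last - share_prev  # zero change counts as "not growing"
    high_volume = share_last >= volume_threshold
    if momentum > 0:
        return "Booming" if high_volume else "Emerging"
    return "Mature" if high_volume else "Declining"


def classify_topics(counts, totals, volume_threshold=0.02):
    """counts: {topic: {year: n_papers}}; totals: {year: n_papers_in_corpus}."""
    prev_year, last_year = sorted(totals)[-2:]  # needs at least two years
    return {
        topic: lifecycle_stage(
            by_year.get(prev_year, 0) / totals[prev_year],
            by_year.get(last_year, 0) / totals[last_year],
            volume_threshold,
        )
        for topic, by_year in counts.items()
    }


# Hypothetical counts; topic names are placeholders, not results from the paper.
totals = {2024: 10000, 2025: 12000}
counts = {
    "topic_a": {2024: 150, 2025: 600},
    "topic_b": {2024: 50, 2025: 180},
    "topic_c": {2024: 300, 2025: 330},
    "topic_d": {2024: 150, 2025: 120},
}
print(classify_topics(counts, totals))
```

Using shares rather than raw counts keeps corpus growth from making every topic look like it is booming; topic_c, for instance, gains papers in absolute terms but loses share, so it lands in "Mature".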
Original abstract

The rapid expansion of research across machine learning, vision, and language has produced a volume of publications that is increasingly difficult to synthesize. Traditional bibliometric tools rely mainly on metadata and offer limited visibility into the semantic content of papers, making it hard to track how research themes evolve over time or how different areas influence one another. To obtain a clearer picture of recent developments, we compile a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025 and construct a multidimensional profiling pipeline to organize and analyze their textual content. By combining topic clustering, LLM-assisted parsing, and structured retrieval, we derive a comprehensive representation of research activity that supports the study of topic lifecycles, methodological transitions, dataset and model usage patterns, and institutional research directions. Our analysis highlights several notable shifts, including the growth of safety, multimodal reasoning, and agent-oriented studies, as well as the gradual stabilization of areas such as neural machine translation and graph-based methods. These findings provide an evidence-based view of how AI research is evolving and offer a resource for understanding broader trends and identifying emerging directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper compiles a corpus of over 100,000 papers from 22 major AI conferences (2020–2025) and introduces a multidimensional profiling pipeline that combines topic clustering, LLM-assisted parsing of full-text content, and hierarchical structured retrieval. It claims this enables analysis of topic lifecycles, methodological transitions, dataset and model usage patterns, and institutional directions, with reported findings including growth in safety, multimodal reasoning, and agent-oriented work alongside stabilization in neural machine translation and graph-based methods.

Significance. If the LLM parsing step proves accurate and unbiased, the work would provide a large-scale, semantically grounded resource for tracking AI research evolution that goes beyond metadata-based bibliometrics. The scale of the corpus and the integration of clustering with retrieval are notable strengths that could support reproducible trend studies, but the absence of any validation of extraction accuracy directly limits the reliability of the claimed shifts and patterns.

major comments (2)
  1. [Abstract / pipeline description] Abstract and pipeline description: the central claims on methodological transitions, dataset usage patterns, and topic lifecycles rest entirely on LLM-assisted parsing of full-text papers, yet no validation protocol, human-annotation benchmark, inter-annotator agreement, error-rate analysis, or prompting details are provided to establish accuracy or rule out systematic bias (e.g., recency or hallucination effects).
  2. [Analysis section] Analysis section: no error bars, sensitivity analysis, or quantification of how post-processing choices in the retrieval and clustering steps affect the reported trend estimates (e.g., safety growth or stabilization of NMT), leaving the magnitude and robustness of the highlighted shifts unassessed.
minor comments (2)
  1. [Abstract] The abstract refers to 'structured retrieval' without specifying the retrieval mechanism, indexing structure, or how it interacts with the topic clustering output.
  2. [Methods] Notation for the multidimensional profile (e.g., how fields such as methods, datasets, and directions are represented) is introduced without an explicit schema or example output format.
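Major comment 2 asks for error bars on the trend estimates. One common option is a percentile bootstrap over papers. The sketch below assumes each paper in a given year carries exactly one topic label and resamples papers with replacement to bound that topic's share of the year. The function names, resample count, seed, and example labels are illustrative assumptions.

```python
import random


def topic_share(labels, topic):
    """Fraction of papers whose assigned topic equals `topic`."""
    return sum(1 for label in labels if label == topic) / len(labels)


def bootstrap_share_ci(labels, topic, n_resamples=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap interval for a topic's share."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    stats = sorted(
        topic_share(rng.choices(labels, k=len(labels)), topic)
        for _ in range(n_resamples)
    )
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx  # symmetric upper percentile
    return topic_share(labels, topic), stats[lo_idx], stats[hi_idx]


# Hypothetical year: 30 of 200 papers labeled "topic_a".
labels = ["topic_a"] * 30 + ["other"] * 170
share, lo, hi = bootstrap_share_ci(labels, "topic_a")
print(f"share={share:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

This only captures sampling noise in the corpus. The sensitivity to clustering and retrieval settings that the referee also asks about would require rerunning the labeling under different settings and comparing the resulting shares.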

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key areas where additional evidence would strengthen the reliability of our claims. We agree that validation of the LLM parsing and robustness checks on the analysis are necessary and will revise the manuscript to address both points directly.

Point-by-point responses
  1. Referee: [Abstract / pipeline description] Abstract and pipeline description: the central claims on methodological transitions, dataset usage patterns, and topic lifecycles rest entirely on LLM-assisted parsing of full-text papers, yet no validation protocol, human-annotation benchmark, inter-annotator agreement, error-rate analysis, or prompting details are provided to establish accuracy or rule out systematic bias (e.g., recency or hallucination effects).

    Authors: We acknowledge that the submitted manuscript does not include a validation protocol for the LLM-assisted parsing step. In the revision we will add a dedicated subsection reporting a human-annotation benchmark performed on a stratified random sample of 1,000 papers. The benchmark will include inter-annotator agreement (Fleiss’ kappa), precision and recall for extracted fields (topics, datasets, models, methods), and a targeted analysis of potential systematic biases such as recency effects and hallucination rates. We will also release the exact prompts and annotation guidelines used. revision: yes

  2. Referee: [Analysis section] Analysis section: no error bars, sensitivity analysis, or quantification of how post-processing choices in the retrieval and clustering steps affect the reported trend estimates (e.g., safety growth or stabilization of NMT), leaving the magnitude and robustness of the highlighted shifts unassessed.

    Authors: We agree that the current analysis lacks quantitative assessment of robustness. In the revised version we will add bootstrap-derived error bars to all trend figures and conduct sensitivity analyses that vary the number of clusters, retrieval similarity thresholds, and post-processing filters. We will report the resulting range of trend estimates for the highlighted patterns (safety growth, multimodal reasoning, agent studies, and NMT/graph stabilization) so that readers can evaluate the stability of the observed shifts. revision: yes
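The rebuttal proposes Fleiss' kappa for inter-annotator agreement on the benchmark sample. For reference, here is a self-contained sketch of the standard statistic. It assumes every sampled paper is labeled by the same number of annotators and each annotator picks one category per field. The category names in the example are borrowed from the "Novelty Type" prompt field further down this page and are purely illustrative.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of one category per rater.

    kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item
    pairwise agreement and P_e is the agreement expected from the pooled
    category proportions.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0]) if ratings else 0
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    pooled = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        pooled.update(counts)
        # Agreeing rater pairs for this item, divided by all rater pairs.
        p_bar += sum(c * (c - 1) for c in counts.values()) / (n_raters * (n_raters - 1))
    p_bar /= n_items
    total = n_items * n_raters
    p_e = sum((c / total) ** 2 for c in pooled.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


# Two hypothetical annotators labeling the novelty type of three papers.
example = [["Algorithm", "Algorithm"], ["Algorithm", "Benchmark"], ["Benchmark", "Benchmark"]]
print(round(fleiss_kappa(example), 4))  # → 0.3333
```

In the example, observed agreement is 2/3 and chance agreement is 1/2, so kappa is (2/3 - 1/2) / (1 - 1/2) = 1/3. This shows how modest agreement can look once chance is removed.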

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline on external corpus

full rationale

The paper describes an empirical analysis that compiles a corpus of >100k papers from public conferences (2020-2025) and applies topic clustering, LLM-assisted parsing, and structured retrieval. No equations, fitted parameters, or mathematical derivations appear in the provided text. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes that reduce the central claims back to the authors' prior definitions. The pipeline operates on external data with standard tools; reported trends (safety growth, multimodal shifts) are outputs of the applied methods rather than inputs renamed as predictions. This matches the default expectation of a self-contained descriptive study with no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The pipeline depends on the assumption that LLM parsing faithfully captures research content and that the chosen conferences represent the broader field; no explicit free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption LLM-assisted parsing accurately extracts semantic elements such as methods, datasets, and directions without introducing model-specific artifacts
    Invoked when the abstract states that LLM-assisted parsing is used to organize textual content.
  • domain assumption The selected 22 conferences form a representative sample of recent AI research
    Implicit in the construction of the unified corpus.

pith-pipeline@v0.9.0 · 5519 in / 1311 out tokens · 35365 ms · 2026-05-16T12:09:23.928126+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 11 internal anchors

  1. [1]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

  2. [2]

    Topic significance ranking of LDA generative models

    Loulwah AlSumait, Daniel Barbará, James Gentle, and Carlotta Domeniconi. Topic significance ranking of LDA generative models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 67–82. Springer, 2009.

  3. [3]

    bibliometrix: An R-tool for comprehensive science mapping analysis

    Massimo Aria and Corrado Cuccurullo. bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4):959–975, 2017.

  4. [4]

    Squai: Scientific question-answering with multi-agent retrieval-augmented generation

    Ines Besrour, Jingbo He, Tobias Schreieder, and Michael Färber. Squai: Scientific question-answering with multi-agent retrieval-augmented generation. arXiv preprint arXiv:2510.15682, 2025.

  5. [5]

    Probabilistic topic models

    David M Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.

  6. [6]

    Latent Dirichlet allocation

    David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.

  7. [7]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.

  8. [8]

    Topic modeling through rank-based aggregation and LLMs: An approach for AI and human-generated scientific texts

    Tuğba Çelikten and Aytuğ Onan. Topic modeling through rank-based aggregation and LLMs: An approach for AI and human-generated scientific texts. Knowledge-Based Systems, 314:113219, 2025.

  9. [9]

    Reading tea leaves: How humans interpret topic models

    Jonathan Chang, Sean Gerrish, Chong Wang, Jordan Boyd-Graber, and David Blei. Reading tea leaves: How humans interpret topic models. Advances in Neural Information Processing Systems, 22, 2009.

  10. [10]

    Chaomei Chen. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3):359–377, 2006.

  11. [11]

    A survey on knowledge-oriented retrieval-augmented generation

    Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, et al. A survey on knowledge-oriented retrieval-augmented generation. arXiv preprint arXiv:2503.10677, 2025.

  12. [12]

    Topic model diagnostics: Assessing domain relevance via topical alignment

    Jason Chuang, Sonal Gupta, Christopher Manning, and Jeffrey Heer. Topic model diagnostics: Assessing domain relevance via topical alignment. In International Conference on Machine Learning, pages 612–620. PMLR, 2013.

  13. [13]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.

  14. [14]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.

  15. [15]

    k-llmmeans: Summaries as centroids for interpretable and scalable LLM-based text clustering

    Jairo Diaz-Rodriguez. k-llmmeans: Summaries as centroids for interpretable and scalable LLM-based text clustering. arXiv e-prints, pages arXiv–2502, 2025.

  16. [16]

    How to conduct a bibliometric analysis: An overview and guidelines

    Naveen Donthu, Satish Kumar, Debmalya Mukherjee, Nitesh Pandey, and Weng Marc Lim. How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133:285–296, 2021.

  17. [17]

    Retrieval-Augmented Generation for Large Language Models: A Survey

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2(1), 2023.

  18. [18]

    BERTopic: Neural topic modeling with a class-based TF-IDF procedure

    Maarten Grootendorst. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794, 2022.

  19. [19]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.

  20. [20]

    LLLMs: A data-driven survey of evolving research on limitations of large language models

    Aida Kostikova, Zhipin Wang, Deidamea Bajri, Ole Pütz, Benjamin Paaßen, and Steffen Eger. LLLMs: A data-driven survey of evolving research on limitations of large language models. arXiv preprint arXiv:2505.19240, 2025.

  21. [21]

    PaperQA: Retrieval-augmented generative agent for scientific research

    Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G Rodriques, and Andrew D White. PaperQA: Retrieval-augmented generative agent for scientific research. arXiv preprint arXiv:2312.07559, 2023.

  22. [22]

    Concept induction: Analyzing unstructured text with high-level concepts using LLooM

    Michelle S Lam, Janice Teoh, James A Landay, Jeffrey Heer, and Michael S Bernstein. Concept induction: Analyzing unstructured text with high-level concepts using LLooM. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–28, 2024.

  23. [23]

    Towards agentic RAG with deep reasoning: A survey of RAG-reasoning systems in LLMs

    Yangning Li, Weizhi Zhang, Yuyao Yang, Wei-Chieh Huang, Yaozu Wu, Junyu Luo, Yuanchen Bei, Henry Peng Zou, Xiao Luo, Yusheng Zhao, et al. Towards agentic RAG with deep reasoning: A survey of RAG-reasoning systems in LLMs. arXiv preprint arXiv:2507.09477, 2025.

  24. [24]

    SurveyX: Academic survey automation via large language models

    Xun Liang, Jiawei Yang, Yezhaohui Wang, Chen Tang, Zifan Zheng, Shichao Song, Zehao Lin, Yebin Yang, Simin Niu, Hanyu Wang, Bo Tang, Feiyu Xiong, Keming Mao, and Zhiyu Li. SurveyX: Academic survey automation via large language models. arXiv preprint arXiv:2502.14776.

  25. [25]

    hdbscan: Hierarchical density based clustering

    Leland McInnes, John Healy, Steve Astels, et al. hdbscan: Hierarchical density based clustering. J. Open Source Softw., 2(11):205, 2017.

  26. [26]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.

  27. [27]

    SurveyG: A multi-agent LLM framework with hierarchical citation graph for automated survey generation

    Minh-Anh Nguye, Minh-Duc Nguyen, Nguyen Thi Ha Lan, Kieu Hai Dang, Nguyen Tien Dong, and Le Duy Dung. SurveyG: A multi-agent LLM framework with hierarchical citation graph for automated survey generation. arXiv preprint arXiv:2510.07733, 2025.

  28. [28]

    MA-RAG: Multi-agent retrieval-augmented generation via collaborative chain-of-thought reasoning

    Thang Nguyen, Peter Chin, and Yu-Wing Tai. MA-RAG: Multi-agent retrieval-augmented generation via collaborative chain-of-thought reasoning. arXiv preprint arXiv:2505.20096, 2025.

  29. [29]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.

  30. [30]

    Dmitry Scherbakov, Nina Hubig, Vinita Jansari, Alexander Bakumenko, and Leslie A Lenert. The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review. Journal of the American Medical Informatics Association, 32(6):1071–1086, 2025.

  31. [31]

    Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers

    Chenglei Si, Diyi Yang, and Tatsunori Hashimoto. Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers. arXiv preprint arXiv:2409.04109.

  32. [32]

    Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

    Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Agentic retrieval-augmented generation: A survey on agentic RAG. arXiv preprint arXiv:2501.09136, 2025.

  33. [33]

    Language agents achieve superhuman synthesis of scientific knowledge

    Michael D Skarlinski, Sam Cox, Jon M Laurent, James D Braza, Michaela Hinks, Michael J Hammerling, Manvitha Ponnapati, Samuel G Rodriques, and Andrew D White. Language agents achieve superhuman synthesis of scientific knowledge. arXiv preprint arXiv:2409.13740, 2024.

  34. [34]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.

  35. [35]

    Software survey: VOSviewer, a computer program for bibliometric mapping

    Nees Van Eck and Ludo Waltman. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2):523–538, 2010.

  36. [36]

    MinerU: An Open-Source Solution for Precise Document Content Extraction

    Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, Fukai Shang, et al. MinerU: An open-source solution for precise document content extraction. arXiv preprint arXiv:2409.18839, 2024.

  37. [37]

    AutoSurvey: Large language models can automatically write surveys

    Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Qingsong Wen, Wei Ye, et al. AutoSurvey: Large language models can automatically write surveys. Advances in Neural Information Processing Systems, 37:115119–115145, 2024.

  38. [38]

    Structure-R1: Dynamically leveraging structural knowledge in LLM reasoning through reinforcement learning

    Junlin Wu, Xianrui Zhong, Jiashuo Sun, Bolian Li, Bowen Jin, Jiawei Han, and Qingkai Zeng. Structure-R1: Dynamically leveraging structural knowledge in LLM reasoning through reinforcement learning. arXiv preprint arXiv:2510.15191.

  39. [39]

    LLM-oriented token-adaptive knowledge distillation

    Xurong Xie, Zhucun Xue, Jiafu Wu, Jian Li, Yabiao Wang, Xiaobin Hu, Yong Liu, and Jiangning Zhang. LLM-oriented token-adaptive knowledge distillation. In AAAI, 2026.

  40. [40]

    AdaVideoRAG: Omni-contextual adaptive retrieval-augmented efficient long video understanding

    Zhucun Xue, Jiangning Zhang, Xurong Xie, Yuxuan Cai, Yong Liu, Xiangtai Li, and Dacheng Tao. AdaVideoRAG: Omni-contextual adaptive retrieval-augmented efficient long video understanding. In NeurIPS, 2025.

  41. [41]

    SurveyForge: On the outline heuristics, memory-driven generation, and multi-dimensional evaluation for automated survey writing

    Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Lei Bai, and Bo Zhang. SurveyForge: On the outline heuristics, memory-driven generation, and multi-dimensional evaluation for automated survey writing. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12444–12465, 2025.

  42. [42]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

  43. [43]

    Bibliometric methods in management and organization

    Ivan Zupic and Tomaž Čater. Bibliometric methods in management and organization. Organizational Research Methods, 18(3):429–472, 2015.

    Appendix Overview: The appendix presents the following sections to strengthen the main manuscript. We provide more details in the Supplementary Materials for a better understanding of our work. Sec. A. ResearchD...

  44. [44]

    **Keywords**: Extract 3-10 keywords directly from the paper text. Generate a short explanation for each keyword using LLM summarization

  45. [45]

    **abstract_summary**: Provide a compressed version of the paper’s abstract. Summarize, do not copy verbatim

  46. [46]

    **Methods / Architecture / Loss / Training**: Summarize core method(s), model architecture, loss function, and training setup

  47. [47]

    **Datasets / Metrics / Mapping**: List datasets, corresponding metrics, and mapping between datasets and metrics. Use empty lists/dictionaries if missing

  48. [48]

    **Problem Statement**: Summarize the research problem in 1-2 sentences

  49. [49]

    **Contributions**: Summarize the core contributions

  50. [50]

    **Novelty Type**: Infer innovation type: Algorithm / Model, Theory / Analysis, Benchmark / Dataset, Application / System, or Methodological Improvement

  51. [51]

    **Experiments / Results Summary**: Summarize experiments and main results. Include ablation studies if present

  52. [52]

    **Limitations / Future Work**: Summarize limitations and future research directions

  53. [53]

    **Trend Insight**: Provide trend level insights based on the paper

  54. [54]

    **Field Positioning**: Infer the paper’s position in its research field: Foundational Work, Methodological Innovation, Benchmark / Dataset Contribution, Application Validation, or Trend Extension

  55. [55]

    **Institution**: Extract all author affiliations from the provided

  56. [56]

    **Gpu_info**: Obtain the gpu resources used in the article, <total_gpu> * <GPU_MODEL> * <training_time>.

    LLM Intent Recognition Prompts. The core of our retrieval system’s ability to effectively handle complex, multi-dimensional queries lies in an LLM-based intent recognition module. The goal of this module is to convert users’ natural language queries int...

  57. [57]

    **Conference**: Must the range of the following fixed options: [AAAI, ACL, COLM, COLT, CoRL, CVPR, ECCV, EMNLP, ICCV, ICLR, ICML, IJCAI, INTERSPEECH, IWSLT, MLSYS, NAACL, NDSS, NeurIPS, OSDI, UAI, USENIX-Fast, USENIX-Sec]. If the query does not mention a conference, leave as []

  58. [58]

    **Year**: Extract the range of year contained in the query

  59. [59]

    **Authors**: If the query explicitly mentions author names, extract them into a list. Otherwise, leave empty

  60. [60]

    **Paper Name **: Extract all explicitly mentioned paper titles. Output as a list. Use partial matching if necessary

  61. [61]

    **Keywords**: Extract only technical terms explicitly mentioned in query. Use ‘keywords_explanation‘ to briefly explain each

  62. [62]

    **Abstract Summary**: One concise sentence describing technical focus from query

  63. [63]

    **methods**: - Output the **complete method or approach description**, including its application domain if present. - Must describe both the **technique** and its **application scope**, not just a keyword. - If multiple distinct methods are present, include all as a list. - Do NOT shorten into keywords

  64. [64]

    **Architecture / Loss Function / Datasets / Metrics**: Extract explicitly if mentioned; do not hallucinate

  65. [65]

    How to achieve real-time, high-fidelity speech-to-gesture generation using decoupled diffusion models

    **Vector Search Plan**: - Decide which fields to include in the search (‘abstract_summary‘, ‘methods‘, ‘keywords‘, ‘datasets‘, ‘metrics‘, ‘architecture‘, ‘loss_function‘, etc.). - Only include fields that have non-empty content in the JSON output. - Assign a ‘weight‘ for each field between 0 and 1 based on **query emphasis**. - The weights **must be ...

  66. [66]

    Enhanced Academic Rigor and Traceability: ResearchDB successfully retrieved seminal papers, allowing the final answer to cite up to nine top-tier conference papers (e.g., [1], [3], [19]), accurately attributing key methods like Retrieval Augmentation

  67. [67]

    Clearer Theoretical Framing: The retrieved content enabled high-level theoretical decoupling of the problem (e.g., separating gesture drivers into semantic, rhythmic, and stylistic factors), resulting in a logically coherent survey argument. This qualitative evidence confirms that ResearchDB’s multidimensional matching enhances the output’s Academic Rob...

  68. [68]

    How to achieve real-time, high-fidelity speech-to-gesture generation using decoupled diffusion models

    Making it truly real-time. Compute budget example (per 160 ms stride):
    - Feature extraction: 3–5 ms CPU.
    - Encoders: 2–4 ms GPU.
    - Stage 1 diffusion (8 fps, 4–6 steps): 4–8 ms.
    - Stage 2 diffusion (30 fps, 4–6 steps): 7–12 ms.
    - IK/retarget/smoothing: 1–3 ms CPU or GPU.
    - Total: ~20–30 ms added on top of the lookahead, well under 100 ms on a gaming GPU. On mobile, red...
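Entries 57 to 65 above are fragments of the paper's intent-recognition prompt. That prompt turns a user query into per-field JSON plus a "Vector Search Plan" that assigns each non-empty field a weight between 0 and 1. The sketch below shows one plausible way such weights could combine per-field similarities into a single ranking score. Every piece of it is an assumption: bag-of-words cosine stands in for whatever embedding model the system uses, weights are renormalized over the fields actually used (the source's own weight constraint is truncated), and the paper records and function names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def plan_score(query_fields, weights, paper):
    """Weighted per-field similarity, using only fields the query filled in."""
    used = {f: w for f, w in weights.items() if query_fields.get(f)}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(
        (w / total) * cosine(bow(query_fields[f]), bow(paper.get(f, "")))
        for f, w in used.items()
    )


def search(query_fields, weights, papers, top_k=3):
    ranked = sorted(papers, key=lambda p: plan_score(query_fields, weights, p), reverse=True)
    return [p["title"] for p in ranked[:top_k]]


# Hypothetical parsed query and paper records.
query = {
    "methods": "decoupled diffusion model for speech to gesture generation",
    "keywords": "real-time gesture synthesis",
    "datasets": "",  # empty, so it is dropped from the plan
}
weights = {"methods": 0.7, "keywords": 0.3, "datasets": 0.5}
papers = [
    {"title": "Paper B", "methods": "graph neural network for molecule property prediction", "keywords": "graphs chemistry"},
    {"title": "Paper A", "methods": "a diffusion model that generates gesture motion from speech", "keywords": "gesture synthesis real-time"},
]
print(search(query, weights, papers))  # → ['Paper A', 'Paper B']
```

Dropping empty fields before renormalizing mirrors the prompt's "only include fields that have non-empty content" rule. Without it, the unused "datasets" weight would silently pull every score toward zero.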