Extraction and Search in Rocq: Theorems, Definitions and Their dependencies
Pith reviewed 2026-06-28 05:29 UTC · model grok-4.3
The pith
TheoremExtr extracts theorems, dependencies, and definitions from Rocq projects by analyzing both parsing and runtime phases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TheoremExtr analyzes theorem composition and extracts theorems, dependencies, and definitions from both the parsing phase and runtime in Rocq projects. When applied to 32 open-source projects from the Rocq community, it extracted 71,795 theorems and their dependencies. In addition, it extracted 27,481 definitions and their types among these projects. A website was developed that supports cross-project similarity search for theorems and definitions.
What carries the argument
TheoremExtr, a Rocq theorem extraction and analysis tool that processes both the parsing phase and runtime to identify theorems, dependencies, and definitions.
If this is right
- Users gain the ability to search for theorems outside the scope of imported modules.
- Tool developers and researchers can obtain detailed information on theorem names, statements, and dependencies.
- Large-scale extraction and analysis of theorems from multiple Rocq projects becomes practical.
- Cross-project similarity search for theorems and definitions is enabled through the developed website.
Where Pith is reading between the lines
- The collected dataset of theorems and dependencies could be used to train models that suggest relevant lemmas during proof development.
- Dependency graphs extracted at scale might reveal common patterns in how Rocq libraries are structured and reused.
- Runtime extraction capability could be adapted to other interactive theorem provers that lack similar analysis tools.
Load-bearing premise
The extraction process correctly identifies all relevant theorems, definitions, and dependencies without systematic omissions or misclassifications when applied to real Rocq projects of varying complexity.
What would settle it
A manual audit of one small Rocq project that produces a complete hand-verified list of theorems, definitions, and dependencies and checks whether the tool output matches it exactly.
Figures
read the original abstract
Rocq (Coq) are now widely used in various fields, including software verification and mathematical proofs. When proving a new theorem, users often need to search and apply proven theorems to assist the current proof process. However, the current search command is limited to the environment of imported modules and cannot search for theorems outside of this scope. Furthermore, tool developers and researchers may want to obtain detailed information about theorems, such as theorem's names, statements, and dependencies. But there are currently no user-friendly and efficient tools available for extracting comprehensive information from Rocq projects. We introduce a Rocq theorem extraction and analysis tool, TheoremExtr, which is capable of analyzing theorem composition and extracting theorems, dependencies, and definitions from both parsing phase and runtime. We extracted 71,795 theorems and their dependencies from 32 open-source projects from the Rocq community. In addition, we extracted 27,481 definitions and their types among these projects. We also developed a website that supports cross-project similarity search for theorems and definitions. The tool is available at https://github.com/Rw1nd/TheoremExtr, and the search website is available at https://lemmasearch.com.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TheoremExtr, a tool for extracting theorems, definitions, and dependencies from Rocq (Coq) projects at both parsing and runtime phases. It reports extracting 71,795 theorems and 27,481 definitions from 32 open-source projects and provides a website for cross-project similarity search of theorems and definitions.
Significance. If the extraction pipeline is shown to be accurate and complete, the tool and dataset would be significant for enabling broader theorem search beyond imported modules and for supporting research in theorem composition and dependency analysis within the Rocq community.
major comments (1)
- [Abstract] Abstract: The reported extraction counts (71,795 theorems, 27,481 definitions from 32 projects) are presented without any accuracy metrics, manual validation, comparison to Rocq's Print/Search commands, or discussion of edge cases such as opaque modules or universe polymorphism, leaving the central claim of comprehensive extraction unsupported.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing the need to substantiate the extraction claims. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The reported extraction counts (71,795 theorems, 27,481 definitions from 32 projects) are presented without any accuracy metrics, manual validation, comparison to Rocq's Print/Search commands, or discussion of edge cases such as opaque modules or universe polymorphism, leaving the central claim of comprehensive extraction unsupported.
Authors: We agree that the abstract does not reference accuracy metrics or edge-case handling, which weakens the presentation of the counts. The manuscript details the dual-phase extraction pipeline (parsing and runtime) in Sections 3 and 4 and reports the raw counts from 32 projects, but does not include systematic validation. In the revision we will add a new evaluation subsection that reports manual validation on a random sample of extractions, direct comparisons against Rocq's Print and Search commands on selected modules, and an explicit limitations paragraph covering opaque modules and universe polymorphism. These additions will be reflected in an updated abstract as well. revision: yes
Circularity Check
No circularity: tool description with direct extraction counts
full rationale
This is a software engineering paper describing the TheoremExtr tool and reporting extraction results (71,795 theorems, 27,481 definitions from 32 projects). It contains no derivations, equations, fitted parameters, predictions, or uniqueness theorems. All reported numbers are direct outputs of running the described extraction pipeline on the chosen projects; they are not obtained by fitting a model to a subset and then claiming a prediction, nor by self-citation chains. The extraction process is presented as an engineering artifact whose correctness is an unverified assumption (as noted by the skeptic), but that is a correctness issue, not a circularity reduction. No step matches any of the six enumerated patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Appel, A.W.: Verified Functional Algorithms, Software Foundations, vol. 3. Elec- tronic textbook (2025)
2025
-
[2]
Arias, E.J.G., Caglayan, A., Itzhaky, S., Ramachandra, R.: Language server proto- col native server for coq (2024),https://rocq-prover.org/doc/V8.20.0/refman/ index.html
2024
-
[3]
Carrott, P., Saavedra, N., Thompson, K., Lerner, S., Ferreira, J.F., First, E.: Coqpyt: Proof navigation in python in the era of llms. In: d’Amorim, M. (ed.) Companion Proceedings of the 32nd ACM International Conference on the Foun- dations of Software Engineering, FSE 2024, Porto de Galinhas, Brazil, July 15- 19,2024.pp.637–641.ACM(2024).https://doi.org...
-
[4]
Gallego Arias, E.J.: SerAPI: Machine-Friendly, Data-Centric Serialization for Coq. Tech. rep., MINES ParisTech (Oct 2016),https://hal-mines-paristech. archives-ouvertes.fr/hal-01384408
2016
-
[5]
O’Reilly Media, Inc
Grinberg, M.: Flask web development. " O’Reilly Media, Inc." (2018)
2018
-
[6]
In: Matthews, J.N., An- derson, T.E
Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D.A., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: formal verification of an os kernel. In: Matthews, J.N., An- derson, T.E. (eds.) Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, SOSP 2009, Big Sky, Mon...
2009
-
[7]
pp. 207–220. ACM (2009).https://doi.org/10.1145/1629575.1629596, https://doi.org/10.1145/1629575.1629596
-
[8]
Krebbers, R., Leroy, X., Wiedijk, F.: Formal C semantics: Compcert and the C standard. In: Klein, G., Gamboa, R. (eds.) Interactive Theorem Proving - 5th In- ternational Conference, ITP 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 14-17, 2014. Proceedings. Lecture Notes in Com- puter Science, vol. 8558, pp. 543–548. Sp...
-
[9]
Proof automation with large language models,
Lu, M., Delaware, B., Zhang, T.: Proof automation with large language models. In: Filkov, V., Ray, B., Zhou, M. (eds.) Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE 2024, Sacramento, CA, USA, October 27 - November 1, 2024. pp. 1509–1520. ACM (2024).https://doi.org/10.1145/3691620.3695521,https://doi.org/ ...
-
[10]
Lù, X.H.: Bm25s: Orders of magnitude faster lexical search via eager sparse scoring (2024),https://arxiv.org/abs/2407.03618
arXiv 2024
-
[11]
Online book (2021)
Mahboubi, A., Tassi, E.: Mathematical components. Online book (2021)
2021
-
[12]
Microsoft: Language server protocol,https://microsoft.github.io/ language-server-protocol/
-
[13]
The Coq Development Team: The Coq reference manual – release 8.20.0 (2024), https://rocq-prover.org/doc/V8.20.0/refman/index.html
2024
-
[14]
The Coq Development Team: Rocq ocaml api (2024),https://rocq-prover.org/ doc/V8.20.0/api/coq-core/Vernacexpr/index.html#type-synpure_vernac_ expr
2024
-
[15]
The mathlib4 Development Team: The math library of lean 4 (2025),https:// rocq-prover.org/doc/V8.20.0/refman/index.html
2025
-
[16]
Thompson, K., Saavedra, N., Carrott, P., Fisher, K., Sanchez-Stern, A., Brun, Y., Ferreira, J.F., Lerner, S., First, E.: Rango: Adaptive retrieval-augmented proving Title Suppressed Due to Excessive Length 9 for automated software verification. In: 47th IEEE/ACM International Conference on Software Engineering, ICSE 2025, Ottawa, ON, Canada, April 26 - Ma...
-
[17]
Yang,K.,Deng,J.:Learningtoprovetheoremsviainteractingwithproofassistants (2019),https://arxiv.org/abs/1905.09381
Pith/arXiv arXiv 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.