Typify: A Lightweight Usage-driven Static Analyzer for Precise Python Type Inference
Pith reviewed 2026-05-10 19:22 UTC · model grok-4.3
The pith
Typify infers precise Python types from usage patterns with static analysis alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Typify integrates symbolic execution with iterative fixpoint analysis and a context-matching retrieval system to propagate and predict type information across entire projects. By constructing and traversing dependency graphs in an execution-aware manner, Typify accurately connects function calls to their definitions and infers usage-based type semantics, even in complex, interdependent modules.
What carries the argument
Execution-aware dependency graphs combined with context-matching retrieval, which link call sites to definitions and propagate usage-derived type information through symbolic execution and fixpoint analysis.
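To make the fixpoint idea concrete, here is a minimal sketch of type propagation over a dependency graph. It is not Typify's implementation: the node names, the `EDGES` and `SEEDS` data, and the `propagate` helper are all hypothetical, and the "join" is a plain set union. The toy program it models is `x = 1; y = pick(x, 2.5)`, where `pick(value, fallback)` returns one of its two arguments.

```python
from collections import defaultdict

# Hypothetical "flows-into" edges: types known at the source node
# also become possible at the target node.
EDGES = [
    ("lit:int", "main.x"),            # x = 1
    ("main.x", "pick.value"),         # pick(x, ...)
    ("lit:float", "pick.fallback"),   # pick(..., 2.5)
    ("pick.value", "pick.ret"),       # return value ...
    ("pick.fallback", "pick.ret"),    # ... or fallback
    ("pick.ret", "main.y"),           # y = pick(x, 2.5)
]
SEEDS = {"lit:int": {"int"}, "lit:float": {"float"}}


def propagate(edges, seeds):
    """Iterate until no edge adds a new type to its target (a fixpoint)."""
    types = defaultdict(set)
    for node, seed_types in seeds.items():
        types[node] |= seed_types
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            new = types[src] - types[dst]
            if new:
                types[dst] |= new
                changed = True
    return dict(types)


result = propagate(EDGES, SEEDS)
print(sorted(result["main.y"]))  # ['float', 'int']
```

The loop terminates because type sets only grow and the universe of types is finite. A real analyzer would need a richer lattice and call-site sensitivity than this union-only sketch.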
If this is right
- Typify produces type predictions for variables, arguments, and return values without requiring annotations or training data.
- The tool matches or exceeds accuracy of deep-learning systems on standard benchmarks such as ManyTypes4Py and Typilus.
- It remains computationally light and interpretable, making it suitable for large and continuously changing codebases.
- Usage-driven retrieval serves as a practical substitute for statistical learning in type-inference tasks.
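One way to picture usage-driven retrieval is as a lookup from "attributes a variable is used with" to "types that offer all of those attributes". The sketch below is an assumption about the general idea, not the paper's retrieval system: `CANDIDATES` and `candidates_for` are hypothetical names, and the candidate pool is limited to a few built-in types.

```python
# Hypothetical candidate pool; a real tool would also index project classes.
CANDIDATES = (str, bytes, list, dict, set, tuple)


def candidates_for(used_attrs, candidates=CANDIDATES):
    """Return names of candidate types that define every observed attribute."""
    used = set(used_attrs)
    return [t.__name__ for t in candidates if used <= set(dir(t))]


# A variable used as v.append(...) and v.sort() matches only list.
print(candidates_for({"append", "sort"}))  # ['list']
# A variable used as v.upper() and v.split() stays ambiguous.
print(candidates_for({"upper", "split"}))  # ['str', 'bytes']
```

The second call shows why retrieval alone can leave ambiguity: further usage evidence (for example a call to `encode`, which only `str` defines here) would be needed to narrow the result.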
Where Pith is reading between the lines
- The same graph-construction and retrieval pattern could be tested on other dynamic languages that lack static types.
- Hybrid systems might combine Typify-style usage graphs with lightweight statistical signals to handle cases where static traces are incomplete.
- Embedding the analyzer in editors could supply immediate type suggestions during development without requiring users to run separate training steps.
Load-bearing premise
That execution-aware dependency graphs and context-matching retrieval will correctly identify call targets and carry usage-based type information through complex, interdependent modules.
What would settle it
A large Python project containing many ambiguous cross-module calls where Typify assigns wrong types to a substantial fraction of variables, arguments, or returns.
Original abstract
Python's dynamic type system, while offering significant flexibility and expressiveness, poses substantial challenges for static analysis and automated tooling, particularly in unannotated or partially annotated codebases. Existing type inference approaches often depend on existing type annotations or on deep learning models that require extensive training corpora and considerable computational resources, resulting in limited scalability and reduced interpretability. We introduce Typify, a lightweight, usage-driven static analysis engine designed to infer precise and contextually relevant type information without relying on statistical learning or large datasets. Typify integrates symbolic execution with iterative fixpoint analysis and a context-matching retrieval system to propagate and predict type information across entire projects. By constructing and traversing dependency graphs in an execution-aware manner, Typify accurately connects function calls to their definitions and infers usage-based type semantics, even in complex, interdependent modules. We evaluate Typify on a diverse corpus of real-world Python repositories, including the ManyTypes4Py and Typilus datasets, benchmarking its effectiveness in predicting types of variables, arguments, and return statements. Results from the evaluation show that Typify consistently matches or surpasses state-of-the-art deep learning-based systems such as Type4Py and HiTyper, as well as industry-standard static type inference tools like Pyre. Our findings demonstrate that usage-driven, retrieval-based inference can match or exceed the accuracy of data-driven methods, offering a practical, interpretable, and computationally efficient alternative for large and evolving Python codebases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Typify, a lightweight usage-driven static analyzer for Python type inference. It combines symbolic execution, iterative fixpoint analysis, and context-matching retrieval over execution-aware dependency graphs to propagate usage-based type information for variables, arguments, and return statements without annotations or machine learning. The central claim is that Typify matches or surpasses deep-learning tools (Type4Py, HiTyper) and the static tool Pyre on the ManyTypes4Py and Typilus datasets.
Significance. If the dependency-graph and retrieval mechanisms are shown to be robust, Typify would provide a scalable, interpretable, and training-free alternative to data-driven type inference, which is valuable for practical tooling in large, evolving Python codebases.
major comments (2)
- [Methodology / Dependency Graph Construction] The performance claim (matching or exceeding Type4Py, HiTyper, and Pyre) rests on the assumption that execution-aware dependency graphs plus context-matching retrieval correctly resolve call sites and propagate types across interdependent modules. The description of how the graph construction and traversal handle Python dynamic features (duck typing, decorators, conditional imports, first-class functions) is insufficient to verify completeness or ambiguity resolution; this is load-bearing for the evaluation results.
- [Evaluation] The abstract and reported results supply no methodological details on error bars, exclusion criteria, exact metrics (top-1 accuracy, etc.), baseline re-implementations, or handling of partially annotated code. This prevents verification that the superiority claim is statistically supported or reproducible.
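The four dynamic features named in the first major comment can be shown in a few lines of ordinary Python. The snippet below only illustrates why each one makes call-target resolution hard for a static tool; it says nothing about how Typify handles them. All names (`FakeFile`, `load`, `as_text`, `double`, `json_mod`, `HANDLERS`, `dispatch`) are hypothetical.

```python
import functools


# 1. Duck typing: load() accepts any object that has a read() method.
class FakeFile:
    def read(self):
        return "data"


def load(source):
    return source.read()


# 2. Decorators: the name `double` is bound to the wrapper, so its
#    return type is str even though the original body returns an int.
def as_text(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return str(func(*args, **kwargs))
    return wrapper


@as_text
def double(n):
    return n * 2


# 3. Conditional imports: which module `json_mod` names depends on
#    what is installed in the environment.
try:
    import simplejson as json_mod  # third-party, may be absent
except ImportError:
    import json as json_mod


# 4. First-class functions: the call target is selected by data at run time.
HANDLERS = {"len": len, "upper": str.upper}


def dispatch(name, value):
    return HANDLERS[name](value)
```

In each case the callee or the resulting type is fixed only by runtime values or by the environment, which is why the comment asks for an explicit account of how ambiguity is resolved.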
minor comments (2)
- Add a small illustrative example (with code snippet and resulting graph) showing how a call site is resolved and a type is propagated; this would clarify the core mechanism without lengthening the paper.
- [Abstract] The abstract could explicitly state the precise evaluation metrics and dataset splits used, rather than only naming the corpora.
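As a rough idea of the kind of illustrative example the first minor comment asks for, the sketch below resolves direct call sites to their definitions with the standard `ast` module and records the types of literal arguments against the matching parameters. It is a simplified stand-in under stated assumptions, not the paper's algorithm: it handles only top-level functions, positional arguments, and constant literals, and `SOURCE` and `infer_param_types` are hypothetical names.

```python
import ast

SOURCE = '''
def area(width, height):
    return width * height

def label(name):
    return "room: " + name

a = area(3, 4.5)
b = label("kitchen")
'''


def infer_param_types(source):
    tree = ast.parse(source)
    # Map each top-level function name to its definition node.
    defs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    inferred = {}
    for node in ast.walk(tree):
        # Resolve only direct calls by name to a known definition.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            target = defs.get(node.func.id)
            if target is None:
                continue
            params = [p.arg for p in target.args.args]
            # Pair positional arguments with parameters; keep literal types.
            for param, arg in zip(params, node.args):
                if isinstance(arg, ast.Constant):
                    key = f"{target.name}.{param}"
                    inferred.setdefault(key, set()).add(type(arg.value).__name__)
    return inferred


print(infer_param_types(SOURCE))
```

For the toy source this yields `int` for `area.width`, `float` for `area.height`, and `str` for `label.name`. Each recorded pair corresponds to one edge from a call site to a parameter in a dependency graph.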
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment point by point below, indicating where we will revise the manuscript to strengthen clarity and reproducibility while preserving the core contributions.
- Referee: [Methodology / Dependency Graph Construction] The performance claim (matching or exceeding Type4Py, HiTyper, and Pyre) rests on the assumption that execution-aware dependency graphs plus context-matching retrieval correctly resolve call sites and propagate types across interdependent modules. The description of how the graph construction and traversal handle Python dynamic features (duck typing, decorators, conditional imports, first-class functions) is insufficient to verify completeness or ambiguity resolution; this is load-bearing for the evaluation results.
  Authors: We agree that the current description of dependency graph construction and traversal provides insufficient detail on Python dynamic features, which limits independent verification. In the revised manuscript we will add an expanded subsection (likely in Section 3) that explicitly describes: (1) approximation of duck typing via usage-pattern matching in the context-retrieval step rather than nominal type checks, (2) decorator handling by symbolically executing wrapper functions and propagating the resulting type constraints, (3) conditional imports by exploring all feasible execution paths during fixpoint iteration, and (4) first-class functions by treating call sites as context-dependent edges in the execution-aware graph. We will also include small illustrative examples and pseudocode for ambiguity resolution. These additions will not change the reported results but will make the load-bearing mechanisms verifiable.
  Revision: yes
- Referee: [Evaluation] The abstract and reported results supply no methodological details on error bars, exclusion criteria, exact metrics (top-1 accuracy, etc.), baseline re-implementations, or handling of partially annotated code. This prevents verification that the superiority claim is statistically supported or reproducible.
  Authors: The referee correctly notes that the abstract and high-level result summaries omit several reproducibility details. The full evaluation section already defines top-1 accuracy as the primary metric and uses the ManyTypes4Py and Typilus datasets, but we acknowledge the need for greater transparency. In revision we will augment the evaluation section with: (1) error bars reported as standard deviation across the individual repositories, (2) explicit exclusion criteria (files with parse errors or containing no variables/arguments/returns eligible for inference), (3) a precise description of baseline usage (official Type4Py and HiTyper models with default hyperparameters; Pyre run in strict mode), and (4) clarification that partially annotated code is evaluated only on unannotated elements while any existing annotations serve solely for ground-truth validation. These changes will be presented in a new "Evaluation Methodology" paragraph and will not alter the numerical results.
  Revision: yes
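The metric and error bars described in the second response can be sketched as follows. This is an assumed reading, not the authors' evaluation script: a slot with no prediction counts as a miss, the standard deviation is the sample standard deviation from Python's `statistics.stdev`, and the repository data, slot names, and helper names are all hypothetical.

```python
from statistics import mean, stdev


def top1_accuracy(predictions, ground_truth):
    """Fraction of annotated slots whose top-ranked prediction is correct."""
    if not ground_truth:
        raise ValueError("repository has no eligible slots")
    hits = 0
    for slot, expected in ground_truth.items():
        ranked = predictions.get(slot) or [None]  # missing prediction = miss
        if ranked[0] == expected:
            hits += 1
    return hits / len(ground_truth)


def summarize(per_repo):
    """Mean and sample standard deviation of per-repository accuracy."""
    scores = [top1_accuracy(pred, truth) for pred, truth in per_repo.values()]
    return mean(scores), stdev(scores)  # stdev needs at least two repos


# Hypothetical data: slot -> ranked predictions, slot -> annotated type.
PER_REPO = {
    "repo_a": (
        {"f.x": ["int", "float"], "f.return": ["bytes", "str"]},
        {"f.x": "int", "f.return": "str"},
    ),
    "repo_b": (
        {"g.items": ["list"], "g.return": ["int"]},
        {"g.items": "list", "g.return": "int"},
    ),
}

avg, sd = summarize(PER_REPO)
print(avg)  # 0.75
```

Reporting the spread per repository rather than per slot keeps one very large project from dominating the error bar. Whether that matches the authors' intended aggregation is not stated in the response.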
Circularity Check
No circularity: claims rest on external evaluation
The paper presents Typify as a usage-driven static analyzer combining symbolic execution, fixpoint analysis, and context-matching retrieval on execution-aware dependency graphs. No equations, fitted parameters, self-definitional constructions, or load-bearing self-citations appear in the abstract or described approach. Performance claims (matching or surpassing Type4Py, HiTyper, Pyre) are grounded in evaluation on independent external datasets (ManyTypes4Py, Typilus) rather than any internal reduction of predictions to inputs by construction. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Usage patterns in Python code reliably reveal type information even without annotations.