
arxiv: 2604.05067 · v1 · submitted 2026-04-06 · 💻 cs.SE

Typify: A Lightweight Usage-driven Static Analyzer for Precise Python Type Inference

Pith reviewed 2026-05-10 19:22 UTC · model grok-4.3

classification 💻 cs.SE
keywords python type inference · static analysis · usage-driven inference · dependency graphs · symbolic execution · fixpoint analysis · type prediction

The pith

Typify infers precise Python types from usage patterns with static analysis alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Typify as a static analysis tool that infers types in Python programs by examining how variables, arguments, and functions are actually used. It builds execution-aware dependency graphs to trace calls to their definitions and applies symbolic execution plus fixpoint iteration to spread type information project-wide. The system requires no pre-existing annotations and no large training sets. Evaluation on real repositories demonstrates that these steps produce type predictions that equal or beat both deep-learning tools and conventional static analyzers. If the approach holds, developers gain a lightweight, inspectable way to obtain type information in large, evolving codebases without the overhead of model training.

Core claim

Typify integrates symbolic execution with iterative fixpoint analysis and a context-matching retrieval system to propagate and predict type information across entire projects. By constructing and traversing dependency graphs in an execution-aware manner, Typify accurately connects function calls to their definitions and infers usage-based type semantics, even in complex, interdependent modules.
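To make the idea of connecting call sites to definitions concrete, here is a minimal sketch. It is not Typify's implementation: it assumes direct calls by name with literal arguments, uses only Python's standard `ast` module, and the names `infer_param_types`, `literal_type`, and the `scale` example are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical unannotated module used only for illustration.
SOURCE = '''
def scale(value, factor):
    return value * factor

scale(3, 2.5)
scale(10, 4)
'''

def literal_type(node):
    """Return the type name of a literal argument, or None if unknown."""
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    if isinstance(node, ast.List):
        return "list"
    if isinstance(node, ast.Dict):
        return "dict"
    return None

def infer_param_types(source):
    """Collect, per (function, parameter), the literal types seen at call sites."""
    tree = ast.parse(source)
    # Map each function name to its positional parameter names.
    params = {
        node.name: [a.arg for a in node.args.args]
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    observed = defaultdict(set)
    for node in ast.walk(tree):
        # Assumption: only direct calls such as `scale(...)` are resolved.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names = params.get(node.func.id)
            if names is None:
                continue
            for name, arg in zip(names, node.args):
                found = literal_type(arg)
                if found is not None:
                    observed[(node.func.id, name)].add(found)
    return {key: sorted(types) for key, types in observed.items()}

result = infer_param_types(SOURCE)
print(result)
```

In this toy case `factor` is observed as both `float` and `int`, which shows why a usage-driven analyzer needs a policy for merging or ranking competing observations. Method calls, aliases, and keyword arguments are deliberately left out.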

What carries the argument

Execution-aware dependency graphs combined with context-matching retrieval, which link call sites to definitions and propagate usage-derived type information through symbolic execution and fixpoint analysis.
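The fixpoint part of this machinery can be illustrated with a toy propagation loop. This is a sketch under stated assumptions rather than the paper's algorithm: the graph is a hand-written pair of dictionaries (`DIRECT` and `RETURNS_CALL_TO`, both hypothetical), types are plain strings, and the only rule is that a function returning another function's result inherits that function's return types.

```python
# Return types that are visible directly in each function body (hypothetical data).
DIRECT = {
    "parse": {"dict"},
    "load": set(),
    "load_or_default": {"None"},
    "main": set(),
    "ping": set(),
    "pong": {"int"},
}

# Edges: caller -> functions whose result the caller returns.
RETURNS_CALL_TO = {
    "parse": [],
    "load": ["parse"],
    "load_or_default": ["load"],
    "main": ["load_or_default"],
    "ping": ["pong"],  # mutual recursion: ping <-> pong
    "pong": ["ping"],
}

def propagate(direct, returns_call_to):
    """Iterate until no return-type set grows any further."""
    types = {name: set(found) for name, found in direct.items()}
    changed = True
    while changed:
        changed = False
        for caller, callees in returns_call_to.items():
            for callee in callees:
                new = types[callee] - types[caller]
                if new:
                    types[caller] |= new
                    changed = True
    return types

types = propagate(DIRECT, RETURNS_CALL_TO)
for name in sorted(types):
    print(name, sorted(types[name]))
```

The loop terminates even on the `ping`/`pong` cycle because the sets only grow and the universe of type names is finite, which is the usual argument for fixpoint analyses of this shape.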

If this is right

  • Typify produces type predictions for variables, arguments, and return values without requiring annotations or training data.
  • The tool matches or exceeds accuracy of deep-learning systems on standard benchmarks such as ManyTypes4Py and Typilus.
  • It remains computationally light and interpretable, making it suitable for large and continuously changing codebases.
  • Usage-driven retrieval serves as a practical substitute for statistical learning in type-inference tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-construction and retrieval pattern could be tested on other dynamic languages that lack static types.
  • Hybrid systems might combine Typify-style usage graphs with lightweight statistical signals to handle cases where static traces are incomplete.
  • Embedding the analyzer in editors could supply immediate type suggestions during development without requiring users to run separate training steps.

Load-bearing premise

That execution-aware dependency graphs and context-matching retrieval will correctly identify call targets and carry usage-based type information through complex, interdependent modules.

What would settle it

A large Python project containing many ambiguous cross-module calls where Typify assigns wrong types to a substantial fraction of variables, arguments, or returns.

Figures

Figures reproduced from arXiv: 2604.05067 by Ali Aman, Muhammad Asaduzzaman, Shaowei Wang.

Figure 1: Types in Python.
Accompanying excerpt: TypeExpr. The TypeExpr abstraction is Typify’s canonical internal representation of a type. It captures the final, fully resolved type of a type slot in a uniform structure that is consistent across all type categories. The categories of types handled by Typify are summarized in …

Figure 2: Typify’s inference pipeline.
Accompanying excerpt: 3.5 Usage-Driven Inference. Once modules are scheduled, Typify performs usage-driven inference within each module. This stage is the core of the engine: rather than relying on predefined annotations or training data, Typify infers types directly from how variables, functions, and data structures are used. The analysis proceeds statement by statement, updating the inferred types …

Figure 3: Overlap of data points correctly predicted by Typify
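The Figure 2 excerpt describes an analysis that walks a module statement by statement and updates inferred types as it goes. The sketch below shows one simple way such a walk could look. It is an illustration, not the paper's engine: it handles only assignments and `if` statements, and the choice to analyze both branches and take the union of their results is an assumption made here. The helper names `analyze`, `expr_type`, and `join` are hypothetical.

```python
import ast

# Hypothetical snippet: `fast` is never evaluated, only the structure matters.
SOURCE = '''
timeout = 30
if fast:
    timeout = 0.5
    mode = "quick"
else:
    mode = None
label = mode
'''

def expr_type(node, env):
    """Return the set of type names an expression may have."""
    if isinstance(node, ast.Constant):
        return {type(node.value).__name__}
    if isinstance(node, ast.Name):
        return set(env.get(node.id, {"unknown"}))
    if isinstance(node, ast.List):
        return {"list"}
    return {"unknown"}

def join(left, right):
    """Merge two environments by taking the union of types per variable."""
    return {
        name: left.get(name, set()) | right.get(name, set())
        for name in left.keys() | right.keys()
    }

def analyze(statements, env):
    """Walk statements in order, returning the updated type environment."""
    env = {name: set(found) for name, found in env.items()}
    for stmt in statements:
        if isinstance(stmt, ast.Assign):
            value_types = expr_type(stmt.value, env)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    env[target.id] = set(value_types)  # assignment overwrites
        elif isinstance(stmt, ast.If):
            # Both branches are treated as feasible and then merged.
            env = join(analyze(stmt.body, env), analyze(stmt.orelse, env))
    return env

env = analyze(ast.parse(SOURCE).body, {})
for name in sorted(env):
    print(name, sorted(env[name]))
```

After the walk, `timeout` carries both `int` and `float`, and `label` inherits `str` and `NoneType` from `mode`. A real analyzer would also need to track possibly-unbound names, loops, and calls, all of which this sketch ignores.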
Original abstract

Python's dynamic type system, while offering significant flexibility and expressiveness, poses substantial challenges for static analysis and automated tooling, particularly in unannotated or partially annotated codebases. Existing type inference approaches often depend on existing type annotations or on deep learning models that require extensive training corpora and considerable computational resources, resulting in limited scalability and reduced interpretability. We introduce Typify, a lightweight, usage-driven static analysis engine designed to infer precise and contextually relevant type information without relying on statistical learning or large datasets. Typify integrates symbolic execution with iterative fixpoint analysis and a context-matching retrieval system to propagate and predict type information across entire projects. By constructing and traversing dependency graphs in an execution-aware manner, Typify accurately connects function calls to their definitions and infers usage-based type semantics, even in complex, interdependent modules. We evaluate Typify on a diverse corpus of real-world Python repositories, including the ManyTypes4Py and Typilus datasets, benchmarking its effectiveness in predicting types of variables, arguments, and return statements. Results from the evaluation show that Typify consistently matches or surpasses state-of-the-art deep learning-based systems such as Type4Py and HiTyper, as well as industry-standard static type inference tools like Pyre. Our findings demonstrate that usage-driven, retrieval-based inference can match or exceed the accuracy of data-driven methods, offering a practical, interpretable, and computationally efficient alternative for large and evolving Python codebases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Typify, a lightweight usage-driven static analyzer for Python type inference. It combines symbolic execution, iterative fixpoint analysis, and context-matching retrieval over execution-aware dependency graphs to propagate usage-based type information for variables, arguments, and return statements without annotations or machine learning. The central claim is that Typify matches or surpasses deep-learning tools (Type4Py, HiTyper) and the static tool Pyre on the ManyTypes4Py and Typilus datasets.

Significance. If the dependency-graph and retrieval mechanisms are shown to be robust, Typify would provide a scalable, interpretable, and training-free alternative to data-driven type inference, which is valuable for practical tooling in large, evolving Python codebases.

major comments (2)
  1. [Methodology / Dependency Graph Construction] The performance claim (matching or exceeding Type4Py, HiTyper, and Pyre) rests on the assumption that execution-aware dependency graphs plus context-matching retrieval correctly resolve call sites and propagate types across interdependent modules. The description of how the graph construction and traversal handle Python dynamic features (duck typing, decorators, conditional imports, first-class functions) is insufficient to verify completeness or ambiguity resolution; this is load-bearing for the evaluation results.
  2. [Evaluation] The abstract and reported results supply no methodological details on error bars, exclusion criteria, exact metrics (e.g. top-1 accuracy), baseline re-implementations, or handling of partially annotated code, which prevents verifying that the superiority claim is statistically supported or reproducible.
minor comments (2)
  1. Add a small illustrative example (with code snippet and resulting graph) showing how a call site is resolved and a type is propagated; this would clarify the core mechanism without lengthening the paper.
  2. [Abstract] The abstract could explicitly state the precise evaluation metrics and dataset splits used, rather than only naming the corpora.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment point by point below, indicating where we will revise the manuscript to strengthen clarity and reproducibility while preserving the core contributions.

point-by-point responses
  1. Referee: [Methodology / Dependency Graph Construction] The performance claim (matching or exceeding Type4Py, HiTyper, and Pyre) rests on the assumption that execution-aware dependency graphs plus context-matching retrieval correctly resolve call sites and propagate types across interdependent modules. The description of how the graph construction and traversal handle Python dynamic features (duck typing, decorators, conditional imports, first-class functions) is insufficient to verify completeness or ambiguity resolution; this is load-bearing for the evaluation results.

    Authors: We agree that the current description of dependency graph construction and traversal provides insufficient detail on Python dynamic features, which limits independent verification. In the revised manuscript we will add an expanded subsection (likely in Section 3) that explicitly describes: (1) approximation of duck typing via usage-pattern matching in the context-retrieval step rather than nominal type checks, (2) decorator handling by symbolically executing wrapper functions and propagating the resulting type constraints, (3) conditional imports by exploring all feasible execution paths during fixpoint iteration, and (4) first-class functions by treating call sites as context-dependent edges in the execution-aware graph. We will also include small illustrative examples and pseudocode for ambiguity resolution. These additions will not change the reported results but will make the load-bearing mechanisms verifiable. revision: yes

  2. Referee: [Evaluation] The abstract and reported results supply no methodological details on error bars, exclusion criteria, exact metrics (e.g. top-1 accuracy), baseline re-implementations, or handling of partially annotated code, which prevents verifying that the superiority claim is statistically supported or reproducible.

    Authors: The referee correctly notes that the abstract and high-level result summaries omit several reproducibility details. The full evaluation section already defines top-1 accuracy as the primary metric and uses the ManyTypes4Py and Typilus datasets, but we acknowledge the need for greater transparency. In revision we will augment the evaluation section with: (1) error bars reported as standard deviation across the individual repositories, (2) explicit exclusion criteria (files with parse errors or containing no variables/arguments/returns eligible for inference), (3) precise description of baseline usage (official Type4Py and HiTyper models with default hyperparameters; Pyre run in strict mode), and (4) clarification that partially annotated code is evaluated only on unannotated elements while any existing annotations serve solely for ground-truth validation. These changes will be presented in a new “Evaluation Methodology” paragraph and will not alter the numerical results. revision: yes
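The reporting scheme promised in the second response (top-1 accuracy per repository, with spread reported as a standard deviation across repositories) can be sketched as follows. The repository names, slot names, and predictions are invented for illustration and are not results from the paper. Whether the authors intend the sample or the population standard deviation is not stated, so the use of `statistics.stdev` (sample) is an assumption, as is counting a slot with no prediction as a miss.

```python
from statistics import mean, stdev

def top1_accuracy(predictions, ground_truth):
    """Fraction of slots whose first-ranked prediction equals the annotation."""
    if not ground_truth:
        raise ValueError("no eligible slots")
    hits = 0
    for slot, expected in ground_truth.items():
        ranked = predictions.get(slot) or []
        # Assumption: a slot with no prediction counts as a miss.
        if ranked and ranked[0] == expected:
            hits += 1
    return hits / len(ground_truth)

# Hypothetical data: per repository, (ranked predictions, ground-truth annotations).
REPOS = {
    "repo_a": (
        {"f.x": ["int"], "f.y": ["str", "bytes"], "f.return": ["list"], "g.z": ["dict"]},
        {"f.x": "int", "f.y": "str", "f.return": "list", "g.z": "set"},
    ),
    "repo_b": (
        {"h.a": ["float"]},
        {"h.a": "float", "h.b": "bool"},
    ),
    "repo_c": (
        {"k.p": ["str"], "k.q": ["int"], "k.r": ["bool"], "k.return": ["None"]},
        {"k.p": "str", "k.q": "int", "k.r": "bool", "k.return": "None"},
    ),
}

per_repo = {name: top1_accuracy(pred, truth) for name, (pred, truth) in REPOS.items()}
mean_acc = mean(per_repo.values())
spread = stdev(per_repo.values())  # sample standard deviation across repositories
print(per_repo, mean_acc, spread)
```

Averaging per repository weights small and large projects equally. Pooling all slots before computing accuracy would weight by project size instead, so a revised evaluation section would need to say which of the two is reported.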

Circularity Check

0 steps flagged

No circularity: claims rest on external evaluation

full rationale

The paper presents Typify as a usage-driven static analyzer combining symbolic execution, fixpoint analysis, and context-matching retrieval on execution-aware dependency graphs. No equations, fitted parameters, self-definitional constructions, or load-bearing self-citations appear in the abstract or described approach. Performance claims (matching or surpassing Type4Py, HiTyper, Pyre) are grounded in evaluation on independent external datasets (ManyTypes4Py, Typilus) rather than any internal reduction of predictions to inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard static-analysis assumptions rather than new fitted parameters or invented entities; full paper would be needed to audit any implementation-specific choices.

axioms (1)
  • domain assumption: Usage patterns in Python code reliably reveal type information even without annotations.
    This premise underpins the entire usage-driven inference engine described in the abstract.
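The axiom can be probed with a small experiment: given the attributes a variable is used with, which built-in types could it be? The sketch below is a stand-in for the idea of matching usage against known types, not the paper's context-matching retrieval system. The candidate list and the function name `candidates_for` are hypothetical, and it relies only on the built-in `dir()`.

```python
# Hypothetical candidate pool; a real tool would also cover user-defined classes.
CANDIDATES = (str, bytes, list, tuple, dict, set, int, float)

def candidates_for(used_attributes, candidates=CANDIDATES):
    """Return names of candidate types that provide every attribute used."""
    used = set(used_attributes)
    return [t.__name__ for t in candidates if used <= set(dir(t))]

# `items.append(x); items.sort()` points at a single built-in type.
print(candidates_for({"append", "sort"}))

# `text.strip().split()` is ambiguous between two built-in types.
print(candidates_for({"strip", "split"}))
```

The second query shows where the premise is weakest: usage narrows the candidates but does not always single one out, so some ranking or additional context is needed to break ties.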

pith-pipeline@v0.9.0 · 5560 in / 1236 out tokens · 43357 ms · 2026-05-10T19:22:44.748428+00:00 · methodology

