
arxiv: 1906.08351 · v1 · pith:PCQ7TDC5new · submitted 2019-06-19 · 💻 cs.SE

Towards Lakosian Multilingual Software Design Principles

Pith reviewed 2026-05-25 19:54 UTC · model grok-4.3

classification 💻 cs.SE
keywords multilingual software · pybind11 · Lakosian design principles · foreign function interface · physical design · software design rules

The pith

Lakosian physical design rules are extended to multilingual software using pybind11.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to create rigorous design advice for software that combines multiple programming languages through foreign function interfaces such as pybind11. It adapts the Lakosian method, which focuses on physical design in large C++ systems, to handle the opacity introduced by these interfaces. The authors propose specific rule extensions for pybind11 and test them against 50 public GitHub repositories to see current compliance levels. If successful, this would allow standard engineering tools like call-graph analysis to work across language boundaries, reducing debugging difficulties, inefficiencies, and security vulnerabilities.

Core claim

The paper proposes an extension of the Lakosian C++ design rules for multilingual software that uses pybind11, measures compliance with those rules on 50 public GitHub repositories using the MLSA toolkit, and closes with a proposed generalization to any FFI.

What carries the argument

The proposed extensions to Lakosian physical design rules for the pybind11 foreign function interface, which address the opacity that blocks common analysis tools.
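The opacity in question can be illustrated with a minimal sketch. It assumes the standard pybind11 binding idiom (`PYBIND11_MODULE` and `m.def`) and recovers the cross-language call edges that a single-language call-graph tool would miss. The file contents, the function names, and the regex-based scanning are illustrative assumptions, not the MLSA implementation.

```python
import ast
import re

# Hypothetical C++ binding file. PYBIND11_MODULE and m.def are real pybind11
# constructs; the function and module names are invented for illustration.
CPP_SOURCE = '''
#include <pybind11/pybind11.h>
int add(int i, int j) { return i + j; }
PYBIND11_MODULE(example, m) {
    m.def("add", &add, "Add two integers");
}
'''

# Hypothetical Python caller.
PY_SOURCE = '''
import example
print(example.add(1, 2))
'''

MODULE_RE = re.compile(r'PYBIND11_MODULE\(\s*(\w+)\s*,\s*(\w+)\s*\)')
DEF_RE = re.compile(r'(\w+)\.def\(\s*"(\w+)"\s*,\s*&?\s*([\w:]+)')


def extract_bindings(cpp_source):
    """Map (python_module, python_name) -> C++ function name."""
    modules = list(MODULE_RE.finditer(cpp_source))
    bindings = {}
    for i, mod in enumerate(modules):
        module_name, handle = mod.group(1), mod.group(2)
        # Only look at text between this module macro and the next one.
        end = modules[i + 1].start() if i + 1 < len(modules) else len(cpp_source)
        for d in DEF_RE.finditer(cpp_source, mod.end(), end):
            if d.group(1) == handle:
                bindings[(module_name, d.group(2))] = d.group(3)
    return bindings


def extract_calls(py_source):
    """Return (module, attribute) pairs for calls written as module.func(...)."""
    calls = []
    for node in ast.walk(ast.parse(py_source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            calls.append((node.func.value.id, node.func.attr))
    return calls


def cross_language_edges(py_source, cpp_source):
    """Pair each Python call site with the C++ function it is bound to."""
    bindings = extract_bindings(cpp_source)
    return [(f"{m}.{f}", bindings[(m, f)])
            for m, f in extract_calls(py_source) if (m, f) in bindings]


if __name__ == "__main__":
    print(cross_language_edges(PY_SOURCE, CPP_SOURCE))
```

The sketch resolves an edge only when the module name and the bound function appear as literals in the source, which is the kind of visibility a physical design rule for the FFI boundary would need to guarantee.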

If this is right

  • Physical design principles can apply to cross-language boundaries in FFI-based systems.
  • Automated measurement of compliance becomes possible for multilingual code.
  • General rules can be developed that apply beyond pybind11 to other FFIs.
  • Debugging and security analysis tools can operate more effectively on multilingual software.
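To make the first point concrete: Lakos's physical design rests on acyclic dependencies between components, which allows each component to be assigned a level. The following is a minimal sketch of that levelization applied to a dependency graph that crosses a language boundary. The component names and the graph are hypothetical, and the paper's own rules may be formulated differently.

```python
from graphlib import CycleError, TopologicalSorter


def levelize(depends_on):
    """Assign each component a level: 1 + the highest level it depends on.

    `depends_on` maps a component to the components it depends on.
    Raises CycleError when the dependency graph is cyclic.
    """
    levels = {}
    # static_order() yields dependencies before the components that use them.
    for comp in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(comp, ())
        levels[comp] = 1 + max((levels[d] for d in deps), default=0)
    return levels


# Hypothetical multilingual system: a Python script uses a pybind11 binding
# file, which in turn wraps a C++ library.
ACYCLIC = {
    "app.py": ["bindings.cpp"],
    "bindings.cpp": ["mathlib.cpp"],
    "mathlib.cpp": [],
}

# Same system, but the C++ library now calls back into the Python script.
CYCLIC = {
    "app.py": ["bindings.cpp"],
    "bindings.cpp": ["mathlib.cpp"],
    "mathlib.cpp": ["app.py"],
}

if __name__ == "__main__":
    print(levelize(ACYCLIC))
    try:
        levelize(CYCLIC)
    except CycleError as err:
        print("cyclic dependency:", err.args[1])
```

The check is only possible once the FFI edges are visible in the graph; without them the two halves of the system would each look acyclic in isolation.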

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar extensions might improve design practices for other combinations of languages and interfaces.
  • Adopting these rules could influence how large systems are structured to minimize cross-language dependencies.
  • The measurement on GitHub repos provides a baseline for tracking improvements in multilingual design over time.

Load-bearing premise

That the Lakosian physical design methodology can be extended to pybind11 while preserving its benefits and that the sample of 50 repositories indicates typical practice.

What would settle it

Finding that the proposed rules do not reduce the opacity of pybind11 calls or that a different sample of repositories shows very different compliance rates.
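How different a second sample would need to be can be gauged with a confidence interval on a compliance proportion. The sketch below uses the Wilson score interval for a sample of 50. The compliant count of 20 is a placeholder chosen for illustration and is not a figure from the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    SAMPLE_SIZE = 50              # matches the paper's sample size
    HYPOTHETICAL_COMPLIANT = 20   # placeholder, NOT a result from the paper
    lo, hi = wilson_interval(HYPOTHETICAL_COMPLIANT, SAMPLE_SIZE)
    print(f"compliance rate between {lo:.3f} and {hi:.3f}")
```

With only 50 repositories the interval spans roughly a quarter of the unit range, so a second sample would have to differ substantially before it could be said to contradict the first.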

Figures

Figures reproduced from arXiv: 1906.08351 by Damian M. Lyons, Saba B. Zahra, Thomas M. Marshall.

Figure 1: (a) External linkage between C++ translation units; (b) Foreign-function interface between multilingual translation units.
    Adjacent text: 3.2 Standard pybind11 case. A straightforward example of the use of pybind11 to call a C++ function from a Python script A.py is shown in …
Figure 3: MLSA Software Architecture (a); example filter pipeline, from (Lyons, Bogar and Baird 2017) (b).
    Adjacent text: The modularity design principle means that extending MLSA to handle a new cross-language interface is as simple as extending the parts of the processing pipeline with scripts designed for the new interface. 4.1 Multilingual call-graph analysis. We have argued in prior work (Lyons, Bogar and Baird 2017) (Lyons, B…
Figure 4: Multilingual call-graph extracted by MLSA from the example in …
Figure 6: Misnamed module example (top two rows, explanatory example; bottom row, summarized example call-graph from repository collection).
    Adjacent text: Misnamed module. Included in this category is anything that puts the binding information in a place that is not clearly visible by inspection of the software or static analysis. The obvious example (shown in …
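The misnamed-module category can be illustrated with a small check. The sketch assumes one way the binding information can be hidden: the module name is passed to `PYBIND11_MODULE` through a preprocessor macro set by the build system, so no literal declaration matches the Python import. The file names, the macro, and the check itself are hypothetical and are not taken from the paper or from MLSA.

```python
import ast
import re

MODULE_DECL_RE = re.compile(r'PYBIND11_MODULE\(\s*(\w+)\s*,')


def declared_modules(cpp_sources):
    """Module names written literally in PYBIND11_MODULE across all files."""
    return {name
            for text in cpp_sources.values()
            for name in MODULE_DECL_RE.findall(text)}


def imported_modules(py_source):
    """Top-level module names imported by a Python source file."""
    names = set()
    for node in ast.walk(ast.parse(py_source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def unresolved_imports(py_source, cpp_sources, known_python_modules):
    """Imports that match neither a visible binding nor a known Python module."""
    return (imported_modules(py_source)
            - declared_modules(cpp_sources)
            - set(known_python_modules))


# Hypothetical repository: the module name is supplied by the build system
# (for example -DMODULE_NAME=fastmath), so it never appears in the source.
HIDDEN = {"bindings.cpp": 'PYBIND11_MODULE(MODULE_NAME, m) { m.def("dot", &dot); }'}
VISIBLE = {"bindings.cpp": 'PYBIND11_MODULE(fastmath, m) { m.def("dot", &dot); }'}
SCRIPT = "import os\nimport fastmath\nprint(fastmath.dot([1], [2]))\n"

if __name__ == "__main__":
    print(unresolved_imports(SCRIPT, HIDDEN, {"os"}))   # import cannot be traced
    print(unresolved_imports(SCRIPT, VISIBLE, {"os"}))  # import is traceable
```

An import that stays unresolved marks a point where the call-graph cannot be continued across the language boundary by inspection alone.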
Original abstract

Large software systems often comprise programs written in different programming languages. In the case when cross-language interoperability is accomplished with a Foreign Function Interface (FFI), for example pybind11, Boost.Python, Emscripten, PyV8, or JNI, among many others, common software engineering tools, such as call-graph analysis, are obstructed by the opacity of the FFI. This complicates debugging and fosters potential inefficiency and security problems. One contributing issue is that there is little rigorous software design advice for multilingual software. In this paper, we present our progress towards a more rigorous design approach to multilingual software. The approach is based on the existing approach to the design of large-scale C++ systems developed by Lakos. The Lakosian approach is one of the few design methodologies to address physical design rather than just logical design. Using the MLSA toolkit developed in prior work for analysis of multilingual software, we focus in on one FFI -- the pybind11 FFI. An extension to the Lakosian C++ design rules is proposed to address multilingual software that uses pybind11. Using a sample of 50 public GitHub repositories that use pybind11, we measure how many repositories would currently satisfy these rules. We conclude with a proposed generalization of the pybind11-based rules for any multilingual software using an FFI interface.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes an extension of Lakosian physical-design rules (component dependencies, acyclic dependencies) to multilingual C++/Python systems that use the pybind11 FFI, presents explicit new rules for the FFI boundary, and reports a compliance measurement performed with the MLSA toolkit on a sample of 50 public GitHub repositories that employ pybind11. It concludes by sketching a generalization to arbitrary FFIs.

Significance. If the proposed rules can be shown to retain the claimed Lakosian benefits (reduced cross-language opacity, lower debugging and security costs) while remaining practical, the work would supply one of the first rigorous physical-design guidelines for multilingual code. The use of an existing analysis toolkit and a concrete measurement step on real repositories are positive steps toward falsifiable claims.

major comments (3)
  1. [abstract / evaluation section] The evaluation only counts how many of the 50 repositories satisfy the new pybind11 rules; it supplies no measurement (e.g., call-graph opacity, cross-language defect density, or maintainability proxies) comparing compliant versus non-compliant repositories. This leaves the central claim—that the extension preserves Lakosian benefits—untested (see abstract and the measurement paragraph).
  2. [evaluation section] The sample of 50 repositories is described only as “public GitHub repositories that use pybind11”; no selection criteria, stratification by project size or domain, or justification of representativeness is given, so the compliance percentages cannot be interpreted as an indication of current practice.
  3. [rule-proposal and measurement paragraphs] The manuscript states that explicit rules are proposed and that MLSA was used to check them, yet neither the rule statements themselves nor any implementation or error-analysis details appear in the provided text; without these the measurement step cannot be reproduced or assessed for soundness.
minor comments (1)
  1. [conclusion] The abstract claims a “proposed generalization” but the text does not indicate whether this generalization is stated formally or remains informal.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below, clarifying the paper's scope as an initial step toward Lakosian multilingual design and indicating revisions to improve clarity, reproducibility, and acknowledgment of limitations.

  1. Referee: [abstract / evaluation section] The evaluation only counts how many of the 50 repositories satisfy the new pybind11 rules; it supplies no measurement (e.g., call-graph opacity, cross-language defect density, or maintainability proxies) comparing compliant versus non-compliant repositories. This leaves the central claim—that the extension preserves Lakosian benefits—untested (see abstract and the measurement paragraph).

    Authors: The paper is framed as exploratory ('Towards...') and measures baseline compliance to establish that the proposed rules are applicable to real code; it does not claim or attempt to empirically demonstrate preservation of benefits such as reduced opacity or lower defect rates. Those benefits are hypothesized from the original Lakosian C++ results. We agree the evaluation does not test the benefits and will revise the abstract, introduction, and conclusion to explicitly delimit the scope and flag benefit validation as future work.
    Revision: yes

  2. Referee: [evaluation section] The sample of 50 repositories is described only as “public GitHub repositories that use pybind11”; no selection criteria, stratification by project size or domain, or justification of representativeness is given, so the compliance percentages cannot be interpreted as an indication of current practice.

    Authors: Repositories were obtained via GitHub search for pybind11 usage followed by successful MLSA analysis; no stratification by size or domain was applied because the aim was an initial feasibility demonstration rather than a representative survey. We accept that this restricts interpretation and will expand the evaluation section with explicit selection criteria, summary statistics on repository sizes, and a limitations paragraph on generalizability.
    Revision: yes

  3. Referee: [rule-proposal and measurement paragraphs] The manuscript states that explicit rules are proposed and that MLSA was used to check them, yet neither the rule statements themselves nor any implementation or error-analysis details appear in the provided text; without these the measurement step cannot be reproduced or assessed for soundness.

    Authors: The rules appear in the 'Proposed pybind11 Rules' section and MLSA application is summarized in the evaluation, but we agree that fuller detail would aid reproducibility. We will move the complete rule statements into the main text, add a subsection describing the MLSA extensions and checks performed, and include a brief error-analysis or threats-to-validity discussion.
    Revision: yes

Unresolved simulated objections
  • A direct empirical comparison of Lakosian benefits (e.g., call-graph opacity or cross-language defect density) between compliant and non-compliant repositories remains outstanding, because no such metrics were collected in the study.

Circularity Check

0 steps flagged

Minor self-citation to analysis toolkit; proposed rules and compliance count remain independent of inputs


The paper proposes an extension to Lakosian physical-design rules for the pybind11 FFI and then applies the authors' prior MLSA toolkit to count compliance across an external sample of 50 GitHub repositories. No equations or fitted parameters are present; the rules are introduced as a new extension rather than derived from the sample, and the compliance count is a direct measurement rather than a prediction that reduces to the same data. The self-citation to MLSA is limited to the analysis tool and is not load-bearing for the central proposal or conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on the effectiveness of the Lakosian methodology for C++ and on the utility of the authors' prior MLSA toolkit; no free parameters, invented entities, or ad-hoc mathematical axioms are introduced.

axioms (2)
  • domain assumption Lakosian physical design rules remain beneficial when extended across language boundaries via an FFI.
    The paper treats this as given when proposing the extension.
  • domain assumption The MLSA toolkit can accurately detect pybind11 usage and call-graph structure.
    Invoked for the measurement step.


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. Abrahams, D., and R.W. Grosse-Kunstleve. 2003. Building Hybrid Systems with Boost.Python. 14 5. https://www.boost.org/doc/libs/1_69_0/libs/python/doc/html/article.html

  2. Ali, K., and Ondrej Lhotak. 2012. “Application-only Call Graph Construction.” ECOOP'12 Proceedings of the 26th European Conf. on Object-Oriented Prog. Beijing.

  3. Bacon, D., and P. Sweeney. 1996. “Fast static analysis of C++ virtual function calls.” 11th ACM SIGPLAN Conf. on OO Prog. Sys., Lang & App.

  4. Barrett, D., A. Kaplan, and J. Wileden. 1996. “Automated support for seamless interoperability in polylingual software systems.” 4th ACM SIGSOFT symposium on Foundations of software engineering. New York.

  5. Bogar, A.M., D. Lyons, and D. Baird. 2018. “Lightweight Call-Graph Construction for Multilingual Software Analysis.” 13th Int. Conf. Soft. Tech. Porto, Portugal.

  6. Bravenboer, M., E. Dolstra, and E. Visser. 2010. “Preventing injection attacks with syntax embeddings.” Sci. Comput. Program. 75 (7): 473-495.

  7. Bronevetsky, G. 2009. “Communication-Sensitive Static Dataflow for Parallel Message Passing Applications.” International Symposium on Code Generation and Optimization. Seattle WA.

  8. Grechanik, M., D. Batory, and D. Perry. 2004. “Design of large-scale polylingual systems.” 26th Int. Conf. on Software Systems. Edinburgh UK.

  9. Grimmer, M., R. Schatz, C. Seaton, T. Wurthinger, and M. Lujan. 2018. “Cross-Language Interoperability in a Multi-Language Runtime.” ACM Trans. on Prog. Languages and Systems (ACM) 40 (2): 8:1-8:43.

  10. Hong, S., et al. 2015. “Mutation-Based Fault Localization for Real-World Multilingual Programs.” 30th IEEE/ACM Int. Conf. on Automated Software Eng.

  11. Lakos, John. 1996. Large-Scale C++ Software Design. Addison-Wesley.

  12. Lee, S., J. Doby, and S. Ryu. 2016. “HybriDroid: static analysis framework for Android hybrid applications.” 31st IEEE/ACM International Conference on Automated Software Engineering. Singapore.

  13. Lyons, D., A. Bogar, and D. Baird. 2017. “Lightweight Multilingual Software Analysis.” 12th Int. Conf. on Software Technologies (ICSoft). Madrid, Spain.

  14. Lyons, D., A.M. Bogar, and D. Baird. 2018. “Lightweight Multilingual Software Analysis.” In Chall. & Opp. in ICT Research Projects, by J. Filipe. SCITEPRESS.

  15. Mayer, P., M. Kirsch, and M-A. Le. 2017. “On multi-language software development, cross-language links and accompanying tools: a survey of professional software developers.” Journal of Software Engineering Research and Development 5 (1).

  16. Mushtak, Z., and G. Rasool. 2015. “Multilingual source code analysis: State of the art and challenges.” Int. Conf. Open Source Sys. & Tech.

  17. Nielson, F., H.R. Nielson, and C. Hankin. 2005. Principles of Program Analysis. Springer. 2010. Python 2.7 doc. https://docs.python.org/2.7/

  18. Smirnoff, I. 2017. “Seamless operability between C++11 and Python.” EuroPython Conference. Rimini, Italy.

  19. Strien, D., H. Kratz, and W. Lowe. 2006. “Cross-Language Program Analysis and Refactoring.” 6th Int. Workshop on Source Code Analysis and Manipulation.

  20. Zaigham, M., G. Rasool, and B. Shehzad. 2017. “Multilingual Source Code Analysis: A Systematic Literature Review.” IEEE Access PP (99).