The Linux IOCTL Census: A Source-Derived Database of the Linux Kernel Control-Code Surface
Pith reviewed 2026-06-27 13:06 UTC · model grok-4.3
The pith
A deterministic libclang pass over the Linux kernel source builds a queryable census of 586 ioctl dispatch points, 1,289 command codes, 3,583 input sinks, and 1,298 permission gates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An allmodconfig build compiles 878 modules across 169 subtrees. A single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. The census encodes the kernel's documented threat model as a queryable column that separates the capability-ungated surface, an upper bound on unprivileged reach, from the portion placed out of scope by a hard capability gate. The structural tier is released as open data on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems, and the extraction is backtested against 22 recent in-tree ioctl CVEs.
What carries the argument
The deterministic libclang pass that traverses the kernel source to locate ioctl dispatch entry points, decode _IOC command codes, identify controlled-input sinks, and record permission gates.
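To make "decode _IOC command codes" concrete, here is a minimal sketch of splitting a 32-bit command code into its direction, type, ordinal, and size fields. It assumes the asm-generic bit layout (8 ordinal bits, 8 type bits, 14 size bits, 2 direction bits); some architectures use a different split, and the function name `decode_ioc` is hypothetical rather than taken from the paper.

```python
# Field widths from the asm-generic ioctl encoding (assumption: target
# architecture uses this layout; some, e.g. powerpc and mips, differ).
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS

# Direction is named from userspace's point of view.
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioc(code: int) -> dict:
    """Split an _IOC-encoded command code into its four fields."""

    def field(shift: int, bits: int) -> int:
        return (code >> shift) & ((1 << bits) - 1)

    return {
        "dir": DIRECTIONS[field(IOC_DIRSHIFT, IOC_DIRBITS)],
        "type": chr(field(IOC_TYPESHIFT, IOC_TYPEBITS)),
        "nr": field(IOC_NRSHIFT, IOC_NRBITS),
        "size": field(IOC_SIZESHIFT, IOC_SIZEBITS),
    }


# EVIOCGVERSION is defined as _IOR('E', 0x01, int), i.e. 0x80044501.
decoded = decode_ioc(0x80044501)
print(decoded)
```

Legacy command codes that predate the _IOC macros carry no direction or size information, so a decoder like this reports zeros for those fields rather than anything meaningful.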
If this is right
- The census supplies an upper bound on unprivileged reach by isolating capability-ungated ioctls.
- Each of the 22 recent in-tree ioctl CVEs used in the backtest maps to a dispatch point and handler recorded in the census.
- The structural data is released openly with a schema that permits cross-OS queries against the Windows counterpart.
- Individual drivers can be compared by the counts of sinks and gates they expose.
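The bullets above can be sketched as a query over a census table. This is an illustration only: the table name `handlers`, the columns `sinks`, `gates`, and `cap_gated`, and every row are hypothetical, and the released dataset's actual schema may differ.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only; these are NOT the
# released census column names or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE handlers ("
    "driver TEXT, handler TEXT, sinks INTEGER, gates INTEGER, cap_gated INTEGER)"
)
conn.executemany(
    "INSERT INTO handlers VALUES (?, ?, ?, ?, ?)",
    [
        ("drv_a", "a_ioctl", 5, 0, 0),
        ("drv_a", "a_admin_ioctl", 3, 2, 1),  # behind a hard capability gate
        ("drv_b", "b_ioctl", 1, 1, 0),
    ],
)

# Capability-ungated surface per driver: an upper bound on unprivileged
# reach, not proof that an unprivileged caller can get there.
ungated = conn.execute(
    "SELECT driver, COUNT(*) AS n_handlers, "
    "SUM(sinks) AS total_sinks, SUM(gates) AS total_gates "
    "FROM handlers WHERE cap_gated = 0 "
    "GROUP BY driver ORDER BY total_sinks DESC"
).fetchall()
print(ungated)
```

Filtering on a single boolean column keeps the threat-model decision separate from the structural counts, which is one plausible reading of "a queryable column".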
Where Pith is reading between the lines
- The census could be fed directly into static checkers that flag handlers lacking length validation on their argument buffers.
- Re-running the same pass on successive kernel releases would produce a time series of surface growth or shrinkage.
- Pairing the static census entries with runtime syscall traces could reveal which handlers are actually reachable from unprivileged contexts.
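The time-series idea can be sketched as a set difference between two census snapshots. The snapshot contents, driver names, and command codes below are invented for illustration; a real comparison would load two releases of the dataset.

```python
# Hypothetical snapshots: each is a set of (driver, command_code) pairs
# extracted from one kernel release.
census_old = {("drv_a", 0x80044501), ("drv_b", 0x40084200)}
census_new = {("drv_a", 0x80044501), ("drv_c", 0xC0104300)}

added = sorted(census_new - census_old)    # codes present only in the newer release
removed = sorted(census_old - census_new)  # codes dropped since the older release
net_change = len(census_new) - len(census_old)

print("added:", [(d, hex(c)) for d, c in added])
print("removed:", [(d, hex(c)) for d, c in removed])
print("net change:", net_change)
```

A net change of zero can hide churn, so reporting additions and removals separately is more informative than the totals alone.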
Load-bearing premise
A single deterministic libclang pass on an allmodconfig build will correctly and completely identify all ioctl dispatch entry points, command codes, sinks, and permission gates without parsing errors, missed handlers, or false inclusions from the C source structures.
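To show what a dispatch entry point looks like in source, and why the premise is demanding, here is a deliberately crude textual stand-in. It is a regular-expression scan, not the paper's libclang pass: it only finds `.unlocked_ioctl` and `.compat_ioctl` designated initializers written out literally. The sample struct and the name `find_dispatch_points` are hypothetical.

```python
import re

# Matches designated initializers such as ".unlocked_ioctl = demo_ioctl,".
# Assumption: the handler is named directly, not produced by a macro.
DISPATCH_RE = re.compile(r"\.(unlocked_ioctl|compat_ioctl)\s*=\s*([A-Za-z_]\w*)")


def find_dispatch_points(source: str) -> list:
    """Return (field, handler) pairs for literal ioctl initializers."""
    return [(m.group(1), m.group(2)) for m in DISPATCH_RE.finditer(source)]


# Hypothetical driver fragment used only to exercise the scan.
SAMPLE = """
static const struct file_operations demo_fops = {
    .owner = THIS_MODULE,
    .unlocked_ioctl = demo_ioctl,
    .compat_ioctl = compat_ptr_ioctl,
};
"""

points = find_dispatch_points(SAMPLE)
print(points)
```

A scan like this misses handlers installed through macros, other operations structures, table-driven dispatch, and conditionally compiled code, which is exactly the class of gap the premise assumes an AST-level pass avoids.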
What would settle it
An in-tree ioctl dispatch point or _IOC command code present in the kernel source that the libclang pass fails to recover, or one of the 22 backtested CVEs that cannot be located in the resulting census.
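The second half of this test can be expressed as a recall check. The sketch below assumes each CVE has been mapped by hand to the name of its vulnerable handler; the identifiers, handler names, and the `backtest` function are hypothetical placeholders, not entries from the paper's corpus.

```python
def backtest(census_handlers: set, cve_handlers: dict) -> tuple:
    """Return (recall, missed CVE ids) for a CVE-to-handler mapping."""
    if not cve_handlers:
        raise ValueError("backtest needs at least one CVE")
    missed = sorted(
        cve for cve, handler in cve_handlers.items()
        if handler not in census_handlers
    )
    recall = 1 - len(missed) / len(cve_handlers)
    return recall, missed


# Hypothetical inputs: placeholder ids and handler names.
census = {"a_ioctl", "b_ioctl"}
cves = {"CVE-EXAMPLE-1": "a_ioctl", "CVE-EXAMPLE-2": "c_ioctl"}

recall, missed = backtest(census, cves)
print(recall, missed)
```

Recall on a sample of known CVEs bounds only the miss rate on that sample; it says nothing about false inclusions elsewhere in the census.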
Original abstract
The ioctl system call is Linux's catch-all device-control interface. A userspace program opens a device node and hands the driver a numeric command code and an argument buffer, and the driver does whatever that code means, whether configuring hardware, reading back state, or moving data into and out of the kernel. Drivers define these commands themselves, by the thousand, and parse their arguments in kernel context, which makes ioctl handlers one of the broadest and least uniform local attack surfaces in the kernel. A handler that trusts an argument length it never validates can read or write kernel memory out of bounds, and the command space is catalogued in no central place. We present the Linux IOCTL Census, a source-derived and queryable inventory of that surface. An allmodconfig build compiles 878 modules across 169 subtrees, and over them a single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. A second pass encodes the kernel's own documented threat model as a queryable column, separating the capability-ungated ioctl surface, an upper bound on unprivileged reach rather than proven reach, from the part a hard capability gate puts out of scope. We backtest the census against 22 recent in-tree ioctl CVEs and release the structural tier as open data, on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the Linux IOCTL Census, a source-derived database of the Linux kernel's ioctl attack surface. Using an allmodconfig build of 878 modules across 169 subtrees, a single deterministic libclang pass extracts 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. The work encodes the kernel's documented capability-based threat model to separate ungated surfaces from capability-gated ones, backtests the extraction against 22 recent in-tree ioctl CVEs, and releases the structural tier as open data on a schema shared with a companion Windows IOCTL Census.
Significance. If the extraction is shown to be accurate, this provides a valuable, queryable inventory of a broad and nonuniform kernel attack surface that has previously lacked central cataloguing. The open-data release and cross-OS schema compatibility are concrete strengths that enable reproducible security analyses and comparative studies. The deterministic extraction approach and backtesting against known CVEs add empirical grounding, though the absence of reported accuracy metrics limits immediate usability.
major comments (2)
- [Abstract] Abstract and extraction description: the headline quantitative results (586 dispatch entry points, 1,289 codes, 3,583 sinks, 1,298 gates) are presented as the output of a single deterministic libclang pass, yet no parsing accuracy metrics, false-positive/negative rates, or handling details for macro-expanded _IO/_IOR definitions, table-driven dispatch, or conditionally compiled paths are provided. This is load-bearing for the central claim that the census recovers a complete and correct surface.
- [Backtesting] Backtesting section: the manuscript states that the census was backtested against 22 recent in-tree ioctl CVEs but supplies no details on the test procedure, achieved recall, or how the extraction was validated against the known vulnerable handlers. Without these, the backtest cannot establish overall precision or completeness across the 878 modules.
minor comments (1)
- [Abstract] The abstract refers to 'a second pass' that encodes the threat model; the manuscript should clarify whether this is a separate static analysis or a post-processing query on the extracted data.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the potential value of the census as a queryable inventory. We address each major comment below and will incorporate the requested clarifications in a revised manuscript.
Referee: [Abstract] Abstract and extraction description: the headline quantitative results (586 dispatch entry points, 1,289 codes, 3,583 sinks, 1,298 gates) are presented as the output of a single deterministic libclang pass, yet no parsing accuracy metrics, false-positive/negative rates, or handling details for macro-expanded _IO/_IOR definitions, table-driven dispatch, or conditionally compiled paths are provided. This is load-bearing for the central claim that the census recovers a complete and correct surface.
Authors: We agree that the manuscript would be strengthened by explicit discussion of parsing accuracy and edge-case handling. The extraction relies on a single deterministic libclang pass over preprocessed source from an allmodconfig build, which by design processes macro expansions for _IO/_IOR and similar definitions. However, the current text does not report quantitative false-positive or false-negative rates or detail handling of table-driven dispatch and conditionally compiled paths. In revision we will add a dedicated subsection describing these aspects, any manual verification performed during development, and observed limitations.
Revision: yes
Referee: [Backtesting] Backtesting section: the manuscript states that the census was backtested against 22 recent in-tree ioctl CVEs but supplies no details on the test procedure, achieved recall, or how the extraction was validated against the known vulnerable handlers. Without these, the backtest cannot establish overall precision or completeness across the 878 modules.
Authors: We acknowledge that the backtesting section lacks procedural detail. The 22 CVEs were selected as recent in-tree ioctl vulnerabilities, and each was manually checked to confirm that the corresponding dispatch point and handler appeared in the census output. In revision we will expand this section to describe the selection criteria, the exact validation procedure used for each CVE, and any cases requiring additional context from the CVE report.
Revision: yes
Circularity Check
Direct source extraction with no circular derivation
The paper reports counts obtained by running a deterministic libclang pass over an allmodconfig kernel build to locate ioctl dispatch points, decode _IOC codes, identify sinks, and classify permission gates. No equations, fitted parameters, predictions, or first-principles derivations are present; the headline numbers are simply the output of the described extraction applied to the source. The backtest against 22 CVEs is external validation rather than a self-referential step. No self-citation chains or ansatzes are invoked to justify the core results. The work is therefore self-contained as a census and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The Linux kernel source code, when built under allmodconfig, contains the complete set of ioctl definitions that constitute the relevant attack surface.
Appendix A: Reproducibility and the CVE corpus
Build provenance. The census is built against Linux v7.0-rc7 with clang/libclang 21.1.8, recorded together with the allmodconf...