The Linux IOCTL Census: A Source-Derived Database of the Linux Kernel Control-Code Surface

Michael J. Bommarito II

arxiv: 2606.10290 · v1 · pith:PP6UIVPMnew · submitted 2026-06-09 · 💻 cs.CR · cs.SE

Pith reviewed 2026-06-27 13:06 UTC · model grok-4.3

keywords: ioctl, Linux kernel, attack surface, static analysis, device drivers, command codes, permission gates, security database

The pith

A deterministic libclang pass over the Linux kernel source builds a queryable census of 586 ioctl dispatch points, 1,289 command codes, 3,583 input sinks, and 1,298 permission gates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a source-derived inventory of the Linux kernel's ioctl interface, the catch-all device-control mechanism in which drivers accept numeric commands and user buffers that can reach deep into kernel memory. It compiles an allmodconfig kernel with 878 modules and runs one deterministic libclang traversal to extract dispatch entry points, decoded _IOC codes, controlled-input sinks, and permission gates, then marks the ungated subset according to the kernel's own capability rules. The result is a structured, open database that is backtested against 22 recent in-tree ioctl CVEs and shares its schema with a Windows counterpart. This matters because ioctl handlers remain one of the largest and least uniform local attack surfaces, and a central, queryable catalog removes the need to rediscover the same command definitions by hand.

Core claim

An allmodconfig build compiles 878 modules across 169 subtrees. A single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. The census encodes the kernel's documented threat model as a queryable column that separates the capability-ungated surface, an upper bound on unprivileged reach, from the portion placed out of scope by a hard capability gate. The structural tier is released as open data on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems, and the extraction is backtested against 22 recent in-tree ioctl CVEs.

What carries the argument

The deterministic libclang pass that traverses the kernel source to locate ioctl dispatch entry points, decode _IOC command codes, identify controlled-input sinks, and record permission gates.
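
To make "dispatch entry point" and "permission gate" concrete, here is a minimal sketch. It is not the paper's libclang pass: it is a regex stand-in run on an invented driver fragment, and the names `demo_ioctl`, `demo_fops`, `find_dispatch`, and `find_gates` are hypothetical. A real extraction needs a full AST to survive macros and table-driven dispatch, which regular expressions cannot handle reliably.

```python
import re

# Invented driver fragment for illustration; not taken from the kernel tree.
SOURCE = """
static long demo_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
    if (!capable(CAP_SYS_ADMIN))
        return -EPERM;
    return 0;
}

static const struct file_operations demo_fops = {
    .owner          = THIS_MODULE,
    .unlocked_ioctl = demo_ioctl,
    .compat_ioctl   = demo_ioctl,
};
"""

# Designated initializers that register an ioctl handler.
DISPATCH_RE = re.compile(r"\.(unlocked_ioctl|compat_ioctl)\s*=\s*(\w+)")
# Direct capability checks of the form capable(CAP_...).
GATE_RE = re.compile(r"\bcapable\(\s*(CAP_\w+)\s*\)")


def find_dispatch(source):
    """Return (field, handler) pairs for ioctl fields in an initializer."""
    return DISPATCH_RE.findall(source)


def find_gates(source):
    """Return the capability names passed to capable()."""
    return GATE_RE.findall(source)


dispatch = find_dispatch(SOURCE)
gates = find_gates(SOURCE)
print(dispatch)
print(gates)
```

In this toy input one handler is registered under two fields, so a (field, handler) pair is a safer key than the handler name alone.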

If this is right

The census supplies an upper bound on unprivileged reach by isolating capability-ungated ioctls.
Backtesting shows the extraction matches the 22 recent in-tree ioctl CVEs examined.
The structural data is released openly with a schema that permits cross-OS queries against the Windows counterpart.
Individual drivers can be compared by the counts of sinks and gates they expose.
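
The per-driver comparison and the ungated filter can be sketched as a query. The schema below is an assumption made for illustration: the table names (`handlers`, `sinks`, `gates`), the column names, and the sample rows are all hypothetical, since this summary does not show the released dataset's actual schema.

```python
import sqlite3

# Hypothetical miniature schema; the released census's real table and
# column names are not given in this summary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE handlers (module TEXT, handler TEXT);
CREATE TABLE sinks    (module TEXT, handler TEXT, sink TEXT);
CREATE TABLE gates    (module TEXT, handler TEXT, capability TEXT);
""")
conn.executemany("INSERT INTO handlers VALUES (?, ?)", [
    ("drv_a", "a_ioctl"),
    ("drv_b", "b_ioctl"),
])
conn.executemany("INSERT INTO sinks VALUES (?, ?, ?)", [
    ("drv_a", "a_ioctl", "copy_from_user"),
    ("drv_a", "a_ioctl", "memdup_user"),
    ("drv_b", "b_ioctl", "copy_from_user"),
])
conn.executemany("INSERT INTO gates VALUES (?, ?, ?)", [
    ("drv_b", "b_ioctl", "CAP_SYS_ADMIN"),
])

# Per-handler sink and gate counts.
QUERY = """
SELECT h.module,
       h.handler,
       (SELECT COUNT(*) FROM sinks s
         WHERE s.module = h.module AND s.handler = h.handler) AS n_sinks,
       (SELECT COUNT(*) FROM gates g
         WHERE g.module = h.module AND g.handler = h.handler) AS n_gates
FROM handlers h
ORDER BY h.module
"""
rows = conn.execute(QUERY).fetchall()

# "Ungated" here only means no gate was recorded for the handler.
ungated = [(m, h) for (m, h, n_sinks, n_gates) in rows if n_gates == 0]
print(rows)
print(ungated)
```

A handler that lands in `ungated` is a candidate for unprivileged reach, not a proven one, which matches the upper-bound framing above.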

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The census could be fed directly into static checkers that flag handlers lacking length validation on their argument buffers.
Re-running the same pass on successive kernel releases would produce a time series of surface growth or shrinkage.
Pairing the static census entries with runtime syscall traces could reveal which handlers are actually reachable from unprivileged contexts.
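
The release-over-release idea can be sketched as a set difference between two snapshots. Everything here is hypothetical: the snapshots, module names, and command codes are invented, and `surface_delta` is an illustrative helper, not part of any released tooling.

```python
def surface_delta(old, new):
    """Compare two snapshots, each a set of (module, command_code) pairs."""
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "net": len(new) - len(old),
    }


# Invented snapshots for two hypothetical consecutive releases.
release_n = {
    ("drv_a", 0x80045701),
    ("drv_a", 0x40045702),
    ("drv_b", 0x00004201),
}
release_n1 = {
    ("drv_a", 0x80045701),
    ("drv_b", 0x00004201),
    ("drv_c", 0xC0104301),
}

delta = surface_delta(release_n, release_n1)
print(delta)
```

A net change of zero can still hide churn, so the added and removed lists are worth reporting alongside the count.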

Load-bearing premise

A single deterministic libclang pass on an allmodconfig build will correctly and completely identify all ioctl dispatch entry points, command codes, sinks, and permission gates without parsing errors, missed handlers, or false inclusions from the C source structures.

What would settle it

An in-tree ioctl dispatch point or _IOC command code present in the kernel source that the libclang pass fails to recover, or one of the 22 backtested CVEs that cannot be located in the resulting census.

Figures

Figures reproduced from arXiv: 2606.10290 by Michael J. Bommarito II.

**Figure 1.** The Extract-Gate-Rank pipeline. Extract recovers dispatch, decoded _IOC codes, controlled-input sinks, and permission gates from kernel source with no fuzzing or symbolic execution. Gate applies the threat-model reachability filter (a necessary-condition upper bound, not a classifier). Rank serves views over the unaudited reachable surface; the LLM enrichment layer is optional and off the critical path.

**Figure 2.** The resolver recovers the asm-generic _IOC command-code encoding, a 32-bit number split from the top bit down into dir (direction, bits 31:30), size (argument size, 29:16), type (magic byte, 15:8), and nr (command number, 7:0). The worked example is the pinned resolver anchor WDIOC_GETSTATUS = _IOR(’W’, 1, int) = 0x80045701, whose read direction transfers data from the kernel to the user buffer.

**Figure 3.** The filter, not a classifier, does the narrowing. Of the 337 ioctl modules the threat-model…

**Figure 4.** Top cross-module shared decoded (type, ordinal, access) triples: the watchdog ’W’-type commands recur across the 31 watchdog drivers (32 counting an ALSA rawmidi driver that collides on the same magic byte), from the v_cross_build view. … value, ignoring the argument-size field, which surfaces both driver-class command reuse and magic-byte collisions. The top recurrences are the watchdog control interface: …

**Figure 5.** Permission gates by capability. CAP_SYS_ADMIN dominates by an order of magnitude, and it is exactly the capability the kernel threat model treats as out of model, so much of the gated surface sits outside the in-model attack surface. … caught two real nondeterminism sources (a non-deterministic group aggregate and a primary-key collapse on handlers registered for multiple file-operations fields), which we fi…
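
The encoding in Figure 2 and the size-blind grouping in Figure 4 can be sketched together. This assumes the asm-generic layout only (some architectures use different field widths) and a 4-byte `int`. The helper names `ioc`, `decode`, and `shared_triples`, and the module names in the sample data, are hypothetical.

```python
from collections import defaultdict

# asm-generic field layout as described in Figure 2.
NR_BITS, TYPE_BITS, SIZE_BITS = 8, 8, 14
NR_SHIFT = 0
TYPE_SHIFT = NR_SHIFT + NR_BITS        # 8
SIZE_SHIFT = TYPE_SHIFT + TYPE_BITS    # 16
DIR_SHIFT = SIZE_SHIFT + SIZE_BITS     # 30
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2
DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def ioc(direction, type_char, nr, size):
    """Pack the four fields into a 32-bit command code."""
    return ((direction << DIR_SHIFT) | (size << SIZE_SHIFT)
            | (ord(type_char) << TYPE_SHIFT) | (nr << NR_SHIFT))


def decode(code):
    """Split a 32-bit command code back into its four fields."""
    return {
        "dir": DIR_NAMES[(code >> DIR_SHIFT) & 0x3],
        "size": (code >> SIZE_SHIFT) & 0x3FFF,
        "type": chr((code >> TYPE_SHIFT) & 0xFF),
        "nr": (code >> NR_SHIFT) & 0xFF,
    }


def shared_triples(entries):
    """Group modules by (type, nr, dir), ignoring the size field."""
    groups = defaultdict(set)
    for module, code in entries:
        d = decode(code)
        groups[(d["type"], d["nr"], d["dir"])].add(module)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# _IOR('W', 1, int) with sizeof(int) assumed to be 4.
wdioc_getstatus = ioc(IOC_READ, "W", 1, 4)
print(hex(wdioc_getstatus))

# Invented modules: two reuse the code, one differs only in size.
entries = [
    ("wdt_a", wdioc_getstatus),
    ("wdt_b", wdioc_getstatus),
    ("other", ioc(IOC_READ, "W", 1, 8)),
    ("solo", ioc(IOC_WRITE, "X", 2, 4)),
]
shared = shared_triples(entries)
print(shared)
```

Dropping the size field from the grouping key is what lets a same-magic-byte collision show up next to genuine driver-class reuse; telling the two apart still takes a look at the modules involved.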

Original abstract

The ioctl system call is Linux's catch-all device-control interface. A userspace program opens a device node and hands the driver a numeric command code and an argument buffer, and the driver does whatever that code means, whether configuring hardware, reading back state, or moving data into and out of the kernel. Drivers define these commands themselves, by the thousand, and parse their arguments in kernel context, which makes ioctl handlers one of the broadest and least uniform local attack surfaces in the kernel. A handler that trusts an argument length it never validates can read or write kernel memory out of bounds, and the command space is catalogued in no central place. We present the Linux IOCTL Census, a source-derived and queryable inventory of that surface. An allmodconfig build compiles 878 modules across 169 subtrees, and over them a single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. A second pass encodes the kernel's own documented threat model as a queryable column, separating the capability-ungated ioctl surface, an upper bound on unprivileged reach rather than proven reach, from the part a hard capability gate puts out of scope. We backtest the census against 22 recent in-tree ioctl CVEs and release the structural tier as open data, on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper gives a new source-derived ioctl database from the kernel that organizes a messy attack surface into queryable form, but the extraction accuracy lacks detailed validation.

The main thing here is a new inventory of Linux ioctl handlers pulled straight from the source with libclang. It covers an allmodconfig build of 878 modules and reports 586 dispatch points, 1,289 command codes, 3,583 sinks, and 1,298 permission gates, plus a split between capability-gated and ungated surfaces. The shared schema with the Windows census is a practical addition.

What the paper does well is turn scattered driver code into something structured and reusable. The backtest against 22 CVEs shows the extraction can surface real issues, and releasing the data openly lets others query it directly. The deterministic approach avoids any fitting or invented parameters, which keeps the work grounded.

The soft spot is the parsing step itself. The counts come from one libclang pass, but the abstract gives no error rates, no discussion of missed handlers from macros or table-driven dispatch, and no checks for false inclusions across the 169 subtrees. The CVE backtest helps with recall on known bugs but does not establish overall precision or completeness, so the headline numbers carry some uncertainty until more validation appears.

This is for kernel security researchers and tool builders who need a starting inventory of the ioctl surface. A reader who wants the raw data or cross-OS queries will get direct value from the release.

It deserves peer review. The contribution is concrete and the data release is useful even if the method section needs more on accuracy.

Referee Report

2 major / 1 minor

Summary. The paper presents the Linux IOCTL Census, a source-derived database of the Linux kernel's ioctl attack surface. Using an allmodconfig build of 878 modules across 169 subtrees, a single deterministic libclang pass extracts 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. The work encodes the kernel's documented capability-based threat model to separate ungated surfaces from capability-gated ones, backtests the extraction against 22 recent in-tree ioctl CVEs, and releases the structural tier as open data on a schema shared with a companion Windows IOCTL Census.

Significance. If the extraction is shown to be accurate, this provides a valuable, queryable inventory of a broad and nonuniform kernel attack surface that has previously lacked central cataloguing. The open-data release and cross-OS schema compatibility are concrete strengths that enable reproducible security analyses and comparative studies. The deterministic extraction approach and backtesting against known CVEs add empirical grounding, though the absence of reported accuracy metrics limits immediate usability.

major comments (2)

[Abstract] Abstract and extraction description: the headline quantitative results (586 dispatch entry points, 1,289 codes, 3,583 sinks, 1,298 gates) are presented as the output of a single deterministic libclang pass, yet no parsing accuracy metrics, false-positive/negative rates, or handling details for macro-expanded _IO/_IOR definitions, table-driven dispatch, or conditionally compiled paths are provided. This is load-bearing for the central claim that the census recovers a complete and correct surface.
[Backtesting] Backtesting section: the manuscript states that the census was backtested against 22 recent in-tree ioctl CVEs but supplies no details on the test procedure, achieved recall, or how the extraction was validated against the known vulnerable handlers. Without these, the backtest cannot establish overall precision or completeness across the 878 modules.

minor comments (1)

[Abstract] The abstract refers to 'a second pass' that encodes the threat model; the manuscript should clarify whether this is a separate static analysis or a post-processing query on the extracted data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential value of the census as a queryable inventory. We address each major comment below and will incorporate the requested clarifications in a revised manuscript.

Referee: [Abstract] Abstract and extraction description: the headline quantitative results (586 dispatch entry points, 1,289 codes, 3,583 sinks, 1,298 gates) are presented as the output of a single deterministic libclang pass, yet no parsing accuracy metrics, false-positive/negative rates, or handling details for macro-expanded _IO/_IOR definitions, table-driven dispatch, or conditionally compiled paths are provided. This is load-bearing for the central claim that the census recovers a complete and correct surface.

Authors: We agree that the manuscript would be strengthened by explicit discussion of parsing accuracy and edge-case handling. The extraction relies on a single deterministic libclang pass over preprocessed source from an allmodconfig build, which by design processes macro expansions for _IO/_IOR and similar definitions. However, the current text does not report quantitative false-positive or false-negative rates or detail handling of table-driven dispatch and conditionally compiled paths. In revision we will add a dedicated subsection describing these aspects, any manual verification performed during development, and observed limitations. revision: yes
Referee: [Backtesting] Backtesting section: the manuscript states that the census was backtested against 22 recent in-tree ioctl CVEs but supplies no details on the test procedure, achieved recall, or how the extraction was validated against the known vulnerable handlers. Without these, the backtest cannot establish overall precision or completeness across the 878 modules.

Authors: We acknowledge that the backtesting section lacks procedural detail. The 22 CVEs were selected as recent in-tree ioctl vulnerabilities, and each was manually checked to confirm that the corresponding dispatch point and handler appeared in the census output. In revision we will expand this section to describe the selection criteria, the exact validation procedure used for each CVE, and any cases requiring additional context from the CVE report. revision: yes
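
The lookup described in this response amounts to a recall computation over the CVE set. A minimal sketch follows, with loud caveats: the CVE identifiers are placeholders rather than real CVEs, the census rows are invented, the `backtest` helper is hypothetical, and the resulting number says nothing about the paper's actual backtest.

```python
def backtest(census_handlers, cve_handlers):
    """Return (recall, missing) for CVE handlers looked up in the census.

    census_handlers: set of (module, handler) pairs present in the census.
    cve_handlers: dict mapping a CVE label to its (module, handler) pair.
    """
    found = {cve for cve, h in cve_handlers.items() if h in census_handlers}
    missing = sorted(set(cve_handlers) - found)
    recall = len(found) / len(cve_handlers) if cve_handlers else 0.0
    return recall, missing


# Invented census rows and placeholder CVE labels.
census = {("drv_a", "a_ioctl"), ("drv_b", "b_ioctl")}
cves = {
    "CVE-EXAMPLE-1": ("drv_a", "a_ioctl"),
    "CVE-EXAMPLE-2": ("drv_b", "b_ioctl"),
    "CVE-EXAMPLE-3": ("drv_c", "c_ioctl"),
}

recall, missing = backtest(census, cves)
print(round(recall, 3), missing)
```

Recall on known vulnerable handlers is only one side of the referee's request; precision would need a labeled sample of census entries checked against the source.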

Circularity Check

0 steps flagged

Direct source extraction with no circular derivation

The paper reports counts obtained by running a deterministic libclang pass over an allmodconfig kernel build to locate ioctl dispatch points, decode _IOC codes, identify sinks, and classify permission gates. No equations, fitted parameters, predictions, or first-principles derivations are present; the headline numbers are simply the output of the described extraction applied to the source. The backtest against 22 CVEs is external validation rather than a self-referential step. No self-citation chains or ansatzes are invoked to justify the core results. The work is therefore self-contained as a census and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper adds no new physical or mathematical entities. Its contribution is the extraction and organization of existing kernel code elements. The central assumption is that static analysis faithfully recovers the dispatch surface from source.

axioms (1)

domain assumption: The Linux kernel source code, when built under allmodconfig, contains the complete set of ioctl definitions that constitute the relevant attack surface.
The census depends on the build configuration including all relevant modules and on the source being treated as the authoritative representation of the surface.

Reference graph

Works this paper leans on

14 extracted references · 3 canonical work pages

[1] Thomas Ball, Ella Bounimova, Byron Cook, Vladimir Levin, Jakob Lichtenberg, Con McGarvey, Bohus Ondrusek, Sriram K. Rajamani, and Abdullah Ustuner. Thorough static analysis of device drivers. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys), pages 73–85, 2006. Static Driver Verifier (SDV). doi:10.1145/1217935.1217943

[2] Michael J. Bommarito II. The linux IOCTL census (structural tier). Hugging Face dataset,

[3] This work. https://huggingface.co/datasets/mjbommar/linux-ioctl-census

[4] Michael J. Bommarito II. Needles at scale: LLM-assisted target selection for Windows vulnerability research. https://arxiv.org/abs/2606.01364, 2026. Companion work on LLM-assisted target selection. arXiv:2606.01364 [cs.CR]

[5] Michael J. Bommarito II. The Windows IOCTL census: A corpus-scale, multi-architecture database of the driver control-code surface. https://arxiv.org/abs/2606.07732, 2026. Companion paper on the shared census schema. arXiv:2606.07732 [cs.SE]

[6] Michael J. Bommarito II. The windows IOCTL census (structural tier). Hugging Face dataset, 2026. https://huggingface.co/datasets/mjbommar/ioctl-census

[7] Jake Corina, Aravind Machiry, Christopher Salls, Yan Shoshitaishvili, Shuang Hao, Christopher Kruegel, and Giovanni Vigna. DIFUZE: Interface aware fuzzing for kernel drivers. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 2123–2138, 2017. https://doi.org/10.1145/3133956.3134069

[8] Google. syzkaller: an unsupervised coverage-guided kernel fuzzer. https://github.com/google/syzkaller, 2026. Accessed June 2026. syzbot dashboard: https://syzkaller.appspot.com/

[9] Rajat Gupta, Lukas Patrick Dresel, Noah Spahn, Giovanni Vigna, Christopher Kruegel, and Taesoo Kim. POPKORN: Popping Windows kernel drivers at scale. In Proceedings of the 38th Annual Computer Security Applications Conference (ACSAC), pages 854–868, 2022. https://doi.org/10.1145/3564625.3564631

[10] Che-Yu Lin. IOCTLance: Enhanced vulnerability hunting in WDM drivers with symbolic execution and taint analysis, 2023. CODE BLUE 2023; presenter handle zeze-zeze. https://github.com/zeze-zeze/ioctlance

[11] Aravind Machiry, Chad Spensky, Jake Corina, Nick Stephens, Christopher Kruegel, and Giovanni Vigna. DR. CHECKER: A soundy analysis for Linux kernel drivers. In Proceedings of the 26th USENIX Security Symposium, pages 1007–1024, 2017. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/machiry

[12] Mickey Shkatov and Jesse Michael. Screwed Drivers: Signed, sealed, delivered, 2019. Eclypsium research, 2019; presented at DEF CON 27 as “Get Off the Kernel if You Can’t Drive”. https://eclypsium.com/research/screwed-drivers-signed-sealed-delivered/

[13] The Linux Kernel community. Linux kernel security documentation: threat model and security-bugs process. https://docs.kernel.org/process/, 2026. Documentation/process/threat-model.rst and security-bugs.rst. As of Linux v7.0-rc7 (the revision this census was built against).

[14] The Linux Kernel community. Linux uapi: ioctl number encoding. https://www.kernel.org/doc/html/latest/userspace-api/ioctl/ioctl-number.html, 2026. The _IOC direction/type/ordinal/size encoding.

A Reproducibility and the CVE corpus

Build provenance. The census is built against Linux v7.0-rc7 with clang/libclang 21.1.8, recorded together with the allmodconf...