arxiv: 2606.10290 · v1 · pith:PP6UIVPMnew · submitted 2026-06-09 · 💻 cs.CR · cs.SE

The Linux IOCTL Census: A Source-Derived Database of the Linux Kernel Control-Code Surface

Pith reviewed 2026-06-27 13:06 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords ioctl · Linux kernel · attack surface · static analysis · device drivers · command codes · permission gates · security database

The pith

A deterministic libclang pass over the Linux kernel source builds a queryable census of 586 ioctl dispatch entry points, 1,289 command codes, 3,583 controlled-input sinks, and 1,298 permission gates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a source-derived inventory of the Linux kernel's ioctl interface, the catch-all device-control mechanism in which drivers accept numeric commands and user buffers that can reach deep into kernel memory. It compiles an allmodconfig kernel with 878 modules and runs one deterministic libclang traversal to extract dispatch entry points, decoded _IOC codes, controlled-input sinks, and permission gates, then marks the capability-ungated subset according to the kernel's own documented threat model. The result is a structured, open database that is backtested against 22 recent in-tree ioctl CVEs and shares its schema with a Windows counterpart. A sympathetic reader cares because ioctl handlers remain one of the largest and least uniform local attack surfaces, and a central, queryable catalog removes the need to rediscover the same command definitions by hand.

Core claim

An allmodconfig build compiles 878 modules across 169 subtrees. A single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. The census encodes the kernel's documented threat model as a queryable column that separates the capability-ungated surface, an upper bound on unprivileged reach, from the portion placed out of scope by a hard capability gate. The structural tier is released as open data on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems, and the extraction is backtested against 22 recent in-tree ioctl CVEs.

What carries the argument

The deterministic libclang pass that traverses the kernel source to locate ioctl dispatch entry points, decode _IOC command codes, identify controlled-input sinks, and record permission gates.
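As an illustration of the decoding step only (not the paper's resolver), the sketch below packs and unpacks a command code using the standard asm-generic field layout described in the Figure 2 caption. The function and type names are hypothetical, and the bit widths are the asm-generic defaults; some architectures override them.

```python
from typing import NamedTuple

# asm-generic field widths: nr (8), type (8), size (14), dir (2).
NR_BITS, TYPE_BITS, SIZE_BITS, DIR_BITS = 8, 8, 14, 2
NR_SHIFT = 0
TYPE_SHIFT = NR_SHIFT + NR_BITS      # 8
SIZE_SHIFT = TYPE_SHIFT + TYPE_BITS  # 16
DIR_SHIFT = SIZE_SHIFT + SIZE_BITS   # 30

# asm-generic direction values: _IOC_NONE=0, _IOC_WRITE=1, _IOC_READ=2.
DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read|write"}


class IocFields(NamedTuple):
    direction: str
    size: int
    type: str
    nr: int


def ioc_encode(direction: int, type_char: str, nr: int, size: int) -> int:
    """Pack the four fields the way the _IOC() macro does."""
    return (
        (direction << DIR_SHIFT)
        | (size << SIZE_SHIFT)
        | (ord(type_char) << TYPE_SHIFT)
        | (nr << NR_SHIFT)
    )


def ioc_decode(code: int) -> IocFields:
    """Split a 32-bit command code back into its fields."""
    return IocFields(
        direction=DIR_NAMES[(code >> DIR_SHIFT) & ((1 << DIR_BITS) - 1)],
        size=(code >> SIZE_SHIFT) & ((1 << SIZE_BITS) - 1),
        type=chr((code >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)),
        nr=(code >> NR_SHIFT) & ((1 << NR_BITS) - 1),
    )


# Worked example from the Figure 2 caption: _IOR('W', 1, int), with a 4-byte int.
WDIOC_GETSTATUS = ioc_encode(2, "W", 1, 4)
```

Decoding `0x80045701` with `ioc_decode` yields a read direction, a 4-byte argument, magic byte `W`, and command number 1, which agrees with the anchor value quoted in the Figure 2 caption.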

If this is right

  • The census supplies an upper bound on unprivileged reach by isolating capability-ungated ioctls.
  • Backtesting shows the extraction matches the 22 recent in-tree ioctl CVEs examined.
  • The structural data is released openly with a schema that permits cross-OS queries against the Windows counterpart.
  • Individual drivers can be compared by the counts of sinks and gates they expose.
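A minimal sketch of how such a query might look, assuming a hypothetical table `handler` with columns `module`, `handler`, `sinks`, `gates`, and `hard_cap_gate`. These names are illustrative and are not taken from the released schema, and the rows are invented placeholders rather than census data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with placeholder rows (not census data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE handler ("
        "module TEXT, handler TEXT, sinks INTEGER, gates INTEGER, "
        "hard_cap_gate INTEGER)"  # 1 if a hard capability gate guards the handler
    )
    rows = [
        ("demo_a", "demo_a_ioctl", 5, 0, 0),
        ("demo_b", "demo_b_ioctl", 9, 2, 1),
        ("demo_c", "demo_c_ioctl", 2, 1, 0),
    ]
    conn.executemany("INSERT INTO handler VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def ungated_by_sinks(conn: sqlite3.Connection):
    """List handlers without a hard capability gate, most sinks first."""
    return conn.execute(
        "SELECT module, handler, sinks FROM handler "
        "WHERE hard_cap_gate = 0 ORDER BY sinks DESC, module"
    ).fetchall()


ranked = ungated_by_sinks(build_demo_db())
```

The filter keeps rows that merely lack a hard gate, so the result is an upper bound on unprivileged reach in the sense described above, not a list of handlers proven reachable.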

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The census could be fed directly into static checkers that flag handlers lacking length validation on their argument buffers.
  • Re-running the same pass on successive kernel releases would produce a time series of surface growth or shrinkage.
  • Pairing the static census entries with runtime syscall traces could reveal which handlers are actually reachable from unprivileged contexts.
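The time-series idea could be prototyped as a set difference between two census snapshots. The sketch below assumes each snapshot is reduced to (module, command code) pairs; the module names and codes are placeholders, not values from the census.

```python
def surface_delta(old_codes, new_codes):
    """Compare two snapshots of (module, command_code) pairs."""
    old_set, new_set = set(old_codes), set(new_codes)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
        "net_change": len(new_set) - len(old_set),
    }


# Placeholder snapshots for two hypothetical releases.
release_a = [("demo_wdt", 0x80045701), ("demo_old", 0x00005801)]
release_b = [
    ("demo_wdt", 0x80045701),
    ("demo_new", 0x40045802),
    ("demo_new", 0x80045803),
]
delta = surface_delta(release_a, release_b)
```

Because the extraction is described as deterministic, a diff of this kind would reflect source changes rather than run-to-run noise, provided both snapshots use the same build configuration and toolchain.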

Load-bearing premise

A single deterministic libclang pass on an allmodconfig build will correctly and completely identify all ioctl dispatch entry points, command codes, sinks, and permission gates without parsing errors, missed handlers, or false inclusions from the C source structures.

What would settle it

An in-tree ioctl dispatch point or _IOC command code present in the kernel source that the libclang pass fails to recover, or one of the 22 backtested CVEs that cannot be located in the resulting census.

Figures

Figures reproduced from arXiv: 2606.10290 by Michael J. Bommarito II.

Figure 1: The Extract-Gate-Rank pipeline. Extract recovers dispatch, decoded _IOC codes, controlled-input sinks, and permission gates from kernel source with no fuzzing or symbolic execution. Gate applies the threat-model reachability filter (a necessary-condition upper bound, not a classifier). Rank serves views over the unaudited reachable surface; the LLM enrichment layer is optional and off the critical path.
Figure 2: The resolver recovers the asm-generic _IOC command-code encoding, a 32-bit number split from the top bit down into dir (direction, bits 31:30), size (argument size, 29:16), type (magic byte, 15:8), and nr (command number, 7:0). The worked example is the pinned resolver anchor WDIOC_GETSTATUS = _IOR('W', 1, int) = 0x80045701, whose read direction transfers data from the kernel to the user buffer.
Figure 3: The filter, not a classifier, does the narrowing. Of the 337 ioctl modules the threat-model …
Figure 4: Top cross-module shared decoded (type, ordinal, access) triples: the watchdog 'W'-type commands recur across the 31 watchdog drivers (32 counting an ALSA rawmidi driver that collides on the same magic byte), from the v_cross_build view.
… value, ignoring the argument-size field, which surfaces both driver-class command reuse and magic-byte collisions. The top recurrences are the watchdog control interface: …
Figure 5: Permission gates by capability. CAP_SYS_ADMIN dominates by an order of magnitude, and it is exactly the capability the kernel threat model treats as out of model, so much of the gated surface sits outside the in-model attack surface.
… caught two real nondeterminism sources (a non-deterministic group aggregate and a primary-key collapse on handlers registered for multiple file-operations fields), which we fi…
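The cross-module grouping described in the Figure 4 caption can be approximated as follows. This is a sketch under assumptions, not the definition of the v_cross_build view: it keys each command code on (type, nr, dir), drops the size field, and reports keys that appear in more than one module. The module names are hypothetical; the first code is the WDIOC_GETSTATUS value from the Figure 2 caption.

```python
from collections import defaultdict


def triple(code: int):
    """Return (type byte, command number, direction), ignoring the size field."""
    return ((code >> 8) & 0xFF, code & 0xFF, (code >> 30) & 0x3)


def shared_triples(entries):
    """Map each triple seen in more than one module to its sorted module list.

    entries: iterable of (module_name, command_code) pairs.
    """
    seen = defaultdict(set)
    for module, code in entries:
        seen[triple(code)].add(module)
    return {key: sorted(mods) for key, mods in seen.items() if len(mods) > 1}


# Placeholder module names; only the first code value comes from the source.
entries = [
    ("demo_wdt_a", 0x80045701),
    ("demo_wdt_b", 0x80045701),
    ("demo_other", 0x80085701),  # same type/nr/dir, different argument size
    ("demo_solo", 0x40045702),
]
shared = shared_triples(entries)
```

Dropping the size field is what lets a driver that reuses the same magic byte and number with a differently sized argument show up as a collision alongside genuine driver-class reuse.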
Original abstract

The ioctl system call is Linux's catch-all device-control interface. A userspace program opens a device node and hands the driver a numeric command code and an argument buffer, and the driver does whatever that code means, whether configuring hardware, reading back state, or moving data into and out of the kernel. Drivers define these commands themselves, by the thousand, and parse their arguments in kernel context, which makes ioctl handlers one of the broadest and least uniform local attack surfaces in the kernel. A handler that trusts an argument length it never validates can read or write kernel memory out of bounds, and the command space is catalogued in no central place. We present the Linux IOCTL Census, a source-derived and queryable inventory of that surface. An allmodconfig build compiles 878 modules across 169 subtrees, and over them a single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. A second pass encodes the kernel's own documented threat model as a queryable column, separating the capability-ungated ioctl surface, an upper bound on unprivileged reach rather than proven reach, from the part a hard capability gate puts out of scope. We backtest the census against 22 recent in-tree ioctl CVEs and release the structural tier as open data, on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents the Linux IOCTL Census, a source-derived database of the Linux kernel's ioctl attack surface. Using an allmodconfig build of 878 modules across 169 subtrees, a single deterministic libclang pass extracts 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. The work encodes the kernel's documented capability-based threat model to separate ungated surfaces from capability-gated ones, backtests the extraction against 22 recent in-tree ioctl CVEs, and releases the structural tier as open data on a schema shared with a companion Windows IOCTL Census.

Significance. If the extraction is shown to be accurate, this provides a valuable, queryable inventory of a broad and nonuniform kernel attack surface that has previously lacked central cataloguing. The open-data release and cross-OS schema compatibility are concrete strengths that enable reproducible security analyses and comparative studies. The deterministic extraction approach and backtesting against known CVEs add empirical grounding, though the absence of reported accuracy metrics limits immediate usability.

major comments (2)
  1. [Abstract] Abstract and extraction description: the headline quantitative results (586 dispatch entry points, 1,289 codes, 3,583 sinks, 1,298 gates) are presented as the output of a single deterministic libclang pass, yet no parsing accuracy metrics, false-positive/negative rates, or handling details for macro-expanded _IO/_IOR definitions, table-driven dispatch, or conditionally compiled paths are provided. This is load-bearing for the central claim that the census recovers a complete and correct surface.
  2. [Backtesting] Backtesting section: the manuscript states that the census was backtested against 22 recent in-tree ioctl CVEs but supplies no details on the test procedure, achieved recall, or how the extraction was validated against the known vulnerable handlers. Without these, the backtest cannot establish overall precision or completeness across the 878 modules.
minor comments (1)
  1. [Abstract] The abstract refers to 'a second pass' that encodes the threat model; the manuscript should clarify whether this is a separate static analysis or a post-processing query on the extracted data.
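The metrics the referee asks for could be computed along the following lines. This is a sketch with hypothetical function names and placeholder identifiers; it is not the paper's validation procedure. It also reflects a limit of any CVE backtest: a set of known-vulnerable handlers supports a recall estimate, while precision needs a separately audited sample of census entries.

```python
def recall_against(expected_handlers, census_handlers):
    """Return (recall, missed) for known handlers looked up in the census."""
    expected = set(expected_handlers)
    found = expected & set(census_handlers)
    missed = sorted(expected - found)
    recall = len(found) / len(expected) if expected else None
    return recall, missed


def precision_on_sample(audited):
    """audited maps a sampled census entry to True if a hand audit confirmed it."""
    if not audited:
        return None
    return sum(audited.values()) / len(audited)


# Placeholder identifiers standing in for handler names.
recall, missed = recall_against(["a", "b", "c", "d"], ["a", "b", "c", "x"])
precision = precision_on_sample({"a": True, "b": True, "c": False, "x": True})
```

Reporting the `missed` list alongside the ratio matters here, since a single unrecovered in-tree handler is the kind of counterexample the page identifies as settling the completeness claim.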

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential value of the census as a queryable inventory. We address each major comment below and will incorporate the requested clarifications in a revised manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract and extraction description: the headline quantitative results (586 dispatch entry points, 1,289 codes, 3,583 sinks, 1,298 gates) are presented as the output of a single deterministic libclang pass, yet no parsing accuracy metrics, false-positive/negative rates, or handling details for macro-expanded _IO/_IOR definitions, table-driven dispatch, or conditionally compiled paths are provided. This is load-bearing for the central claim that the census recovers a complete and correct surface.

    Authors: We agree that the manuscript would be strengthened by explicit discussion of parsing accuracy and edge-case handling. The extraction relies on a single deterministic libclang pass over preprocessed source from an allmodconfig build, which by design processes macro expansions for _IO/_IOR and similar definitions. However, the current text does not report quantitative false-positive or false-negative rates or detail handling of table-driven dispatch and conditionally compiled paths. In revision we will add a dedicated subsection describing these aspects, any manual verification performed during development, and observed limitations. revision: yes

  2. Referee: [Backtesting] Backtesting section: the manuscript states that the census was backtested against 22 recent in-tree ioctl CVEs but supplies no details on the test procedure, achieved recall, or how the extraction was validated against the known vulnerable handlers. Without these, the backtest cannot establish overall precision or completeness across the 878 modules.

    Authors: We acknowledge that the backtesting section lacks procedural detail. The 22 CVEs were selected as recent in-tree ioctl vulnerabilities, and each was manually checked to confirm that the corresponding dispatch point and handler appeared in the census output. In revision we will expand this section to describe the selection criteria, the exact validation procedure used for each CVE, and any cases requiring additional context from the CVE report. revision: yes

Circularity Check

0 steps flagged

Direct source extraction with no circular derivation

Full rationale

The paper reports counts obtained by running a deterministic libclang pass over an allmodconfig kernel build to locate ioctl dispatch points, decode _IOC codes, identify sinks, and classify permission gates. No equations, fitted parameters, predictions, or first-principles derivations are present; the headline numbers are simply the output of the described extraction applied to the source. The backtest against 22 CVEs is external validation rather than a self-referential step. No self-citation chains or ansatzes are invoked to justify the core results. The work is therefore self-contained as a census and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper adds no new physical or mathematical entities. Its contribution is the extraction and organization of existing kernel code elements. The central assumption is that static analysis faithfully recovers the dispatch surface from source.

axioms (1)
  • domain assumption: The Linux kernel source code, when built under allmodconfig, contains the complete set of ioctl definitions that constitute the relevant attack surface.
    The census depends on the build configuration including all relevant modules and the source being treated as the authoritative representation of the surface.

pith-pipeline@v0.9.1-grok · 5807 in / 1345 out tokens · 32472 ms · 2026-06-27T13:06:47.811515+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 3 canonical work pages

  1. Thomas Ball, Ella Bounimova, Byron Cook, Vladimir Levin, Jakob Lichtenberg, Con McGarvey, Bohus Ondrusek, Sriram K. Rajamani, and Abdullah Ustuner. Thorough static analysis of device drivers. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys), pages 73–85, 2006. Static Driver Verifier (SDV). https://doi.org/10.11...

  2. Michael J. Bommarito II. The Linux IOCTL census (structural tier). Hugging Face dataset,

  3. This work. https://huggingface.co/datasets/mjbommar/linux-ioctl-census

  4. Michael J. Bommarito II. Needles at scale: LLM-assisted target selection for Windows vulnerability research. https://arxiv.org/abs/2606.01364, 2026. Companion work on LLM-assisted target selection. arXiv:2606.01364 [cs.CR]

  5. Michael J. Bommarito II. The Windows IOCTL census: A corpus-scale, multi-architecture database of the driver control-code surface. https://arxiv.org/abs/2606.07732, 2026. Companion paper on the shared census schema. arXiv:2606.07732 [cs.SE]

  6. Michael J. Bommarito II. The Windows IOCTL census (structural tier). Hugging Face dataset, 2026. https://huggingface.co/datasets/mjbommar/ioctl-census

  7. Jake Corina, Aravind Machiry, Christopher Salls, Yan Shoshitaishvili, Shuang Hao, Christopher Kruegel, and Giovanni Vigna. DIFUZE: Interface aware fuzzing for kernel drivers. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 2123–2138, 2017. https://doi.org/10.1145/3133956.3134069

  8. Google. syzkaller: an unsupervised coverage-guided kernel fuzzer. https://github.com/google/syzkaller, 2026. Accessed June 2026. syzbot dashboard: https://syzkaller.appspot.com/

  9. Rajat Gupta, Lukas Patrick Dresel, Noah Spahn, Giovanni Vigna, Christopher Kruegel, and Taesoo Kim. POPKORN: Popping Windows kernel drivers at scale. In Proceedings of the 38th Annual Computer Security Applications Conference (ACSAC), pages 854–868, 2022. https://doi.org/10.1145/3564625.3564631

  10. Che-Yu Lin. IOCTLance: Enhanced vulnerability hunting in WDM drivers with symbolic execution and taint analysis, 2023. CODE BLUE 2023; presenter handle zeze-zeze. https://github.com/zeze-zeze/ioctlance

  11. Aravind Machiry, Chad Spensky, Jake Corina, Nick Stephens, Christopher Kruegel, and Giovanni Vigna. DR. CHECKER: A soundy analysis for Linux kernel drivers. In Proceedings of the 26th USENIX Security Symposium, pages 1007–1024, 2017. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/machiry

  12. Mickey Shkatov and Jesse Michael. Screwed Drivers: Signed, sealed, delivered, 2019. Eclypsium research, 2019; presented at DEF CON 27 as “Get Off the Kernel if You Can’t Drive”. https://eclypsium.com/research/screwed-drivers-signed-sealed-delivered/

  13. The Linux Kernel community. Linux kernel security documentation: threat model and security-bugs process. https://docs.kernel.org/process/, 2026. Documentation/process/threat-model.rst and security-bugs.rst. As of Linux v7.0-rc7 (the revision this census was built against)

  14. The Linux Kernel community. Linux uapi: ioctl number encoding. https://www.kernel.org/doc/html/latest/userspace-api/ioctl/ioctl-number.html, 2026. The _IOC direction/type/ordinal/size encoding.

A Reproducibility and the CVE corpus

Build provenance. The census is built against Linux v7.0-rc7 with clang/libclang 21.1.8, recorded together with the allmodconf...