pith. sign in

arxiv: 2604.10501 · v1 · submitted 2026-04-12 · 💻 cs.CR

MuSimA: A Tool with Multi-modal Input for Generating Bespoke ABAC Datasets

Pith reviewed 2026-05-10 16:19 UTC · model grok-4.3

classification 💻 cs.CR
keywords ABACsynthetic datasetsdataset generation toolmulti-modal inputaccess controlprobability distributionsLLM-assisted specification
0
0 comments X

The pith

MuSimA generates synthetic ABAC datasets matching user-specified attribute value distributions from JSON files or hand-drawn sketches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper notes that while many methods exist for representing and enforcing ABAC policies, few tools support creation of large synthetic datasets for testing them. It fills this gap by presenting MuSimA, a web tool that produces datasets where attribute values follow user-chosen probability distributions. Input can be a full structured JSON or a minimal JSON plus hand-drawn sketches, with an LLM used to read parameters from the sketches. The tool outputs downloadable synthetic data at chosen sizes and complexities, making it available to the research community for ABAC algorithm evaluation.

Core claim

MuSimA is a web-based tool that generates bespoke ABAC datasets with user-specified probability distributions of attribute values. Specifications are accepted either as structured JSON or as minimal JSON combined with hand-drawn distribution sketches, from which a Large Language Model extracts the parameters. The resulting synthetic data can be generated at varying scales and downloaded for use in testing ABAC systems.

What carries the argument

Multi-modal input handler that accepts JSON specifications or hand-drawn sketches and uses an LLM to convert sketches into distribution parameters for ABAC dataset generation.

If this is right

  • ABAC researchers can produce datasets whose attribute statistics exactly match chosen probability distributions.
  • Scalability experiments become feasible by generating data at multiple sizes and policy complexities.
  • The tool supports both precise file-based input and intuitive visual specification of distributions.
  • The generated data is intended to be downloaded and used directly for ABAC method evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use of such a generator could produce more consistent benchmark datasets across ABAC papers.
  • The sketch-to-parameter step could be extended to support additional visual cues like histograms or curves.
  • Direct linkage of the output data to policy simulators would allow immediate testing of enforcement correctness.

Load-bearing premise

The LLM reliably converts hand-drawn sketches into accurate distribution parameters and the resulting synthetic data behaves like real ABAC data for system testing.

What would settle it

A side-by-side run of the same ABAC enforcement algorithm on MuSimA-generated data and on a real collected ABAC dataset, where the two produce substantially different performance or policy outcomes.

Figures

Figures reproduced from arXiv: 2604.10501 by India), Karthikeya S. M. Yelisetty (Indian Institute of Technology Kharagpur, Saket Jha (Indian Institute of Technology Kharagpur, Shamik Sural (Indian Institute of Technology Kharagpur, Singabattu Sathya (Indian Institute of Technology Kharagpur.

Figure 1
Figure 1. Figure 1: MuSimA Web Interface • Entity counts: ns = |S|, no = |O|, ne = |E| representing the number of subject, object and environment entities. • Attribute counts: nsa = |As |, noa = |Ao |, nea = |Ae | for subject, object, and environmental attributes, respectively. • Attribute value cardinalities: nsa,i, noa,j , nea,k representing the number of distinct values for each attribute, where for example, nsa,1 denotes … view at source ↗
Figure 2
Figure 2. Figure 2: Example hand-drawn images and corresponding LLM-generated distributions [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Error Distribution for (a) Subject Attribute (SA) (b) Object Attribute (OA) (c) Environment Attribute (EA). [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Expected vs Actual Count for (a) EA 19 (b) EA 16 (c) OA 7 (d) OA 10 (e) SA 5 (f) EA 9 [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Recent advances in research on Attribute-based Access Control (ABAC) has led to the development of several ingenious methods for representing and enforcing organizational security policies. However, so far little effort has been spent towards building a tool for generating large-scale synthetic datasets that can be used to test the developed ABAC systems. In this paper, we address this shortcoming by building MuSimA - a web-based tool for generating ABAC datasets with user-specified probability distributions of attribute values. It supports multi-modal input, i.e., users can provide specifications either as a structured JSON file or as a combination of a minimal JSON along with hand-drawn distribution sketches. In the latter case, a Large Language Model is used to automatically extract appropriate distribution parameters from the sketches. The generated synthetic ABAC data matching the input specifications can be downloaded by the user. For studying scalability of algorithms and methods related to ABAC, data can be generated for varying sizes and complexities. We make MuSimA freely available for use by the research community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents MuSimA, a web-based tool for generating synthetic ABAC datasets with user-specified probability distributions over attribute values. It supports multi-modal input via either a full structured JSON specification or a minimal JSON paired with hand-drawn distribution sketches, where an LLM extracts the distribution parameters. The tool allows generation and download of datasets at varying sizes and complexities for evaluating ABAC systems and is made freely available.

Significance. If the LLM reliably interprets sketches and the output data proves realistic and useful, MuSimA could help address the scarcity of customizable ABAC datasets for research. However, the manuscript supplies no implementation details, accuracy metrics, error analysis, statistical fidelity checks, or user studies, so the practical significance cannot be assessed from the provided description alone.

major comments (2)
  1. [Abstract] Abstract: The central claim that MuSimA 'addresses this shortcoming' by supporting multi-modal input with LLM-based sketch interpretation is load-bearing for the paper's contribution, yet the abstract (and manuscript) provides no accuracy metrics, example sketch-to-parameter mappings, error rates, or validation experiments for the LLM component.
  2. [Abstract] Abstract: The statement that 'data can be generated for varying sizes and complexities' to study ABAC scalability is presented without any reported performance measurements, generation times, dataset examples, or comparisons to prior synthetic ABAC data generators.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'minimal JSON along with hand-drawn distribution sketches' would benefit from a brief clarification of the exact interface and how the two inputs are merged before LLM processing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing MuSimA. We address each major comment point by point below, indicating planned revisions where we can strengthen the paper without altering its core contribution as a tool description.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that MuSimA 'addresses this shortcoming' by supporting multi-modal input with LLM-based sketch interpretation is load-bearing for the paper's contribution, yet the abstract (and manuscript) provides no accuracy metrics, example sketch-to-parameter mappings, error rates, or validation experiments for the LLM component.

    Authors: We acknowledge that the LLM-based sketch interpretation is a highlighted feature and that supporting evidence would strengthen the claim. The manuscript is structured as a tool paper focused on design, architecture, and user workflow rather than an empirical evaluation of the LLM. No accuracy metrics or validation experiments were conducted. In revision we will add concrete examples of sketch-to-parameter mappings (with before/after illustrations) and a limitations paragraph discussing LLM reliability. This constitutes a partial revision; full error analysis and user studies remain outside the current scope but can be flagged for future work. revision: partial

  2. Referee: [Abstract] Abstract: The statement that 'data can be generated for varying sizes and complexities' to study ABAC scalability is presented without any reported performance measurements, generation times, dataset examples, or comparisons to prior synthetic ABAC data generators.

    Authors: The statement reflects the tool's parametric design, which permits users to control dataset size and attribute complexity via the input specification. Specific benchmarks were not reported because the primary novelty lies in the multi-modal input mechanism. We will revise the manuscript to include sample generation times for datasets of different sizes, one or two downloadable example datasets, and a brief reference to prior ABAC generators for context. This addresses the request directly. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive tool paper with no derivations or fitted predictions

full rationale

The manuscript presents MuSimA as a web tool for ABAC dataset generation from user-specified distributions (JSON or LLM-processed sketches). No equations, parameter fitting, predictions, or derivation chains appear in the abstract or described content. The contribution is a feature description and availability statement, not a claimed first-principles result or statistical model. Self-citations are absent from the provided text, and the LLM component is presented as an implementation detail without any reduction to fitted inputs or self-referential definitions. This is a standard non-circular engineering/tool paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software tool description paper with no theoretical derivations. No free parameters, axioms, or invented entities are introduced beyond standard web technologies and LLM usage.

pith-pipeline@v0.9.0 · 5531 in / 1158 out tokens · 77320 ms · 2026-05-10T16:19:19.221265+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. [2]

    RanSAM: Randomized Search for ABAC Policy Mining

    Nakul Aggarwal and Shamik Sural. “RanSAM: Randomized Search for ABAC Policy Mining”. In:ACM Con- ference on Data and Application Security and Privacy. 2023, pp. 291–293.DOI:10.1145/3577923.3585050. [3]Amazon Employee Access challenge:2014.URL:https : / / www . kaggle . com / competitions / amazon - employee-access-challenge/data. [4]Amazon UCI Dataset. 20...

  2. [3]

    From Static to Dynamic Access Control Policies via Attribute-Based Category Mining

    Anna Bamberger and Maribel Fern ´andez. “From Static to Dynamic Access Control Policies via Attribute-Based Category Mining”. In:33rd International Symposium on Logic-Based Program Synthesis and Transformation. 2023, pp. 188–197.DOI:10.1007/978-3-031-45784-5_12

  3. [4]

    Extraction of Machine Enforceable ABAC Policies from Natural Language Text using LLM Knowledge Distillation

    Thang Bui et al. “ABAC Lab: An Interactive Platform for Attribute-based Access Control Policy Analysis, Tools, and Datasets [Dataset/Tool Paper]”. In:ACM Symposium on Access Control Models and Technologies. 2025, pp. 111–116.DOI:10.1145/3734436.3734441

  4. [5]

    SharAcc: Enhancing scalability and security in Attribute-Based Access Control with sharding-based blockchain and full decentralization

    Yuqing Ding et al. “SharAcc: Enhancing scalability and security in Attribute-Based Access Control with sharding-based blockchain and full decentralization”. In:Computer Networks257 (2025), p. 110992.DOI: 10.1016/j.comnet.2024.110992

  5. [6]

    Attribute-Based Access Control Scheme for Secure Identity Resolution in Prognostics and Health Management

    Yunhua He et al. “Attribute-Based Access Control Scheme for Secure Identity Resolution in Prognostics and Health Management”. In:IEEE Internet of Things Journal11.13 (2024), pp. 23140–23155.DOI:10.1109/ JIOT.2024.3387079

  6. [7]

    Traceable and revocable large universe multi-authority attribute-based access control with resisting key abuse

    Kaiqing Huang. “Traceable and revocable large universe multi-authority attribute-based access control with resisting key abuse”. In:Comput. Networks272 (2025), p. 111694.DOI:10.1016/j.comnet.2025.111694

  7. [8]

    Performance analysis of dynamic ABAC systems using a queuing theoretic frame- work

    Gaurav Madkaikar et al. “Performance analysis of dynamic ABAC systems using a queuing theoretic frame- work”. In:Comput. Secur .154 (2025), p. 104432.DOI:10.1016/j.cose.2025.104432

  8. [9]

    Towards ABAC policy mining from logs with deep learning

    Decebal Mocanu et al. “Towards ABAC policy mining from logs with deep learning”. In:18th International Multiconference on Intelligent Systems. 2015.URL:https : / / pure . tue . nl / ws / files / 9876041 / ABACPolicyMining_author_version.pdf

  9. [10]

    Toward Deep Learning Based Access Control

    Mohammad Nur Nobi et al. “Toward Deep Learning Based Access Control”. In:ACM Conference on Data and Application Security and Privacy. 2022, pp. 143–154.DOI:10.1145/3508398.3511497

  10. [11]

    Tool/Dataset Paper: Realistic ABAC Data Generation using Conditional Tabular GAN

    Ritwik Rai and Shamik Sural. “Tool/Dataset Paper: Realistic ABAC Data Generation using Conditional Tabular GAN”. In:ACM Conference on Data and Application Security and Privacy. 2023, pp. 273–278.DOI:10.1145/ 3577923.3583635

  11. [12]

    ABAC policy mining method based on hierarchical clustering and relationship extraction

    Siyuan Shang et al. “ABAC policy mining method based on hierarchical clustering and relationship extraction”. In:Comput. Secur .139 (2024), p. 103717.DOI:10.1016/j.cose.2024.103717

  12. [13]

    ABAC policy mining method for heterogeneous access control system

    Siyuan Shang et al. “ABAC policy mining method for heterogeneous access control system”. In:J. Supercom- put.81.9 (2025), p. 1065.DOI:10.1007/s11227-025-07539-6

  13. [14]

    Pratik Sonune et al.LMN: A Tool for Generating Machine Enforceable Policies from Natural Language Access Control Rules using LLMs. 2025. arXiv:2502.12460 [cs.CR].URL:https://arxiv.org/abs/2502. 12460

  14. [15]

    IEEE Transactions on Mobile Computing , month = apr, pages =

    Zihao Wang et al. “Attribute-Based Bilateral Access Control With Sanitization and Trust Management for IIoT”. In:IEEE Internet Things J.12.8 (2025), pp. 10818–10833.DOI:10.1109/JIOT.2024.3513454

  15. [16]

    Efficient Registered Attribute Based Access Control With Same Sub-Policies in Mobile Cloud Computing

    Wuwei Weng et al. “Efficient Registered Attribute Based Access Control With Same Sub-Policies in Mobile Cloud Computing”. In:IEEE Transactions on Mobile Computing24.9 (2025), pp. 8441–8453.DOI:10.1109/ TMC.2025.3556279

  16. [17]

    Mining Attribute-Based Access Control Policies

    Zhongyuan Xu and Scott D. Stoller. “Mining Attribute-Based Access Control Policies”. In:IEEE Transactions on Dependable and Secure Computing12.5 (2015), pp. 533–545.DOI:10.1109/TDSC.2014.2369048

  17. [18]

    In: Proc

    Zhongyuan Xu and Scott D. Stoller. “Mining Attribute-Based Access Control Policies from Logs”. In:IFIP WG 11.3 Conference on Data and Applications Security and Privacy. 2014, pp. 276–291.DOI:10.1007/978- 3-662-43936-4_18

  18. [19]

    Extraction of Machine Enforceable ABAC Policies from Natural Language Text using LLM Knowledge Distillation

    Mian Yang et al. “Extraction of Machine Enforceable ABAC Policies from Natural Language Text using LLM Knowledge Distillation”. In:30th ACM SACMAT. 2025, pp. 157–168.DOI:10.1145/3734436.3734447

  19. [20]

    Privacy-Preservation Enhanced and Efficient Attribute-Based Access Control for Smart Health in Cloud-Assisted Internet of Things

    Hui Yin et al. “Privacy-Preservation Enhanced and Efficient Attribute-Based Access Control for Smart Health in Cloud-Assisted Internet of Things”. In:IEEE Internet Things J.12.1 (2025), pp. 894–903.DOI:10.1109/ JIOT.2024.3470891

  20. [21]

    Attribute-Based Access Control With Credible Outsourcing and Collusion-Resistant Revocation Based on Blockchain for Iomt

    Zhaoqian Zhang et al. “Attribute-Based Access Control With Credible Outsourcing and Collusion-Resistant Revocation Based on Blockchain for Iomt”. In:Concurr . Comput. Pract. Exp.37.12-14 (2025), pp. 1–17. 10