
arxiv: 1907.07303 · v2 · pith:5IZR3BKNnew · submitted 2019-07-17 · 💻 cs.DB

Effcient logging and querying for Blockchain-based cross-site genomic dataset access audit

Pith reviewed 2026-05-24 20:18 UTC · model grok-4.3

classification 💻 cs.DB
keywords blockchain · genomic data · audit log · access control · range query · AND query · data sharing · accountability

The pith

A blockchain-based log with hierarchical timestamps enables efficient range and AND queries for cross-site genomic dataset access audits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an immutable blockchain ledger can host a lightweight audit trail for genomic data sharing across institutions when augmented with a hierarchical timestamp structure. This structure indexes the timestamp field to accelerate range queries while also supporting complex queries that combine multiple predicates. Implementation in Python3 on competition-supplied test data produced at least a tenfold speedup on range queries, faster retrieval on AND queries, and a 25 percent reduction in storage. The resulting module remains compatible with existing blockchain platforms and preserves the ledger's immutability and security properties. Such a system addresses the accountability requirement that arises when genomic datasets must be shared between separate sites.

Core claim

By layering a hierarchical timestamp structure onto an immutable blockchain ledger, the system supports efficient logging and querying of genomic dataset access records. The structure enables fast range queries on timestamps and complex AND queries containing multiple predicates while retaining the ledger's security, compatibility, and immutability guarantees. Tests on supplied data showed at least an order-of-magnitude improvement in range-query speed, boosted AND-query retrieval, and 25 percent lower storage use.

What carries the argument

Hierarchical timestamp structure layered on the blockchain ledger to index the timestamp field for range and compound queries.
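
The summary does not spell out the internals of the hierarchical timestamp structure, so the following is only a minimal sketch of one way such an index could work, not the authors' implementation. It assumes an append-only Python list standing in for the ledger and hypothetical bucket widths of a day, an hour, and a minute. The class name `HierarchicalTimestampIndex` and all method names are illustrative. A range query is decomposed into the coarsest buckets that fit entirely inside the range, with finer buckets used only at the edges.

```python
from collections import defaultdict

# Hypothetical bucket widths in seconds, coarse to fine (day, hour, minute).
# Each finer width must divide the coarser one so that buckets nest.
LEVELS = (86400, 3600, 60)


class HierarchicalTimestampIndex:
    """Illustrative index over an append-only log; existing entries never change."""

    def __init__(self, levels=LEVELS):
        self.levels = levels
        self.ledger = []                  # stand-in for the immutable ledger
        self.buckets = defaultdict(list)  # (width, bucket_start) -> ledger positions

    def append(self, timestamp, payload):
        pos = len(self.ledger)
        self.ledger.append((timestamp, payload))
        # File the record under one bucket per level.
        for width in self.levels:
            self.buckets[(width, timestamp - timestamp % width)].append(pos)
        return pos

    def _cover(self, start, end, depth=0):
        """Yield (bucket_key, exact) pairs that together cover [start, end)."""
        width = self.levels[depth]
        first = start - start % width
        for b in range(first, end, width):
            if start <= b and b + width <= end:
                yield (width, b), True    # bucket lies fully inside the range
            elif depth + 1 < len(self.levels):
                # Partial overlap: descend to the next finer level.
                yield from self._cover(max(start, b), min(end, b + width), depth + 1)
            else:
                yield (width, b), False   # finest level: filter record by record

    def range_query(self, start, end):
        """Return ledger positions whose timestamp t satisfies start <= t < end."""
        hits = []
        for key, exact in self._cover(start, end):
            for pos in self.buckets.get(key, ()):
                if exact or start <= self.ledger[pos][0] < end:
                    hits.append(pos)
        return sorted(hits)
```

Under these assumptions a query spanning several days touches a handful of day-level buckets plus a few hour- and minute-level buckets at each end, rather than every record. The trade-off is that each record is filed once per level, which costs extra index entries.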

If this is right

  • Range queries on the timestamp field run at least ten times faster than without the structure.
  • Complex AND queries that combine multiple predicates retrieve results more quickly.
  • Overall storage footprint drops by 25 percent relative to the baseline method.
  • The module can be added to existing blockchain platforms without altering their core protocols.
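
As an illustration of how an AND query over multiple predicates can avoid a full scan, here is a minimal sketch that keeps one posting set per (field, value) pair and intersects them, smallest set first. This is a generic inverted-index technique and is not claimed to be the paper's method. The class `AttributeIndex` and the field names `user`, `activity`, and `resource` are hypothetical and chosen only for the example.

```python
from collections import defaultdict


class AttributeIndex:
    """Illustrative per-field index over an append-only list of log records."""

    def __init__(self):
        self.ledger = []                   # stand-in for the immutable ledger
        self.postings = defaultdict(set)   # (field, value) -> ledger positions

    def append(self, record):
        pos = len(self.ledger)
        self.ledger.append(dict(record))
        for field, value in record.items():
            self.postings[(field, value)].add(pos)
        return pos

    def and_query(self, **predicates):
        """Return positions of records matching every field=value predicate."""
        if not predicates:
            return list(range(len(self.ledger)))
        # Intersect the smallest posting set first to keep intermediates small.
        sets = sorted(
            (self.postings.get(item, set()) for item in predicates.items()),
            key=len,
        )
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
            if not result:
                break  # no record can satisfy the remaining predicates
        return sorted(result)


log = AttributeIndex()
log.append({"user": "u1", "activity": "VIEW", "resource": "r1"})
log.append({"user": "u1", "activity": "FILE_ACCESS", "resource": "r2"})
log.append({"user": "u2", "activity": "VIEW", "resource": "r1"})
matches = log.and_query(user="u1", activity="VIEW")
```

Ordering the intersection by set size means a highly selective predicate prunes the candidate set early, which is one plausible reason compound queries can retrieve faster than a record-by-record filter.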

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same timestamp hierarchy could be reused for audit logs in other regulated data-sharing settings such as electronic health records.
  • Because the structure sits on top of the ledger, it could be ported to different blockchain implementations with only minor adjustments.
  • Longer-term tests on production-scale genomic access logs would reveal whether query speed and storage gains hold when the number of sites and records grows beyond competition test sizes.

Load-bearing premise

The hierarchical timestamp structure can be added to an immutable blockchain ledger without losing the ledger's security, immutability, or platform compatibility.

What would settle it

Deploy the structure on an actual blockchain instance and observe that range-query latency shows no improvement over a naive full scan or that total storage exceeds the baseline implementation.
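
A check of this kind could be scripted roughly as follows. This is a hedged sketch rather than the competition's evaluation harness: the data are synthetic, the helper `time_queries` is hypothetical, and the "indexed" path is a plain sorted-array lookup with `bisect` that stands in for whatever index is under test. It first confirms that both paths return identical results, then reports the median and standard deviation of wall-clock time over repeated runs.

```python
import bisect
import random
import statistics
import time


def time_queries(fn, queries, repeats=3):
    """Run every query `repeats` times; return (median, stdev) of batch seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        for start, end in queries:
            fn(start, end)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples), statistics.stdev(samples)


random.seed(0)
# Synthetic timestamps in append order (assumed non-decreasing).
timestamps = sorted(random.randrange(10**6) for _ in range(5000))


def full_scan(start, end):
    # Baseline: inspect every record.
    return [i for i, t in enumerate(timestamps) if start <= t < end]


def indexed(start, end):
    # Stand-in index: binary search on the sorted timestamps.
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return list(range(lo, hi))


queries = [(s, s + 1000) for s in random.sample(range(10**6 - 1000), 20)]
results_agree = all(full_scan(*q) == indexed(*q) for q in queries)
scan_stats = time_queries(full_scan, queries)
index_stats = time_queries(indexed, queries)
print(f"full scan: median {scan_stats[0]:.4f}s, stdev {scan_stats[1]:.4f}s")
print(f"indexed:   median {index_stats[0]:.4f}s, stdev {index_stats[1]:.4f}s")
```

Reporting the spread alongside the median speaks to the variance analysis the referee asks for. Absolute numbers from a synthetic run like this say nothing about the paper's reported figures.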

read the original abstract

Background: Genomic data have been collected by different institutions and companies and need to be shared for broader use. In a cross-site genomic data sharing system, a secure and transparent access control audit module plays an essential role in ensuring the accountability. The 2018 iDASH competition first track provides us with an opportunity to design efficient logging and querying system for cross-site genomic dataset access audit. We designed a blockchain-based log system which can provide a light-weight and widely compatible module for existing blockchain platforms. The submitted solution won the third place of the competition. In this paper, we report the technical details in our system.

Methods: We present two methods: baseline method and enhanced method. We started with the baseline method and then adjusted our implementation based on the competition evaluation criteria and characteristics of the log system. To overcome obstacles of indexing on the immutable Blockchain system, we designed a hierarchical timestamp structure which supports efficient range queries on the timestamp field.

Results: We implemented our methods in Python3, tested the scalability, and compared the performance using the test data supplied by competition organizer. We successfully boosted the log retrieval speed for complex AND queries that contain multiple predicates. For the range query, we boosted the speed for at least one order of magnitude. The storage usage is reduced by 25%.

Conclusion: We demonstrate that Blockchain can be used to build a time and space efficient log and query genomic dataset audit trail. Therefore, it provides a promising solution for sharing genomic data with accountability requirement across multiple sites.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that a blockchain-based logging and querying system, using baseline and enhanced methods with a novel hierarchical timestamp structure, enables efficient range and AND queries on immutable ledgers for cross-site genomic dataset access audits. Implemented in Python and evaluated on iDASH-supplied test data, it reports ≥10× speedup on range queries, improved AND-query performance, 25% storage reduction, and a third-place competition result, concluding that blockchain provides a promising solution for accountable genomic data sharing.

Significance. If the empirical results hold, the work is significant as a practical demonstration of adapting blockchain for efficient audit logging in a high-stakes domain, preserving immutability while adding query performance via the hierarchical structure. The competition placement and concrete performance numbers on supplied data provide external grounding for the efficiency claims in genomic data accountability.

major comments (2)
  1. [Results] Results section: the performance claims (order-of-magnitude range-query gains and 25% storage reduction) rest on competition data but lack explicit baseline implementation details, query workload specifications, or error/variance analysis, which are load-bearing for verifying the central efficiency claim.
  2. [Methods] Methods section: the hierarchical timestamp structure is presented as overcoming immutable-ledger indexing obstacles, but without pseudocode, formal invariants, or analysis of its effect on blockchain security/compatibility properties, the claim that it preserves ledger guarantees while enabling queries cannot be fully assessed.
minor comments (1)
  1. The abstract and title contain minor typographical issues (e.g., 'Effcient' in the provided title) that should be corrected for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will incorporate clarifications and additions in a revised manuscript to strengthen verifiability of the efficiency claims and the hierarchical timestamp design.

read point-by-point responses
  1. Referee: [Results] Results section: the performance claims (order-of-magnitude range-query gains and 25% storage reduction) rest on competition data but lack explicit baseline implementation details, query workload specifications, or error/variance analysis, which are load-bearing for verifying the central efficiency claim.

    Authors: We agree that additional implementation and workload details are required for independent verification. In the revision we will expand the Results section with: (1) explicit description of the baseline method implementation (Python3 code structure and data structures used), (2) the precise query workload specifications supplied by the iDASH organizers (number and types of range and AND queries), and (3) any available run-time statistics or variance from the competition evaluation runs. The reported ≥10× range-query speedup and 25% storage reduction were measured on the organizer-provided test data; we will make these measurement conditions explicit. revision: yes

  2. Referee: [Methods] Methods section: the hierarchical timestamp structure is presented as overcoming immutable-ledger indexing obstacles, but without pseudocode, formal invariants, or analysis of its effect on blockchain security/compatibility properties, the claim that it preserves ledger guarantees while enabling queries cannot be fully assessed.

    Authors: We accept that the current presentation is insufficient for full assessment. In the revised Methods section we will add: (1) pseudocode for the hierarchical timestamp construction and query traversal, (2) the key invariants maintained by the structure (e.g., monotonicity and completeness with respect to the underlying ledger), and (3) a short argument that the structure is an auxiliary indexing layer that does not modify block contents, consensus rules, or cryptographic commitments, thereby preserving the original blockchain's security and compatibility properties. revision: yes
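
The invariants named in the second response could be checked mechanically along the following lines. This is a sketch under stated assumptions, since the page does not define them formally: "completeness" is read here as every ledger record appearing in exactly one bucket of the index, in the bucket matching its timestamp, and "monotonicity" as each bucket listing positions in append order. The function `check_index_invariants` and its arguments are hypothetical, and a single bucket width is used for brevity.

```python
def check_index_invariants(timestamps, buckets, width):
    """Return a list of violation messages; an empty list means the checks pass.

    timestamps: ledger timestamps in append order (position -> timestamp)
    buckets:    bucket_start -> list of ledger positions filed in that bucket
    width:      bucket width in the same unit as the timestamps
    """
    problems = []
    owner = {}  # ledger position -> bucket start that claims it
    for start, positions in buckets.items():
        # Monotonicity (assumed meaning): positions appear in append order.
        if any(a >= b for a, b in zip(positions, positions[1:])):
            problems.append(f"bucket {start}: positions out of append order")
        for pos in positions:
            if pos in owner:
                problems.append(f"position {pos}: indexed more than once")
            owner[pos] = start
            if not 0 <= pos < len(timestamps):
                problems.append(f"position {pos}: not in the ledger")
            elif timestamps[pos] - timestamps[pos] % width != start:
                problems.append(f"position {pos}: filed under wrong bucket {start}")
    # Completeness (assumed meaning): every ledger record is indexed.
    for pos in range(len(timestamps)):
        if pos not in owner:
            problems.append(f"position {pos}: missing from the index")
    return problems


ledger_times = [5, 61, 62, 130]
good_index = {0: [0], 60: [1, 2], 120: [3]}
violations = check_index_invariants(ledger_times, good_index, width=60)
```

Because the check only reads the ledger and the index, it is consistent with the authors' framing of the structure as an auxiliary layer that leaves block contents untouched.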

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes an engineering system for blockchain-based genomic audit logging, with a baseline method and an enhanced hierarchical-timestamp indexing approach. All load-bearing claims rest on concrete implementation details, Python code, and empirical benchmarks (range-query speedup, storage reduction, AND-query performance) measured against iDASH-supplied test data. No equations, fitted parameters, or predictions are present that reduce by construction to the inputs; standard blockchain immutability is treated as an external property rather than derived internally. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that blockchain ledgers can host an auxiliary indexing structure without violating immutability or security, plus standard data-structure properties for hierarchical timestamps. No free parameters or new entities with independent evidence are introduced.

axioms (1)
  • domain assumption: Blockchain platforms provide immutability and transparency sufficient for audit logs while allowing auxiliary indexing structures.
    Invoked when the authors state the system is a light-weight module for existing blockchain platforms.
invented entities (1)
  • hierarchical timestamp structure (no independent evidence)
    purpose: Enable efficient range queries on the timestamp field despite blockchain immutability.
    New design element presented to overcome indexing obstacles on immutable ledgers.

pith-pipeline@v0.9.0 · 5802 in / 1271 out tokens · 25290 ms · 2026-05-24T20:18:12.527711+00:00 · methodology

