Learned Static Function Data Structures
Pith reviewed 2026-05-21 19:53 UTC · model grok-4.3
The pith
Learned static functions use machine learning to predict per-key value probabilities and build shorter prefix codes, breaking the zero-order entropy barrier for point queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a machine-learning model can capture non-trivial correlations between keys and values, allowing the construction of a key-specific prefix code whose expected length is shorter than the zero-order entropy of the value sequence; these compact codewords are then stored in a conventional static function data structure that supports point queries without materializing the key set.
What carries the argument
Per-key prefix code derived from a machine-learning model's predicted probability distribution over values, stored inside a classic static function.
If this is right
- Space usage can fall below the zero-order entropy bound while point queries remain supported.
- Savings scale with the strength of correlations the model can learn.
- The approach works for any static function that can store the derived codewords.
- Real-world data sets with natural key-value dependencies yield up to 10x compression; synthetic data with engineered correlations yield up to 1000x compression.
Where Pith is reading between the lines
- The same prediction-plus-prefix-code idea could be layered on top of existing compressed static functions to obtain further gains.
- If the model is cheap to evaluate, the method might extend to semi-static or slowly changing data sets without rebuilding from scratch.
- Applications that already use static functions for dictionaries or Bloom-filter alternatives could adopt this technique to reduce memory footprint.
Load-bearing premise
The machine learning model must capture non-trivial correlations between keys and values so that the per-key prefix codes are shorter on average than any single global code.
What would settle it
Run the construction on a data set in which keys and values are generated independently; if measured space usage remains strictly above the zero-order entropy of the values, the claimed improvement does not occur.
Figures
read the original abstract
We consider the task of constructing a data structure for associating a static set of keys with values, while allowing arbitrary output values for queries involving keys outside the set. Compared to hash tables, these so-called static function data structures do not need to store the key set and thus use significantly less memory. Several techniques are known, with compressed static functions approaching the zero-order empirical entropy of the value sequence. In this paper, we introduce learned static functions, which use machine learning to capture correlations between keys and values. For each key, a model predicts a probability distribution over the values, from which we derive a key-specific prefix code to compactly encode the true value. The resulting codeword is stored in a classic static function data structure. This design allows learned static functions to break the zero-order entropy barrier while still supporting point queries. Our experiments show substantial space savings: up to one order of magnitude on real data, and up to three orders of magnitude on synthetic data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces learned static functions for static key-value associations that support point queries. An ML model predicts a per-key distribution over values; a key-specific prefix code is derived from this distribution and the codeword is stored in a standard static-function data structure. The construction is claimed to break the zero-order entropy barrier of the value sequence when the model captures non-trivial key-value correlations. Experiments report concrete space savings of up to 10× on real data and up to three orders of magnitude on synthetic data.
Significance. If the central claim holds, the result is significant for the data-structures community: it supplies a concrete, implementable method that integrates standard ML training with existing static-function primitives to obtain compression strictly below H0 while preserving O(1) queries. The approach is not circular (savings are measured against external baselines) and the reported ratios on both real and synthetic instances provide direct, falsifiable evidence that the inequality can be realized in practice.
major comments (2)
- [Experiments] Experiments section: the total-space figures must explicitly include the size of the trained ML model (plus any auxiliary structures) in addition to the static-function storage of the codewords; without this accounting the claim that the construction breaks the zero-order entropy barrier cannot be verified.
- [§4] §4 (or equivalent experimental subsection): the paper should state the experimental protocol for instance selection and report whether the quoted savings ratios come from a pre-specified set of runs or were obtained after observing the data; inclusion of standard error bars or multiple independent trials would directly address this.
minor comments (2)
- [Abstract] Abstract: the phrases 'up to one order of magnitude' and 'up to three orders of magnitude' would be clearer if the paper distinguished maximum observed savings from average savings across the test suite.
- [Construction] Notation: the manuscript should adopt a uniform symbol for the learned distribution (e.g., p̂_v|k) and for the resulting code length to avoid ambiguity when comparing against H0.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on the experimental reporting. We address each major comment below and will incorporate the requested clarifications in a revised version of the manuscript.
read point-by-point responses
-
Referee: [Experiments] Experiments section: the total-space figures must explicitly include the size of the trained ML model (plus any auxiliary structures) in addition to the static-function storage of the codewords; without this accounting the claim that the construction breaks the zero-order entropy barrier cannot be verified.
Authors: We agree that total-space figures must account for the trained ML model and auxiliary structures to allow verification of the entropy-barrier claim. In the revised manuscript we will add explicit reporting of model sizes (and any auxiliary data) alongside the static-function storage costs, and we will recompute and present the space savings ratios using these complete figures. revision: yes
-
Referee: [§4] §4 (or equivalent experimental subsection): the paper should state the experimental protocol for instance selection and report whether the quoted savings ratios come from a pre-specified set of runs or were obtained after observing the data; inclusion of standard error bars or multiple independent trials would directly address this.
Authors: We will expand §4 to describe the instance-selection protocol in detail and to state whether the reported ratios derive from a pre-specified set of runs. Where feasible we will also add results from multiple independent trials together with standard error bars. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper's construction trains an ML model on the key-value pairs to output per-key value distributions, derives Huffman-style prefix codes from those distributions, and stores the resulting codewords inside an off-the-shelf static-function data structure whose space is bounded by the sum of the code lengths plus the model size. The claimed improvement below zero-order entropy H0 follows directly from the information-theoretic property that a code whose lengths are -log p_i (where p_i is the model-predicted probability) is shorter on average precisely when the model assigns higher probability to the true value than the uniform or global distribution; this inequality is measured on held-out or synthetic instances against external baselines and does not reduce to a fitted parameter being renamed as a prediction. No load-bearing step invokes a self-citation whose content is itself unverified, nor does any equation equate a derived quantity to its own input by definition. The experiments therefore constitute independent evidence rather than a tautology.
Axiom & Free-Parameter Ledger
free parameters (1)
- ML model architecture and hyperparameters
Reference graph
Works this paper leans on
-
[1]
The Gene Ontology resource: enriching a GOld mine.Nucleic acids research 49, D1 (2021), D325–D334
2021. The Gene Ontology resource: enriching a GOld mine.Nucleic acids research 49, D1 (2021), D325–D334
work page 2021
-
[2]
Stephen Alstrup, Gerth Brodal, and Theis Rauhe. 2001. Optimal static range reporting in one dimension. InProc. 33rd Annual ACM Symposium on Theory of Computing (STOC). 476–482
work page 2001
-
[3]
Stones, Gang Wang, Xiaoguang Liu, Jing Liu, and Sheng Lin
Naiyong Ao, Fan Zhang, Di Wu, Douglas S. Stones, Gang Wang, Xiaoguang Liu, Jing Liu, and Sheng Lin. 2011. Efficient Parallel Lists Intersection and Index Compression Algorithms Using Graphics Processing Units.PVLDB4, 8 (2011), 470–481. https://doi.org/10.14778/2002974.2002975 12
-
[4]
Sepehr Assadi, Martin Farach-Colton, and William Kuszmaul. 2023. Tight Bounds for Monotone Minimal Perfect Hashing. InProc. 34th Annual ACM- SIAM Symposium on Discrete Algorithms (SODA). 456–476. https://doi.org/10. 1137/1.9781611977554.CH20
work page 2023
-
[5]
Garofalakis, and Rajeev Rastogi
Shivnath Babu, Minos N. Garofalakis, and Rajeev Rastogi. 2001. SPARTAN: A Model-Based Semantic Compression System for Massive Data Tables. InProc. 28th ACM International Conference on Management of Data (SIGMOD). 283–294. https://doi.org/10.1145/375663.375693
-
[6]
Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. 2010. Fast prefix search in little space, with applications. InProc. 18th Annual European Symposium on Algorithms (ESA). 427–438
work page 2010
-
[7]
Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. 2011. The- ory and practice of monotone minimal perfect hashing.ACM J. Exp. Algorithmics 16 (2011). https://doi.org/10.1145/1963190.2025378
-
[8]
Djamal Belazzougui and Rossano Venturini. 2013. Compressed static functions with applications. InProc. 24th Annual ACM-SIAM Symposium on Discrete Algo- rithms (SODA). SIAM, 229–240. https://doi.org/10.1137/1.9781611973105.17
-
[9]
Bender, Alex Conway, Martin Farach-Colton, William Kuszmaul, and Guido Tagliavini
Michael A. Bender, Alex Conway, Martin Farach-Colton, William Kuszmaul, and Guido Tagliavini. 2023. Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once.J. ACM70, 6 (2023), 40:1–40:51. https://doi.org/10.1145/3625817
-
[10]
Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh
Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh. 2024. Daisy Bloom Filters. InProc. 19th Scandinavian Symposium and Workshops on Algorithm Theory (SW AT). 9:1–9:19. https://doi.org/10.4230/LIPICS.SWAT.2024.9
-
[11]
Jock Blackard. 1998. Covertype. UCI Machine Learning Repository. https: //doi.org/10.24432/C50K5N
-
[12]
Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors.Commun. ACM13, 7 (1970), 422–426. https://doi.org/10.1145/362686. 362692
-
[13]
Antonio Boffa, Paolo Ferragina, and Giorgio Vinciguerra. 2022. A Learned Approach to Design Compressed Rank/Select Data Structures.ACM Trans. on Algorithms18, 3, Article 24 (oct 2022), 28 pages. https://doi.org/10.1145/3524060
-
[14]
Botelho, Yoshiharu Kohayakawa, and Nivio Ziviani
Fabiano C. Botelho, Yoshiharu Kohayakawa, and Nivio Ziviani. 2005. A Prac- tical Minimal Perfect Hashing Method. InProc. 4th International Workshop on Experimental and Efficient Algorithms (WEA). 488–500. https://doi.org/10.1007/ 11427186_42
work page 2005
-
[15]
Botelho, Rasmus Pagh, and Nivio Ziviani
Fabiano C. Botelho, Rasmus Pagh, and Nivio Ziviani. 2013. Practical Perfect Hashing in Nearly Optimal Space.Inf. Syst.38, 1 (2013), 108–131. https://doi. org/10.1016/j.is.2012.06.002
-
[16]
Jehoshua Bruck, Jie Gao, and Anxiao Jiang. 2006. Weighted Bloom filter. In Proc. 39th IEEE International Symposium on Information Theory (ISIT). 2304–2308. https://doi.org/10.1109/ISIT.2006.261978
-
[17]
Hayoung Byun and Hyesook Lim. 2022. Learned FBF: Learning-Based Functional Bloom Filter for Key-Value Storage.IEEE Trans. Computers71, 8 (2022), 1928–1938. https://doi.org/10.1109/TC.2021.3112079
-
[18]
Benjamin Coleman, David Torres Ramos, Vihan Lakshman, Chen Luo, and An- shumali Shrivastava. 2023. CARAMEL: A Succinct Read-Only Lookup Table via Compressed Static Functions. (2023). https://doi.org/10.48550/arXiv.2305.16545
-
[19]
Colin Cooper. 2000. On the rank of random matrices.Random Struct. Algorithms 16, 2 (2000), 209–232. https://doi.org/10.1002/(SICI)1098-2418(200003)16:2% 3C209::AID-RSA6%3E3.0.CO;2-1
-
[20]
Zhenwei Dai and Anshumali Shrivastava. 2020. Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web. InProc. 33rd Annual Conference on Neural Information Processing Systems (NeurIPS). https://proceedings.neurips.cc/paper/ 2020/hash/86b94dae7c6517ec1ac767fd2c136580-Abst...
work page 2020
-
[21]
Zhenwei Dai, Anshumali Shrivastava, Pedro Reviriego, and José Alberto Hernán- dez. 2022. Optimizing Learned Bloom Filters: How Much Should Be Learned? IEEE Embed. Syst. Lett.14, 3 (2022), 123–126. https://doi.org/10.1109/LES.2022. 3156019
-
[22]
Scott Davies and Andrew W. Moore. 1999. Bayesian Networks for Lossless Dataset Compression. InProc. 5th ACM International Conference on Knowledge Discovery and Data Mining, (KDD). 387–391. https://doi.org/10.1145/312129. 312289
-
[23]
1981.Assessing Probability Assessors: Calibration and Refinement.Technical Report
Morris H DeGroot and Stephen E Fienberg. 1981.Assessing Probability Assessors: Calibration and Refinement.Technical Report
work page 1981
-
[24]
Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, and Joel Veness. 2024. Language Modeling Is Compression. InProc. 12th International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=jznbgiynus
work page 2024
-
[25]
Martin Dietzfelbinger and Rasmus Pagh. 2008. Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract). InProc. 35th Inter- national Colloquium on Automata, Languages, and Programming (ICALP). 385–396. https://doi.org/10.1007/978-3-540-70575-8_32
-
[26]
Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer
Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer
-
[27]
Fast Succinct Retrieval and Approximate Membership Using Ribbon. In Proc. 20th International Symposium on Experimental Algorithms (SEA). 4:1–4:20. https://doi.org/10.4230/LIPICS.SEA.2022.4
-
[28]
Paolo Ferragina, Hans-Peter Lehmann, Peter Sanders, and Giorgio Vinciguerra
-
[29]
CPM.2016.13,doi:10.4230/LIPICS.CPM.2016.13
Learned monotone minimal perfect hashing. InProceedings of the 31st Annual European Symposium on Algorithms (ESA). https://doi.org/10.4230/LIPIcs. ESA.2023.46
-
[30]
Paolo Ferragina, Giovanni Manzini, and Giorgio Vinciguerra. 2022. Compressing and Querying Integer Dictionaries Under Linearities and Repetitions.IEEE Access 10 (2022), 118831–118848. https://doi.org/10.1109/ACCESS.2022.3221520
-
[31]
Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds.PVLDB13, 8 (2020), 1162–1175. https://doi.org/10.14778/3389133.3389135
-
[32]
Fredman, János Komlós, and Endre Szemerédi
Michael L. Fredman, János Komlós, and Endre Szemerédi. 1984. Storing a Sparse Table with 𝑂( 1) Worst Case Access Time.J. ACM31, 3 (1984), 538–544. https: //doi.org/10.1145/828.1884
-
[33]
Robert G. Gallager. 1978. Variations on a theme by Huffman.IEEE Trans. Inf. Theory24, 6 (1978), 668–674. https://doi.org/10.1109/TIT.1978.1055959
-
[34]
Yihan Gao and Aditya G. Parameswaran. 2016. Squish: Near-Optimal Com- pression for Archival of Relational Datasets. InProc. 22nd ACM International Conference on Knowledge Discovery and Data Mining (KDD). 1575–1584. https: //doi.org/10.1145/2939672.2939867
-
[35]
Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. 2020. Fast scalable construction of ([compressed] static | minimal perfect hash) functions.Inf. Comput.273 (2020), 104517. https://doi.org/10.1016/J.IC.2020.104517
-
[36]
Dave Gomboc, Christian R. Shelton, Andrew S. Miner, and Gianfranco Ciardo
-
[37]
Comparing Lossless Compression Methods for Chess Endgame Data. In Proc. 27th European Conference on Artificial Intelligence (ECAI), Vol. 392. IOS Press, 4116–4123. https://doi.org/10.3233/FAIA240982
-
[38]
Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, and Idoia Ochoa. 2019. DeepZip: Lossless Data Compression Using Recurrent Neural Networks. InProc. 29th Data Compression Conference (DCC). 575. https://doi.org/10.1109/DCC.2019. 00087
-
[39]
Thomas Mueller Graf and Daniel Lemire. 2020. Xor Filters.ACM J. Exp. Algo- rithmics25 (2020), 1–16. https://doi.org/10.1145/3376122
-
[40]
Thomas Mueller Graf and Daniel Lemire. 2022. Binary Fuse Filters: Fast and Smaller Than Xor Filters.ACM J. Exp. Algorithmics27 (2022), 1.5:1–1.5:15. https://doi.org/10.1145/3510449
-
[41]
Andrea Guerra, Giorgio Vinciguerra, Antonio Boffa, and Paolo Ferragina. 2025. Learned Compression of Nonlinear Time Series With Random Access. InProc. 41st IEEE International Conference on Data Engineering (ICDE). 1579–1592. https: //doi.org/10.1109/ICDE65448.2025.00122
-
[42]
Torben Hagerup and Torsten Tholey. 2001. Efficient Minimal Perfect Hashing in Nearly Minimal Space. InProc. 18th Annual Symposium on Theoretical As- pects of Computer Science (STACS), Vol. 2010. 317–326. https://doi.org/10.1007/ 3-540-44693-1_28
work page 2001
- [43]
-
[44]
Hreinsson, Morten Krøyer, and Rasmus Pagh
Jóhannes B. Hreinsson, Morten Krøyer, and Rasmus Pagh. 2009. Storing a Compressed Function with Constant Time Access. InProc. 17th Annual Eu- ropean Symposium on Algorithms (ESA). 730–741. https://doi.org/10.1007/ 978-3-642-04128-0_65
work page 2009
-
[45]
David A Huffman. 2007. A method for the construction of minimum-redundancy codes.Proceedings of the IRE40, 9 (2007), 1098–1101
work page 2007
-
[46]
Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Cetintemel. 2020. DeepSqueeze: Deep Semantic Compression for Tabular Data. InProc. 46th ACM International Conference on Management of Data (SIGMOD). 1733–1746. https://doi.org/10.1145/3318464.3389734
-
[47]
Amitansh Joshi, Amit Parolkar, and Vedant Das. 2023. Spotify_1Million_Tracks. https://doi.org/10.34740/KAGGLE/DSV/5987852
-
[48]
Minoru Kanehisa, Miho Furumichi, Yoko Sato, Mari Ishiguro-Watanabe, and Mao Tanabe. 2021. KEGG: integrating viruses and cellular organisms.Nucleic acids research49, D1 (2021), D545–D551
work page 2021
-
[49]
Kiss, Éva Hosszu, János Tapolcai, Lajos Rónyai, and Ori Rottenstreich
Sándor Z. Kiss, Éva Hosszu, János Tapolcai, Lajos Rónyai, and Ori Rottenstreich
-
[50]
Bloom Filter With a False Positive Free Zone.IEEE Trans. Netw. Serv. Manag.18, 2 (2021), 2334–2349. https://doi.org/10.1109/TNSM.2021.3059075
-
[51]
1998.The art of computer programming: Volume 3: Sorting and Searching
Donald E Knuth. 1998.The art of computer programming: Volume 3: Sorting and Searching. Addison-Wesley Professional
work page 1998
-
[52]
Dominik Köppl, Simon J. Puglisi, and Rajeev Raman. 2022. Fast and Simple Compact Hashing via Bucketing.Algorithmica84, 9 (2022), 2735–2766. https: //doi.org/10.1007/S00453-022-00996-Y
-
[53]
Chi, Jeffrey Dean, and Neoklis Polyzotis
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. InProc. 44th ACM International Conference on Management of Data (SIGMOD). 489–504. https://doi.org/10.1145/3183713. 3196909
-
[54]
M. Oguzhan Külekci. 2014. Enhanced Variable-Length Codes: Improved Com- pression with Efficient Random Access. InProc. 24th Data Compression Conference (DCC). 362–371. https://doi.org/10.1109/DCC.2014.74
-
[55]
Florian Kurpicz, Hans-Peter Lehmann, and Peter Sanders. 2023. PaCHash: Packed and compressed hash tables. InProc. 25th Symposium on Algorithm Engineering 13 and Experiments (ALENEX). 162–175. https://doi.org/10.1137/1.9781611977561. ch14
-
[56]
Hans-Peter Lehmann, Peter Sanders, and Stefan Walzer. 2023. SicHash - Small Irregular Cuckoo Tables for Perfect Hashing. InProc. 25th Symposium on Algo- rithm Engineering and Experiments (ALENEX). 176–189. https://doi.org/10.1137/ 1.9781611977561.ch15
work page 2023
-
[57]
Hans-Peter Lehmann, Peter Sanders, and Stefan Walzer. 2024. ShockHash: To- wards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force. InProc. 26th Symposium on Algorithm Engineering and Experiments (ALENEX). 194–206. https://doi.org/10.1137/1.9781611977929.15
- [58]
-
[59]
Hanwen Liu, Mihail Stoian, Alexander van Renen, and Andreas Kipf. 2024. Corra: Correlation-Aware Column Compression. InProc. 2nd Workshop on Cloud Databases (CloudDB). https://vldb.org/workshops/2024/proceedings/CloudDB/ clouddb-2.pdf
work page 2024
-
[60]
Yihao Liu, Xinyu Zeng, and Huanchen Zhang. 2024. LeCo: Lightweight Com- pression via Learning Serial Correlations.Proc. ACM Manag. Data2, 1, Article 65 (mar 2024), 28 pages. https://doi.org/10.1145/3639320
-
[61]
Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli, Raffaele Giancarlo, and Marco Frasca. 2024. The role of classifiers and data complexity in learned Bloom filters: insights and recommendations.J. Big Data11, 1 (2024), 45. https: //doi.org/10.1186/S40537-024-00906-9
-
[62]
Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. InProc. 31st Annual Conference on Neural Information Processing Systems (NeurIPS). 462–471. https://proceedings.neurips.cc/paper/2018/hash/ 0f49c89d1e7298bb9930789c8ed59d48-Abstract.html
work page 2018
-
[63]
Christian Worm Mortensen, Rasmus Pagh, and Mihai Pˇatraçcu. 2005. On dynamic range reporting in one dimension. InProc. 37th annual ACM Symposium on Theory of Computing (STOC). 104–111
work page 2005
-
[64]
Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. 2015. Obtain- ing well calibrated probabilities using bayesian binning. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 29
work page 2015
-
[65]
2016.Compact data structures: a practical approach
Gonzalo Navarro. 2016.Compact data structures: a practical approach. Cambridge University Press
work page 2016
-
[66]
Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing.J. Algorithms 51, 2 (2004), 122–144. https://doi.org/10.1016/j.jalgor.2003.12.002
-
[67]
Ripon Patgiri, Anupam Biswas, and Sabuzima Nayak. 2023. deepBF: Malicious URL detection using learned Bloom Filter and evolutionary deep learning.Com- put. Commun.200 (2023), 30–41. https://doi.org/10.1016/J.COMCOM.2022.12.027
-
[68]
Mihai Pătraşcu. 2008. Succincter. InProc. 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 305–313. https://doi.org/10.1109/FOCS. 2008.83
-
[69]
Ely Porat. 2009. An Optimal Bloom Filter Replacement Based on Matrix Solving. InProc. 4th International Computer Science Symposium in Russia (CSR). 263–273. https://doi.org/10.1007/978-3-642-03351-3_25
-
[70]
Arvind Prasad and Shalini Chandra. 2024. PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning.Comput. Secur.136 (2024), 103545. https://doi.org/10.1016/ J.COSE.2023.103545 Dataset available at https://archive.ics.uci.edu/dataset/967/ phiusiil+phishing+url+dataset
-
[71]
Rae, Sergey Bartunov, and Timothy P
Jack W. Rae, Sergey Bartunov, and Timothy P. Lillicrap. 2019. Meta-Learning Neural Bloom Filters. InProc. 36th International Conference on Machine Learning (ICML). 5271–5280. http://proceedings.mlr.press/v97/rae19a.html
work page 2019
- [72]
-
[73]
Pedro Reviriego, José Alberto Hernández, Zhenwei Dai, and Anshumali Shrivas- tava. 2021. Learned Bloom Filters in Adversarial Environments: A Malicious URL Detection Use-Case. InProc. 22nd IEEE International Conference on High Perfor- mance Switching and Routing (HPSR). 1–6. https://doi.org/10.1109/HPSR52026. 2021.9481857
-
[74]
Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf, Michael Mitzen- macher, and Tim Kraska. 2022. Can Learned Models Replace Hash Functions? PVLDB16, 3 (2022), 532–545. https://doi.org/10.14778/3570690.3570702
-
[75]
Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, and Marius Portmann. 2020. NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Sys- tems. InProc. 10th EAI International Conference on Big Data Technologies and Appli- cations (LNICST), Vol. 371. 117–135. https://doi.org/10.1007/978-3-030-72802-1_9 Dataset available at https://staff.itee.u...
-
[76]
Atsuki Sato and Yusuke Matsui. 2023. Fast Partitioned Learned Bloom Filter. InProc. 36th Annual Conference on Neural Information Process- ing Systems (NeurIPS). http://papers.nips.cc/paper_files/paper/2023/hash/ 7b2e844c52349134268e819a9b56b9e8-Abstract-Conference.html
work page 2023
-
[77]
Jürgen Schmidhuber and Stefan Heil. 1996. Sequential neural text compression. IEEE Trans. Neural Networks7, 1 (1996), 142–146. https://doi.org/10.1109/72. 478398
work page doi:10.1109/72 1996
-
[78]
Claude E Shannon. 1948. A mathematical theory of communication.The Bell system technical journal27, 3 (1948), 379–423
work page 1948
-
[79]
Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov. 2022. Space- efficient representation of genomic k-mer count tables.Algorithms Mol. Biol.17, 1 (2022), 5. https://doi.org/10.1186/S13015-022-00212-0
-
[80]
Samuel D. Stearns. 1995. Arithmetic coding in lossless waveform compression. IEEE Trans. Signal Process.43, 8 (1995), 1874–1879. https://doi.org/10.1109/78. 403346
work page doi:10.1109/78 1995
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.