Efficient privacy preservation of big data for accurate data mining

D. Liu; I. Khalil; M.A.P. Chamikara; P. Bertok; S. Camtepe

arXiv: 1906.08149 · v1 · pith:65DNVIG3 · submitted 2019-06-19 · 💻 cs.DB · cs.CR

M.A.P. Chamikara, P. Bertok, D. Liu, S. Camtepe, I. Khalil

Pith reviewed 2026-05-25 19:50 UTC · model grok-4.3

classification 💻 cs.DB cs.CR

keywords privacy preservation, big data, data mining, perturbation algorithm, geometric transformations, data classification, nonreversible, scalability

The pith

PABIDOT uses optimal geometric transformations to perturb big data while preserving classification accuracy and privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PABIDOT as a nonreversible perturbation algorithm for privacy preservation of big data. It targets limitations in existing methods that struggle with efficiency, scalability, data utility, or privacy strength. The approach relies on optimal geometric transformations to create the perturbations. Experiments using nine datasets and five classification algorithms show PABIDOT outperforms two related algorithms in execution speed, scalability, attack resistance, and accuracy. This would matter if it allows organizations to perform data mining on large sensitive collections without major privacy or performance costs.

Core claim

PABIDOT is an efficient and scalable nonreversible perturbation algorithm for privacy preservation of big data via optimal geometric transformations. When tested with nine datasets and five classification algorithms, it excels in execution speed, scalability, attack resistance and accuracy in large-scale privacy-preserving data classification when compared with two other related privacy-preserving algorithms.

What carries the argument

PABIDOT, a perturbation algorithm that applies optimal geometric transformations to achieve non-reversibility while supporting downstream classification.
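This page gives no equations for PABIDOT, so the following is only a minimal sketch of what a geometric perturbation of this family can look like, written under stated assumptions: attributes are z-score normalised, pairs of attributes are rotated by an angle θ, each attribute receives a random translation, and a sign-preserving shift in the spirit of the randomized expansion described in the Figure 2 caption is applied last. The helper names (`zscore`, `rotate_pair`, `randomized_expansion`, `perturb`), the pairwise layout, and the noise distributions are hypothetical and not the authors' method.

```python
import math
import random
import statistics


def zscore(column):
    """Standardise one attribute to mean 0 and sample standard deviation 1."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mu) / sd for v in column]


def rotate_pair(xs, ys, theta_deg):
    """Rotate the 2-D points (x, y) by theta_deg degrees about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return ([c * x - s * y for x, y in zip(xs, ys)],
            [s * x + c * y for x, y in zip(xs, ys)])


def randomized_expansion(column, sigma, rng):
    """Move each value further from zero by |N(0, sigma)| (assumed noise model)."""
    return [v + math.copysign(abs(rng.gauss(0.0, sigma)), v) for v in column]


def perturb(columns, theta_deg, sigma, seed=0):
    """Hypothetical geometric perturbation of a column-major table: normalise,
    rotate attribute pairs, translate each attribute, then expand randomly."""
    rng = random.Random(seed)
    cols = [zscore(c) for c in columns]
    # Rotate attributes (0, 1), (2, 3), ...; an odd last attribute stays unrotated.
    for i in range(0, len(cols) - 1, 2):
        cols[i], cols[i + 1] = rotate_pair(cols[i], cols[i + 1], theta_deg)
    out = []
    for c in cols:
        shift = rng.uniform(-1.0, 1.0)  # assumed random translation per attribute
        out.append(randomized_expansion([v + shift for v in c], sigma, rng))
    return out
```

A call such as `perturb([[5.1, 4.9, 6.2], [3.5, 3.0, 2.9]], theta_deg=35, sigma=0.3)` returns two perturbed columns of three values each. Rotation and translation keep the relative geometry of the records, which is what a downstream classifier relies on; the random parts are what would have to make the release hard to invert.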

If this is right

Privacy-preserving classification on big data can scale without major losses in speed or accuracy.
Nonreversible perturbation can provide stronger attack resistance than prior geometric methods while keeping utility high.
The same transformation approach works across multiple classification algorithms without per-algorithm redesign.
Execution time for privacy steps becomes short enough for routine use on large datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same geometric approach might extend to regression or clustering tasks on sensitive data with similar utility retention.
Widespread use could reduce reliance on heavier anonymization techniques that distort data more severely.
Testing on streaming or real-time big data sources would check whether the speed gains hold under continuous processing.

Load-bearing premise

The chosen geometric transformations can simultaneously prevent reversal to recover original data and retain enough statistical structure for high classification accuracy.
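A pure rotation makes the tension in this premise concrete. The sketch below is an illustration, not PABIDOT: it rotates a few toy 2-D records by 35°. Every pairwise Euclidean distance survives, so distance-based classifiers such as k-NN see the same geometry, yet anyone who learns the angle undoes the rotation exactly. Non-reversibility therefore has to come from something beyond the rotation itself, such as random components in the perturbation. The records and the angle are arbitrary choices.

```python
import math


def rotate(points, theta_deg):
    """Rotate a list of (x, y) records by theta_deg degrees about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """All Euclidean distances between distinct records, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


records = [(1.0, 2.0), (-0.5, 0.3), (2.2, -1.1), (0.0, 0.7)]
perturbed = rotate(records, 35)

# Utility side: the geometry a distance-based classifier relies on is unchanged.
same_geometry = all(math.isclose(a, b) for a, b in
                    zip(pairwise_distances(records), pairwise_distances(perturbed)))

# Privacy side: knowing theta gives back the originals (up to rounding).
recovered = rotate(perturbed, -35)
fully_reversible = all(math.isclose(a, b, abs_tol=1e-12)
                       for r, q in zip(records, recovered) for a, b in zip(r, q))

print(same_geometry, fully_reversible)  # True True
```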

What would settle it

A replication experiment in which the perturbed data can be reversed to recover original sensitive values or in which classification accuracy falls below the two compared algorithms on the same nine datasets.

Figures

Figures reproduced from arXiv: 1906.08149 by D. Liu, I. Khalil, M.A.P. Chamikara, P. Bertok, S. Camtepe.

**Figure 1.** Basic flow and the architecture of PABIDOT. In this setting, the data owner is considered to be the trusted …

**Figure 2.** Effect of Randomized Expansion. The red arrows on the right-hand side show a positive shift, where a calibrated positive random value is added to a positive value to increase its positiveness. The left-hand side, represented by the blue arrows, shows a negative shift, where a calibrated negative random value is added to a negative value to increase the negativeness of the orig…

**Figure 3.** Time consumption of PABIDOT. PABIDOT shows linear time complexity for the number of instances, and …

**Figure 4.** Time consumption of PABIDOT before and after the efficiency optimization. Both PABIDOT and …

**Figure 5.** Time consumption comparison of the three methods. Due to the extremely low time consumption of PABIDOT, …

**Figure 6.** The process used to generate the classification models trained by the perturbed data. This figure represents …

**Figure 7.** Box plots for the datasets listed in Table 5. The box plots in the figure show how each perturbation algorithm …

**Figure 8.** φ vs. θ. Figure 8a shows the variation of the local minimum privacy guarantee (φ_i) curves for each attribute of the WCDS dataset. The φ_i values are utilized to generate the global minimum privacy guarantee (φ) curve shown in Figure 8b; PABIDOT considers the global maximum of φ to select the best perturbation parameters. For the WCDS dataset the best perturbation parameters are θ_optimal = 35 and Rif_optimal = …

**Figure 9.** Minimum std(D − D_r) and average std(D − D_r) of the reconstructed datasets produced by naive snooping, ICA and known I/O. The red vertical lines show where PABIDOT selects the optimal perturbation parameters. The red lines approximately mark the point at which the corresponding perturbed dataset provides the highest privacy guarantee. This provides empirical evidence of PABIDOT providing the optimal pr…

**Figure 10.** Effect of σ on min(std(D − D_r)) and classification accuracy. When the σ of the randomized expansion is increased, the minimum std(D − D_p) increases, as shown in Figure 10a. However, the classification accuracy shows only a minimal decrease as σ increases. This confirms PABIDOT's capability of maintaining utility at a constant level while providing increased resistance to increasing randomized expans…

**Figure 1.** We assume that only the perturbed data is released and the original data is not accessible …
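The Figure 8 and Figure 9 captions imply a simple selection loop: for each candidate angle θ, measure how far each perturbed attribute drifts from the original (Figure 9 reports std(D − D_r)), take the minimum over attributes as the global guarantee φ, and keep the θ that maximises φ. The sketch below assumes φ_i is the population standard deviation of the per-attribute difference, with the perturbed data itself taken as the reconstruction, and uses a plain two-attribute rotation as the perturbation. PABIDOT's actual φ_i definition and search range are not given on this page, and `local_guarantees` and `select_theta` are hypothetical names.

```python
import math
import statistics


def rotate_pair(xs, ys, theta_deg):
    """Rotate the 2-D points (x, y) by theta_deg degrees about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return ([c * x - s * y for x, y in zip(xs, ys)],
            [s * x + c * y for x, y in zip(xs, ys)])


def local_guarantees(original, perturbed):
    """Assumed phi_i: std of (original - perturbed) per attribute, i.e. the
    error left when the perturbed data itself is taken as the reconstruction."""
    return [statistics.pstdev([a - b for a, b in zip(o, p)])
            for o, p in zip(original, perturbed)]


def select_theta(xs, ys, candidates):
    """Return (theta, phi) maximising the global guarantee phi = min_i phi_i."""
    best_theta, best_phi = None, -math.inf
    for theta in candidates:
        phi = min(local_guarantees([xs, ys], rotate_pair(xs, ys, theta)))
        if phi > best_phi:
            best_theta, best_phi = theta, phi
    return best_theta, best_phi


# Two standardised, uncorrelated toy attributes (illustrative values only).
xs = [1.0, -1.0, 1.0, -1.0]
ys = [1.0, 1.0, -1.0, -1.0]
print(select_theta(xs, ys, range(0, 360, 5)))
```

On this toy pair the loop picks θ = 180° with φ close to 2. For uncorrelated standardised attributes each φ_i equals sqrt(2 − 2 cos θ), so this naive metric always favours a 180° rotation, which is only a sign flip and trivially reversible. That suggests a practical φ_i also has to account for attack-based reconstructions such as the ICA and known I/O attacks evaluated in Figure 9.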

Original abstract

Computing technologies pervade physical spaces and human lives, and produce a vast amount of data that is available for analysis. However, there is a growing concern that potentially sensitive data may become public if the collected data are not appropriately sanitized before being released for investigation. Although more than a few privacy-preserving methods are available, they are inefficient, do not scale, or have problems with data utility and/or privacy. This paper addresses these issues by proposing an efficient and scalable nonreversible perturbation algorithm, PABIDOT, for privacy preservation of big data via optimal geometric transformations. PABIDOT was tested for efficiency, scalability, resistance, and accuracy using nine datasets and five classification algorithms. Experiments show that PABIDOT excels in execution speed, scalability, attack resistance and accuracy in large-scale privacy-preserving data classification when compared with two other related privacy-preserving algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PABIDOT adds a named geometric perturbation variant with speed and accuracy tests on nine datasets, but the core non-reversibility claim rests on thin technical detail and narrow baselines.

The main takeaway is that this paper names a new algorithm, PABIDOT, that applies geometric transformations to perturb data for privacy while aiming to keep it usable for classification. It reports faster execution, better scalability, stronger attack resistance, and higher accuracy than two other perturbation methods across nine datasets and five classifiers. That empirical spread is the clearest positive element here.

The work targets a genuine practical gap: privacy methods that scale to big data without killing downstream utility. Running the same tests on multiple real datasets and different classifiers gives the accuracy claims more weight than a single-dataset study would. The emphasis on execution time and scalability also matches what people actually need when handling large collections.

The soft spots sit in the missing mechanics. The abstract gives no equations for the transformations, no description of how non-reversibility is proven or measured, and no attack-model details. Without those, it is difficult to judge whether the claimed resistance holds or whether the utility preservation is more than parameter tuning. The comparison set is also narrow: only two related algorithms are used, so it is unclear how PABIDOT would fare against stronger or more recent baselines. No statistical significance tests or error analysis appear in the reported outcomes, which weakens the superiority statements.

This paper is for researchers and engineers working on privacy-preserving data release who need something that runs quickly on large tables. It contains enough concrete experiments on a relevant problem to justify sending it to a serious referee, even though the technical exposition would need expansion and the evaluation would need broader baselines and more rigorous attack testing.
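One lightweight way to address the missing significance testing, offered as a suggestion rather than something the paper reports, is an exact two-sided sign test over paired per-dataset accuracies (PABIDOT against one baseline on the same datasets). `sign_test` is a hypothetical helper; ties are discarded, as is usual for the sign test.

```python
from math import comb


def sign_test(acc_a, acc_b):
    """Exact two-sided sign test on paired scores; returns (wins, losses, p)."""
    wins = sum(a > b for a, b in zip(acc_a, acc_b))
    losses = sum(a < b for a, b in zip(acc_a, acc_b))
    n = wins + losses  # ties carry no sign and are discarded
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin, both tails.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)
```

With nine datasets, a clean sweep of 9 wins and 0 losses gives p = 2/512 ≈ 0.0039, while 7 wins and 2 losses gives p = 92/512 ≈ 0.18, so even two losses out of nine push the result above the conventional 0.05 threshold.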

Referee Report

0 major / 2 minor

Summary. The paper proposes PABIDOT, a non-reversible perturbation algorithm for privacy preservation of big data that relies on optimal geometric transformations. It evaluates the algorithm on nine datasets with five classification algorithms, reporting superior execution speed, scalability, attack resistance, and classification accuracy relative to two existing privacy-preserving methods.

Significance. If the empirical claims hold, the work provides a practical, scalable technique for privacy-preserving classification on large datasets that improves upon prior methods in both efficiency and the utility-privacy balance. The breadth of evaluation across multiple datasets and classifiers supplies concrete evidence that could inform deployment decisions in data-mining applications.

minor comments (2)

The abstract asserts positive experimental outcomes without supplying algorithm equations, attack-model definitions, or statistical tests; the full manuscript should make these elements explicit in the method and evaluation sections to allow independent verification of the superiority claims.
The description of the geometric transformations should include a clear statement of the attack model and a formal argument (or empirical test) establishing non-invertibility, as this property is load-bearing for the privacy guarantee.
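An empirical non-invertibility test of the kind requested here can be prototyped as a known input/output attack. The sketch assumes, purely for illustration, a 2-D perturbation of the form y = R(θ)x + t + noise and an attacker who knows two original/perturbed record pairs: the attacker estimates θ from the direction between the two pairs, estimates t, inverts every released record, and the test reports the reconstruction error. The function names and the attack model are assumptions, not taken from the paper.

```python
import math
import random


def perturb(points, theta, t, sigma, rng):
    """Assumed 2-D model: rotate by theta (radians), translate by t, add N(0, sigma) noise."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0] + rng.gauss(0.0, sigma),
             s * x + c * y + t[1] + rng.gauss(0.0, sigma)) for x, y in points]


def known_io_attack(known_orig, known_pert, released):
    """Estimate theta and t from two known (original, perturbed) pairs, then invert."""
    (x1, y1), (x2, y2) = known_orig
    (u1, v1), (u2, v2) = known_pert
    # A rotation maps the difference vector x2 - x1 onto u2 - u1 (translation cancels).
    theta = math.atan2(v2 - v1, u2 - u1) - math.atan2(y2 - y1, x2 - x1)
    c, s = math.cos(theta), math.sin(theta)
    t = (u1 - (c * x1 - s * y1), v1 - (s * x1 + c * y1))
    # Inverse map x = R(-theta) (y - t), applied to every released record.
    return [(c * (u - t[0]) + s * (v - t[1]), -s * (u - t[0]) + c * (v - t[1]))
            for u, v in released]


def reconstruction_rmse(original, recovered):
    """Root-mean-square error between original and recovered records."""
    sq = [(a - b) ** 2 for p, q in zip(original, recovered) for a, b in zip(p, q)]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(7)
data = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(200)]
for sigma in (0.0, 0.5):
    released = perturb(data, math.radians(35), (0.4, -1.2), sigma, rng)
    recovered = known_io_attack(data[:2], released[:2], released)
    print(sigma, round(reconstruction_rmse(data, recovered), 3))
```

With σ = 0 the reconstruction error is essentially zero, so rotation plus translation alone offers no protection against this attacker. The informative measurement is how large the error stays once randomness is added, and how that compares with the baselines under the same attack.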

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation of minor revision. The referee's description accurately captures the PABIDOT proposal, its evaluation across nine datasets and five classifiers, and the reported advantages in speed, scalability, attack resistance, and accuracy.

Circularity Check

0 steps flagged

No significant circularity

The paper proposes the PABIDOT algorithm based on geometric transformations and reports empirical results on nine datasets with five classifiers, comparing speed, scalability, resistance, and accuracy to two baselines. No equations, derivations, or load-bearing steps are present in the provided text that reduce any claimed prediction, uniqueness, or result to a fitted parameter, self-citation chain, or definitional tautology. The evaluation is self-contained against external benchmarks and does not invoke prior author work as a substitute for independent verification.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5687 in / 1013 out tokens · 27099 ms · 2026-05-25T19:50:10.736580+00:00 · methodology

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 1 internal anchor

[1] Aggarwal, C. C. (2015). Privacy-preserving data mining. In Data Mining (pp. 663–693). Springer. doi: https://doi.org/10.1007/978-3-319-14142-8
[2] Aggarwal, C. C., & Yu, P. S. (2004). A condensation approach to privacy preserving data mining. In EDBT (pp. 183–199). Springer, volume 4. doi: https://doi.org/10.1007/978-3-540-24741-8_12
[3] Agrawal, R., & Srikant, R. (2000). Privacy-preserving data mining. In ACM SIGMOD Record (pp. 439–450). ACM, volume 29. doi: https://doi.org/10.1145/335191.335438
[4] Aldeen, Y. A. A. S., Salleh, M., & Razzaque, M. A. (2015). A comprehensive review on privacy preserving data mining. SpringerPlus, 4, 694. doi: https://doi.org/10.1186/s40064-015-1481-x
[5] Aloysius, J. A., Hoehle, H., Goodarzi, S., & Venkatesh, V. (2018). Big data initiatives in retail environments: Linking service process perceptions to shopping outcomes. Annals of Operations Research, 270, 25–51. doi: https://doi.org/10.1007/s10479-016-2276-3
[6] Bettini, C., & Riboni, D. (2015). Privacy protection in pervasive systems: State of the art and technical challenges. Pervasive and Mobile Computing, 17, 159–174. doi: https://doi.org/10.1016/j.pmcj.2014.09.010
[7] Buccafurri, F., Lax, G., Nicolazzo, S., & Nocera, A. (2016). A threat to friendship privacy in Facebook. In International Conference on Availability, Reliability, and Security (pp. 96–105). Springer. doi: https://doi.org/10.1007/978-3-319-45507-5_7
[8] Capraro, V., & Perc, M. (2018). Grand challenges in social physics: In pursuit of moral behavior. Frontiers in Physics, 6, 107. doi: https://doi.org/10.3389/fphy.2018.00107
[9] Chamikara, M. A. P., Bertok, P., Liu, D., Camtepe, S., & Khalil, I. (2018). Efficient data perturbation for privacy preserving and accurate data stream mining. Pervasive and Mobile Computing, 48, 1–19. doi: https://doi.org/10.1016/j.pmcj.2018.05.003
[10] Chen, K., & Liu, L. (2005). A random rotation perturbation approach to privacy preserving data classification. The Ohio Center of Excellence in Knowledge-Enabled Computing. URL: https://corescholar.libraries.wright.edu/knoesis/916/
[11] Chen, K., & Liu, L. (2011). Geometric data perturbation for privacy preserving outsourced data mining. Knowledge and Information Systems, 29, 657–695. doi: https://doi.org/10.1007/s10115-010-0362-4
[12] Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., & Zhu, M. Y. (2002). Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations Newsletter, 4, 28–34. doi: https://doi.org/10.1145/772862.772867
[13] Cuzzocrea, A. (2015). Privacy-preserving big data management: The case of OLAP. Big Data: Algorithms, Analytics, and Applications (pp. 301–326). URL: https://books.google.com.au/books?isbn=1482240564
[14] Dwork, C., Roth, A. et al. (2014). The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9, 211–407. doi: http://dx.doi.org/10.1561/0400000042
[15] Erlingsson, Ú., Pihur, V., & Korolova, A. (2014). RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (pp. 1054–1067). ACM. doi: https://doi.org/10.1145/2660267.2660348
[16] Gai, K., Qiu, M., Zhao, H., & Xiong, J. (2016). Privacy-aware adaptive data encryption strategy of big data in cloud computing. In Cyber Security and Cloud Computing (CSCloud), 2016 IEEE 3rd International Conference on (pp. 273–278). IEEE. doi: http://doi.ieeecomputersociety.org/10.1109/CSCloud.2016.52
[17] Gävert, H., Hurri, J., Särelä, J., & Hyvärinen, A. (2005). The FastICA package for MATLAB. Lab Comput Inf Sci Helsinki Univ. Technol. URL: https://research.ics.aalto.fi/ica/fastica/
[18] Hasan, A., Jiang, Q., Luo, J., Li, C., & Chen, L. (2016). An effective value swapping method for privacy preserving data publishing. Security and Communication Networks, 9, 3219–3228. doi: https://doi.org/10.1002/sec.1527
[19] Helbing, D., Brockmann, D., Chadefaux, T., Donnay, K., Blanke, U., Woolley-Meza, O., Moussaid, M., Johansson, A., Krause, J., Schutte, S. et al. (2015). Saving human lives: What complexity science and information systems can contribute. Journal of Statistical Physics, 158, 735–781. doi: https://doi.org/10.1007/s10955-014-1024-9
[20] Howell, D. C. (2016). Fundamental statistics for the behavioral sciences. Cengage Learning. URL: https://books.google.com.au/books?isbn=1305652975
[21] Jalili, M., & Perc, M. (2017). Information cascades in complex networks. Journal of Complex Networks, 5, 665–693. doi: https://doi.org/10.1093/comnet/cnx019
[22] Jones, H. (2012). Computer Graphics through Key Mathematics. Springer London: Imprint: Springer. URL: https://books.google.com.au/books?id=f7gPBwAAQBAJ
[23] Kabir, W., Ahmad, M. O., & Swamy, M. (2015). A novel normalization technique for multimodal biometric systems. In Circuits and Systems (MWSCAS), 2015 IEEE 58th International Midwest Symposium on (pp. 1–4). IEEE. doi: https://doi.org/10.1109/MWSCAS.2015.7282214
[24] Kairouz, P., Oh, S., & Viswanath, P. (2014). Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems (pp. 2879–2887). URL: http://papers.nips.cc/paper/5392-extremal-mechanisms-for-local-differential-privacy
[25] Kerschbaum, F., & Härterich, M. (2017). Searchable encryption to reduce encryption degradation in adjustably encrypted databases. In IFIP Annual Conference on Data and Applications Security and Privacy (pp. 325–336). Springer. doi: https://doi.org/10.1007/978-3-319-61176-1_18
[26] Kieseberg, P., & Weippl, E. (2018). Security challenges in cyber-physical production systems. In International Conference on Software Quality (pp. 3–16). Springer. doi: https://doi.org/10.1007/978-3-319-71440-0_1
[27] Li, P., Li, J., Huang, Z., Gao, C.-Z., Chen, W.-B., & Chen, K. (2017). Privacy-preserving outsourced classification in cloud computing. Cluster Computing (pp. 1–10). doi: https://doi.org/10.1007/s10586-017-0849-9
[28] Liu, K., Kargupta, H., & Ryan, J. (2006). Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Transactions on Knowledge and Data Engineering, 18, 92–106. doi: https://doi.org/10.1109/TKDE.2006.14
[29] Manogaran, G., Thota, C., Lopez, D., Vijayakumar, V., Abbas, K. M., & Sundarsekar, R. (2017). Big data knowledge system in healthcare. In Internet of Things and Big Data Technologies for Next Generation Healthcare (pp. 133–157). Springer. doi: https://doi.org/10.1007/978-3-319-49736-5_7
[30] Maruskin, J. (2012). Essential Linear Algebra. Solar Crest Publishing, LLC. URL: https://books.google.com.au/books?id=aOF3-hx3u1kC
[31] Muralidhar, K., Parsa, R., & Sarathy, R. (1999). A general additive data perturbation method for database security. Management Science, 45, 1399–1415. doi: https://doi.org/10.1287/mnsc.45.10.1399
[32] Nell, W., & Shure, L. (2011). Memory profiling. US Patent 7,908,591. URL: https://patents.google.com/patent/US7908591B1/en
[33] Okkalioglu, B. D., Okkalioglu, M., Koc, M., & Polat, H. (2015). A survey: Deriving private information from perturbed data. Artificial Intelligence Review, 44, 547–569. doi: https://doi.org/10.1007/s10462-015-9439-5
[34] Paeth, A. W. (2014). Graphics Gems V (Macintosh Version). Academic Press. URL: https://books.google.com.au/books?isbn=1483296695
[35] Park, K.-j., & Ryou, H.-b. (2003). Anomaly detection scheme using data mining in mobile environment. Computational Science and Its Applications ICCSA (pp. 978–978). doi: https://doi.org/10.1007/3-540-44843-8_3
[36] Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., & Ren, K. (2016). Heavy hitter estimation over set-valued data with local differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (pp. 192–203). ACM. doi: https://doi.org/10.1145/2976749.2978409
[37] Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on (pp. 3–18). IEEE. doi: https://doi.org/10.1109/SP.2017.41
[38] Soria-Comas, J., & Domingo-Ferrer, J. (2016). Big data privacy: Challenges to privacy principles and models. Data Science and Engineering, 1, 21–28. doi: https://doi.org/10.1007/s41019-015-0001-x
[39] Steel, E., & Fowler, G. (2010). Facebook in privacy breach. The Wall Street Journal, 18. URL: https://www.wsj.com/articles/SB10001424052702304772804575558484075236968
[40] Tang, J., Korolova, A., Bai, X., Wang, X., & Wang, X. (2017). Privacy loss in Apple's implementation of differential privacy on macOS 10.12. arXiv preprint arXiv:1709.02753. URL: https://arxiv.org/abs/1709.02753
[41] Torra, V. (2017). Data Privacy: Foundations, New Developments and the Big Data Challenge. Springer. doi: https://doi.org/10.1007/978-3-319-57358-8
[42] Torra, V. (2017). Fuzzy microaggregation for the transparency principle. Journal of Applied Logic, 23, 70–80. doi: https://doi.org/10.1016/j.jal.2016.11.007
[43] Vatsalan, D., Sehili, Z., Christen, P., & Rahm, E. (2017). Privacy-preserving record linkage for big data: Current approaches and research challenges. In Handbook of Big Data Technologies (pp. 851–895). Springer. doi: https://doi.org/10.1007/978-3-319-49340-4_25
[44] Wei, Z., Wu, Y., Yang, Y., Yan, Z., Pei, Q., Xie, Y., & Weng, J. (2018). AutoPrivacy: Automatic privacy protection and tagging suggestion for mobile social photo. Computers & Security. doi: https://doi.org/10.1016/j.cose.2017.12.002
[45] Wen, Y., Liu, J., Dou, W., Xu, X., Cao, B., & Chen, J. (2018). Scheduling workflows with privacy protection constraints for big data applications on cloud. Future Generation Computer Systems. doi: https://doi.org/10.1016/j.future.2018.03.028
[46] Wilson, R. L., & Rosen, P. A. (2008). Protecting data through 'perturbation' techniques: The impact on knowledge discovery in databases. In Information Security and Ethics: Concepts, Methodologies, Tools, and Applications (pp. 1550–1561). IGI Global. doi: https://doi.org/10.4018/978-1-59904-937-3
[47] Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann. URL: https://books.google.com.au/books?isbn=0128043571
[48] Wong, R. C.-W., Fu, A. W.-C., Wang, K., & Pei, J. (2007). Minimality attack in privacy preserving data publishing. In Proceedings of the 33rd International Conference on Very Large Data Bases (pp. 543–554). VLDB Endowment. URL: https://dl.acm.org/citation.cfm?id=1325914
[49] Xu, L., Jiang, C., Chen, Y., Ren, Y., & Liu, K. R. (2015). Privacy or utility in data collection? A contract theoretic approach. IEEE Journal of Selected Topics in Signal Processing, 9, 1256–1269. doi: https://doi.org/10.1109/JSTSP.2015.2425798
[50] Zhou, J., Cao, Z., Dong, X., & Lin, X. (2015). PPDM: A privacy-preserving protocol for cloud-assisted e-healthcare systems. IEEE Journal of Selected Topics in Signal Processing, 9, 1332–1344. doi: https://doi.org/10.1109/JSTSP.2015.2427113

[1] [1]

Aggarwal, C. C. (2015). Privacy-preserving data mining. In Data Mining (pp. 663–693). Springer. doi:https://doi.org/10.1007/978-3-319-14142-8

work page doi:10.1007/978-3-319-14142-8 2015

[2] [2]

C., & Yu, P

Aggarwal, C. C., & Yu, P. S. (2004). A condensation approach to privacy preserving data mining. In EDBT (pp. 183–199). Springer volume 4. doi: https://doi.org/10.1007/ 978-3-540-24741-8_12

work page 2004

[3] [3]

Agrawal, R., & Srikant, R. (2000). Privacy-preserving data mining. In ACM Sigmod Record (pp. 439–450). ACM volume 29. doi: https://doi.org/10.1145/335191.335438

work page doi:10.1145/335191.335438 2000

[4] [4]

Aldeen, Y. A. A. S., Salleh, M., & Razzaque, M. A. (2015). A comprehensive review on privacy pre- serving data mining. SpringerPlus, 4, 694. doi:https://doi.org/10.1186/s40064-015-1481-x

work page doi:10.1186/s40064-015-1481-x 2015

[5] [5]

A., Hoehle, H., Goodarzi, S., & Venkatesh, V

Aloysius, J. A., Hoehle, H., Goodarzi, S., & Venkatesh, V. (2018). Big data initiatives in retail environments: Linking service process perceptions to shopping outcomes. Annals of operations research, 270, 25–51. doi: https://doi.org/10.1007/s10479-016-2276-3

work page doi:10.1007/s10479-016-2276-3 2018

[6] [6]

Bettini, C., & Riboni, D. (2015). Privacy protection in pervasive systems: State of the art and technical challenges. Pervasive and Mobile Computing , 17, 159–174. doi: https://doi.org/10. 1016/j.pmcj.2014.09.010

work page 2015

[7] [7]

Buccafurri, F., Lax, G., Nicolazzo, S., & Nocera, A. (2016). A threat to friendship privacy in facebook. In International Conference on Availability, Reliability, and Security (pp. 96–105). Springer. doi: https://doi.org/10.1007/978-3-319-45507-5_7

work page doi:10.1007/978-3-319-45507-5_7 2016

[8] [8]

Capraro, V., & Perc, M. (2018). Grand challenges in social physics: In pursuit of moral behavior. Frontiers in Physics , 6, 107. doi: https://doi.org/10.3389/fphy.2018.00107

work page doi:10.3389/fphy.2018.00107 2018

[9] [9]

Chamikara, M. A. P., Bertok, P., Liu, D., Camtepe, S., & Khalil, I. (2018). Eﬃcient data perturbation for privacy preserving and accurate data stream mining. Pervasive and Mobile Computing, 48, 1–19. doi: https://doi.org/10.1016/j.pmcj.2018.05.003. 42

work page doi:10.1016/j.pmcj.2018.05.003 2018

[10] [10]

Chen, K., & Liu, L. (2005). A random rotation perturbation approach to privacy preserving data classiﬁcation. The Ohio Center of Excellence in Knowledge-Enabled Computing , . URL: https://corescholar.libraries.wright.edu/knoesis/916/

work page 2005

[11] [11]

Chen, K., & Liu, L. (2011). Geometric data perturbation for privacy preserving outsourced data mining. Knowledge and Information Systems , 29, 657–695. doi:https://doi.org/10.1007/ s10115-010-0362-4

work page 2011

[12] [12]

Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., & Zhu, M. Y. (2002). Tools for privacy preserving distributed data mining. ACM Sigkdd Explorations Newsletter , 4, 28–34. doi: https: //doi.org/10.1145/772862.772867

work page doi:10.1145/772862.772867 2002

[13] [13]

Cuzzocrea, A. (2015). Privacy-preserving big data management: The case of olap. Big Data: Algorithms, Analytics, and Applications , (pp. 301–326;). URL: https://books.google.com.au/ books?isbn=1482240564

work page 2015

[14] [14]

Dwork, C., Roth, A. et al. (2014). The algorithmic foundations of diﬀerential privacy. Foundations and Trends R⃝ in Theoretical Computer Science , 9, 211–407. doi: http://dx.doi.org/10.1561/ 0400000042

work page 2014

[15] [15]

Erlingsson, ´U., Pihur, V., & Korolova, A. (2014). Rappor: Randomized aggregatable privacy- preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security (pp. 1054–1067). ACM. doi: https://doi.org/10.1145/2660267. 2660348

work page doi:10.1145/2660267 2014

[16] [16]

Gai, K., Qiu, M., Zhao, H., & Xiong, J. (2016). Privacy-aware adaptive data encryption strategy of big data in cloud computing. In Cyber Security and Cloud Computing (CSCloud), 2016 IEEE 3rd International Conference on (pp. 273–278). IEEE. doi: http://doi.ieeecomputersociety. org/10.1109/CSCloud.2016.52

work page doi:10.1109/cscloud.2016.52 2016

[17] [17]

G¨ avert, H., Hurri, J., S¨ arel¨ a, J., & Hyv¨ arinen, A. (2005). The fastica package for matlab.Lab Com- put Inf Sci Helsinki Univ. Technol , . URL: https://research.ics.aalto.fi/ica/fastica/

work page 2005

[18] [18]

Hasan, A., Jiang, Q., Luo, J., Li, C., & Chen, L. (2016). An eﬀective value swapping method for privacy preserving data publishing. Security and Communication Networks , 9, 3219–3228. doi:https://doi.org/10.1002/sec.1527. 43

work page doi:10.1002/sec.1527 2016

[19] [19]

Helbing, D., Brockmann, D., Chadefaux, T., Donnay, K., Blanke, U., Woolley-Meza, O., Mous- said, M., Johansson, A., Krause, J., Schutte, S. et al. (2015). Saving human lives: What complex- ity science and information systems can contribute. Journal of statistical physics , 158, 735–781. doi:https://doi.org/10.1007/s10955-014-1024-9

work page doi:10.1007/s10955-014-1024-9 2015

[20] [20]

Howell, D. C. (2016). Fundamental statistics for the behavioral sciences. Cengage Learning. URL: https://books.google.com.au/books?isbn=1305652975

work page 2016

[21] [21]

Jalili, M., & Perc, M. (2017). Information cascades in complex networks. Journal of Complex Networks, 5, 665–693. doi: https://doi.org/10.1093/comnet/cnx019

work page doi:10.1093/comnet/cnx019 2017

[22] [22]

Jones, H. (2012). Computer Graphics through Key Mathematics . Springer London : Imprint: Springer. URL: https://books.google.com.au/books?id=f7gPBwAAQBAJ

work page 2012

[23] [23]

O., & Swamy, M

Kabir, W., Ahmad, M. O., & Swamy, M. (2015). A novel normalization technique for multimodal biometric systems. In Circuits and Systems (MWSCAS), 2015 IEEE 58th International Midwest Symposium on (pp. 1–4). IEEE. doi: https://doi.org/10.1109/MWSCAS.2015.7282214

work page doi:10.1109/mwscas.2015.7282214 2015

[24] [24]

Kairouz, P., Oh, S., & Viswanath, P. (2014). Extremal mechanisms for local diﬀerential privacy. In Advances in neural information processing systems (pp. 2879–2887). URL: http://papers. nips.cc/paper/5392-extremal-mechanisms-for-local-differential-privacy

[25]

Kerschbaum, F., & Härterich, M. (2017). Searchable encryption to reduce encryption degradation in adjustably encrypted databases. In IFIP Annual Conference on Data and Applications Security and Privacy (pp. 325–336). Springer. doi: https://doi.org/10.1007/978-3-319-61176-1_18

[26]

Kieseberg, P., & Weippl, E. (2018). Security challenges in cyber-physical production systems. In International Conference on Software Quality (pp. 3–16). Springer. doi: https://doi.org/10.1007/978-3-319-71440-0_1

[27]

Li, P., Li, J., Huang, Z., Gao, C.-Z., Chen, W.-B., & Chen, K. (2017). Privacy-preserving outsourced classification in cloud computing. Cluster Computing, (pp. 1–10). doi: https://doi.org/10.1007/s10586-017-0849-9

[28]

Liu, K., Kargupta, H., & Ryan, J. (2006). Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Transactions on Knowledge and Data Engineering, 18, 92–106. doi: https://doi.org/10.1109/TKDE.2006.14

[29]

Manogaran, G., Thota, C., Lopez, D., Vijayakumar, V., Abbas, K. M., & Sundarsekar, R. (2017). Big data knowledge system in healthcare. In Internet of Things and Big Data Technologies for Next Generation Healthcare (pp. 133–157). Springer. doi: https://doi.org/10.1007/978-3-319-49736-5_7

[30]

Maruskin, J. (2012). Essential Linear Algebra. Solar Crest Publishing, LLC. URL: https://books.google.com.au/books?id=aOF3-hx3u1kC

[31]

Muralidhar, K., Parsa, R., & Sarathy, R. (1999). A general additive data perturbation method for database security. Management Science, 45, 1399–1415. doi: https://doi.org/10.1287/mnsc.45.10.1399

[32]

Nell, W., & Shure, L. (2011). Memory profiling. US Patent 7,908,591. URL: https://patents.google.com/patent/US7908591B1/en

[33]

Okkalioglu, B. D., Okkalioglu, M., Koc, M., & Polat, H. (2015). A survey: deriving private information from perturbed data. Artificial Intelligence Review, 44, 547–569. doi: https://doi.org/10.1007/s10462-015-9439-5

[34]

Paeth, A. W. (2014). Graphics Gems V (Macintosh Version). Academic Press. URL: https://books.google.com.au/books?isbn=1483296695

[35]

Park, K.-j., & Ryou, H.-b. (2003). Anomaly detection scheme using data mining in mobile environment. Computational Science and Its Applications, ICCSA, (pp. 978–978). doi: https://doi.org/10.1007/3-540-44843-8_3

[36]

Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., & Ren, K. (2016). Heavy hitter estimation over set-valued data with local differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (pp. 192–203). ACM. doi: https://doi.org/10.1145/2976749.2978409

[37]

Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on (pp. 3–18). IEEE. doi: https://doi.org/10.1109/SP.2017.41

[38]

Soria-Comas, J., & Domingo-Ferrer, J. (2016). Big data privacy: challenges to privacy principles and models. Data Science and Engineering, 1, 21–28. doi: https://doi.org/10.1007/s41019-015-0001-x

[39]

Steel, E., & Fowler, G. (2010). Facebook in privacy breach. The Wall Street Journal, 18. URL: https://www.wsj.com/articles/SB10001424052702304772804575558484075236968

[40]

Tang, J., Korolova, A., Bai, X., Wang, X., & Wang, X. (2017). Privacy loss in Apple's implementation of differential privacy on macOS 10.12. arXiv preprint arXiv:1709.02753. URL: https://arxiv.org/abs/1709.02753

[41]

Torra, V. (2017). Data Privacy: Foundations, New Developments and the Big Data Challenge . Springer. doi: https://doi.org/10.1007/978-3-319-57358-8

[42]

Torra, V. (2017). Fuzzy microaggregation for the transparency principle. Journal of Applied Logic, 23, 70–80. doi: https://doi.org/10.1016/j.jal.2016.11.007

[43]

Vatsalan, D., Sehili, Z., Christen, P., & Rahm, E. (2017). Privacy-preserving record linkage for big data: Current approaches and research challenges. In Handbook of Big Data Technologies (pp. 851–895). Springer. doi: https://doi.org/10.1007/978-3-319-49340-4_25

[44]

Wei, Z., Wu, Y., Yang, Y., Yan, Z., Pei, Q., Xie, Y., & Weng, J. (2018). Autoprivacy: automatic privacy protection and tagging suggestion for mobile social photo. Computers & Security. doi: https://doi.org/10.1016/j.cose.2017.12.002

[45]

Wen, Y., Liu, J., Dou, W., Xu, X., Cao, B., & Chen, J. (2018). Scheduling workflows with privacy protection constraints for big data applications on cloud. Future Generation Computer Systems. doi: https://doi.org/10.1016/j.future.2018.03.028

[46]

Wilson, R. L., & Rosen, P. A. (2008). Protecting data through 'perturbation' techniques: The impact on knowledge discovery in databases. In Information Security and Ethics: Concepts, Methodologies, Tools, and Applications (pp. 1550–1561). IGI Global. doi: https://doi.org/10.4018/978-1-59904-937-3

[47]

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann. URL: https://books.google.com.au/books?isbn=0128043571

[48]

Wong, R. C.-W., Fu, A. W.-C., Wang, K., & Pei, J. (2007). Minimality attack in privacy preserving data publishing. In Proceedings of the 33rd International Conference on Very Large Data Bases (pp. 543–554). VLDB Endowment. URL: https://dl.acm.org/citation.cfm?id=1325914

[49]

Xu, L., Jiang, C., Chen, Y., Ren, Y., & Liu, K. R. (2015). Privacy or utility in data collection? A contract theoretic approach. IEEE Journal of Selected Topics in Signal Processing, 9, 1256–1269. doi: https://doi.org/10.1109/JSTSP.2015.2425798

[50]

Zhou, J., Cao, Z., Dong, X., & Lin, X. (2015). PPDM: A privacy-preserving protocol for cloud-assisted e-healthcare systems. IEEE Journal of Selected Topics in Signal Processing, 9, 1332–1344. doi: https://doi.org/10.1109/JSTSP.2015.2427113