查看论文信息

中文题名：	shuffle 模型中的差分隐私
姓名：	唐鸿宇
学号：	20011210574
保密级别：	公开
论文语种：	chi
学科代码：	110505
学科名称：	军事学 - 军队指挥学 - 密码学
学生类型：	硕士
学位：	军事学硕士
学校：	西安电子科技大学
院系：	通信工程学院
专业：	军队指挥学
研究方向：	隐私保护
第一导师姓名：	陈原
第一导师单位：	西安电子科技大学
完成日期：	2023-03-20
答辩日期：	2023-05-22
外文题名：	Differential Privacy in the Shuffle Model
中文关键词：	差分隐私 ; shuffle模型 ; 键值对 ; 差分遗忘shuffle
外文关键词：	differential privacy ; shuffle model ; key-value ; differentially oblivious shuffle
中文摘要：	︿随着数字化时代的到来，大数据的采集和使用已经成为很多应用不可或缺的一部分。然而，收集过程中数据的泄露会对用户的个人隐私带来严重威胁，导致各种潜在的安全问题。差分隐私作为一种有效的隐私保护手段，已引起学术界和业界的广泛关注，其基本思想是在保证数据可用性的前提下，对数据添加一定的噪声或随机扰动以保护隐私。差分隐私一般分为中心化模型和本地化模型，新近提出的shuffle模型通过对本地化扰动后的数据添加shuffle操作，能实现隐私放大，且提高对数据进行分析估计时的准确性。本文首先研究了差分隐私的三种模型的工作原理和相关算法；其次研究了针对键值对数据的差分隐私数据收集问题和已有的本地化机制： PrivKVM和PCKV，因PrivKVM 机制需要多轮通信提升估计的准确性，而PCKV机制对不同数据集参数选取困难，提出一种基于shuffle模型的改进机制，并且评估了污染攻击对该算法的影响；最后研究了shuffle 模型中shuffle操作的具体实现方法：差分遗忘shuffle算法，已有的基于洋葱路由的算法需要通过多轮通信，而基于可逆式布隆查找表（IBLT）的算法虽然只需一轮通信，但无法扩展以抵抗恶意攻击，因此提出一种改进方法。本文取得的主要成果如下： 1.提出了一种基于shuffle 模型的键值对数据收集机制，采用更精确的离散化范围，使键值对数据中的值在扰动过后更接近真实值，在用户端生成额外的随机数据，扩大shuffle 能提供的隐私性，而且提供更多的参数选择以适用于更多场景。我们对该机制的差分隐私性和估计准确性进行了理论证明和仿真实验。另外，分析了三种污染攻击对于该机制的收益，结果均优于已有算法。 2.提出了一种两轮通信的差分遗忘shuffle实现算法，基于ElGamal加密，使用两轮重加密的方式设计了一种IBLT向量安全求和算法，不仅只需两轮通信，而且容易扩展到有恶意攻击者的情况。本文给出了相关的理论证明和实验验证。﹀
外文摘要：	︿ With the advent of the digital age, the collection and use of big data has become an essential part of many applications. However, data leakage during the collection process can pose a serious threat to the users’ personal privacy, leading to various potential security issues. Differential privacy, as an effective means of privacy protection, has attracted widespread attention from academia and industry. Its basic idea is to add certain noise or random perturbations to the data to protect privacy while ensuring data availability. Differential privacy is generally divided into centralized and localized models. The newly proposed shuffle model can achieve privacy amplification by adding shuffle operations to locally perturbed data and improve the accuracy of data analysis and estimation. In this thesis, we first study the working principles and related algorithms of the three models of differential privacy. Secondly, we investigate the differential private data collection problem for key-value pair data and existing localized mechanisms: PrivKVM and PCKV. We observe that PrivKVM requires multi-round communication to improve the estimation accuracy, while for PCKV it is difficult to select parameters for different dataset. We propose an improved algorithm based on the shuffle model, and evaluate the impact of pollution attacks on it. Finally, we study the specific implementation methods of the shuffle operation in the shuffle model, that is, differentially oblivious shuffle algorithm. For existing algorithms, the onion routing based one requires multi-round communication, although the invertible Bloom lookup table (IBLT) based one requires only one-round communication, it cannot be extended to resist malicious attacks. We propose an improved algorithm. The main achievements are as follows: 1. A key-value pair data collection mechanism based on the shuffle model is proposed. This mechanism adopts a more accurate discretization range, making the values in the perturbed key-value pair data closer to the true value. Additional random data is generated at the user end to further expand the privacy protection provided by the shuffle operation, and more parameter choices are provided for applying in more scenarios. We provide theoretical proofs and simulation experiments for the differential privacy and estimation accuracy of our mechanisim. In addition, we analyze the benefits of three kinds of pollution attacks against our mechanism, which are smaller than those of all existing algorithms. 2. A two-round communication differentially oblivious shuffle implementation algorithm is proposed. In which, we design a two-round, ElGamal encryption based re-encryption IBLT vector secure summation algorithm. It requires only two-round communication, and is easy to extend to resist malicious attackers. We provide relevant theoretical proofs and experimental verification. ﹀
参考文献：	︿ [1] Facebook-Cambridge Analytica data scandal[EB/OL]. Wikipedia. [2023-02-28]. https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal. [2] PERLROTH N, SINGER N. Hackers are targeting healthcare facilities in a ’massive campaign’[EB/OL]. 2020 [2023-02-28]. https://www.nytimes.com/2020/05/05/us/hospitalscyberattackscoronavirus.html. [3] Equifax data breach: What you need to know[EB/OL]. CNN Business. [2023-02-28]. https://www.cnn.com/2017/09/08/tech/equifax-hack-what-to-do/index.html. [4] POELL D M, STORTZ M J. Delta Airlines Sues Vendor for Data Breach[EB/OL]. 2019 [2023-03-13]. https://www.lexology.com/library/detail.aspx?g=1ecf6c7a-332b-4c17-92ba-3bb2085e6bd9. [5] CONSULTING I. General Data Protection Regulation[EB/OL]. [2021-04-20]. https://gdpr-info.eu/. [6] DALENIUS T. Towards a methodology for statistical disclosure control[J]. 1977. [7] SWEENEY L. k-anonymity: A model for protecting privacy[J]. International Journal of Uncertainty Fuzziness & Knowledge-Based Systems, 2002, 10(5): 557-570. [8] MACHANAVAJJHALA A, KIFER D, GEHRKE J, et al. l-diversity: Privacy beyond k-anonymity[J]. ACM Transactions on Knowledge Discovery from Data (TKDD), 2007, 1(1): 3-es. [9] LI N, LI T, VENKATASUBRAMANIAN S. t-closeness: Privacy beyond k-anonymity and l-diversity[C]//2007 IEEE 23rd International Conference on Data Engineering (ICDE). 2007: 106-115. [10] SAMARATI P, SWEENEY L. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression[C]//Proceedings of IEEE Symposium on Research in Security and Privacy. 1998: 384-385. [11] XU J, WANG W, PEI J, et al. Utility-based anonymization using local recoding[C]//Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006:785-790. [12] XU J, WANG W, PEI J. Maintaining k-anonymity against incremental updates[C]//Proceedings ofSIAM International Conference on Data Mining (SDM). 2007: 758-759. [13] NERGIZ M E, ATZORI M, SAYGIN Y. Towards trajectory anonymization: a generalization-based approach[C]//Proceedings of the SIGSPATIAL ACM GIS 2008 International Workshop on Security and Privacy in GIS and LBS. 2008: 52-61. [14] ZHOU B, PEI J. Preserving privacy in social networks against neighborhood attacks[C]//2008 IEEE24th International Conference on Data Engineering. 2008: 506-515. [15] LIU K, TERZI E. Towards identity anonymization on graphs[C]//Proceedings of the 2008 ACMSIGMOD international conference on Management of data. 2008: 93-106. [16] ZOU L, CHEN L, ÖZSU M T. K-automorphism: A general framework for privacy preserving network publication[J]. Proceedings of the VLDB Endowment, 2009, 2(1): 946-957. [17] DWORK C. Differential privacy[C]//Automata, Languages and Programming: 33rd InternationalColloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II 33. 2006: 1-12. [18] DWORK C, KENTHAPADI K, MCSHERRY F, et al. Our data, ourselves: Privacy via distributednoise generation[C]//Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2006: 486-503. [19] BLUM A, DWORK C, MCSHERRY F, et al. Practical privacy: The SuLQ framework[C]//Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2008: 128-138. [20] CHAUDHURI K, MONTELEONI C, SARWATE A D. Differentially private empirical risk minimization[J]. Journal of Machine Learning Research, 2011, 12(3): 1069-1109. [21] XIONG L, CHEN T, HUANG X, et al. A centralized differential privacy-based framework for healthcare data sharing[J]. Journal of medical systems, 2016, 40(3): 60. [22] ZHU Y, LIU W, ZHANG Z, et al. A centralized differential privacy-based social network user data aggregation scheme[J]. Journal of Ambient Intelligence and Humanized Computing, 2019, 10(1): 275-285. [23] WARNER S L. Randomized response: A survey technique for eliminating evasive answer bias[J].Journal of the American Statistical Association, 1965, 60(309): 63-69. [24] DUCHI J C, JORDAN M I, WAINWRIGHT M J. Local privacy and statistical minimax rates[C]//IEEE Conference on Foundations of Computer Science. IEEE, 2013: 429-438. [25] YANG Q, NSOESIE E O. Search Trends Symptoms Dataset: An Early Indicator of COVID-19 Outbreaks[EB/OL]. 2020. https://arxiv.org/abs/2009.01265. [26] YANG Q, NSOESIE E O. Vaccination Search Insights: Mining Search Engine Data for COVID-19 Vaccine Hesitancy and Sentiment Analysis[EB/OL]. 2021. https://arxiv.org/abs/2107.01179. [27] Apple Inc. Differential Privacy Overview[EB/OL]. 2020. https://www.apple.com/privacy/docs/Diff erential_Privacy_Overview.pdf. [28] Apple Machine Learning Research. Learning with Privacy at Scale[EB/OL]. 2017. https://docs-ass ets.developer.apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf. [29] YANG Q, NSOESIE E O, ZHANG Z L. U.S. Broadband Coverage Dataset: A Large-scale Dataset for Broadband Coverage Analysis[EB/OL]. 2021. https://arxiv.org/abs/2103.14035. [30] ERLINGSSON Ú, PIHUR V, KOROLOVA A. Rappor: Randomized aggregatable privacy- preserving ordinal response[C]//Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. 2014: 1054-1067. [31] WANG T, BLOCKI J, LI N, et al. Locally differentially private protocols for frequency estimation [C]//26th USENIX Security Symposium (USENIX Security 17). 2017: 729-745. [32] DUCHI J C, JORDAN M I, WAINWRIGHT M J. Local privacy, data processing inequalities, and statistical minimax rates[Z]. 2013. arXiv: 1302.3203 [cs.IT]. [33] NGUYÊN T T, XIAO X, YANG Y, et al. Collecting and analyzing data from smart device users with local differential privacy[J]. arXiv preprint arXiv:1606.05053, 2016. [34] YE Q, HU H, MENG X, et al. PrivKV: Key-value data collection with local differential privacy[C] //2019 IEEE Symposium on Security and Privacy (SP). 2019: 317-331. [35] GU X, LI M, CHENG Y, et al. PCKV: Locally differentially private correlated key-value data col- lection with optimized utility[C]//Proceedings of the 29th USENIX Conference on Security Sympo- sium. 2020: 967-984. [36] BEIMEL A, NISSIM K, OMRI E. Distributed private data analysis: Simultaneously solving how and what[C]//WAGNER D A. Lecture Notes in Computer Science: Advances in Cryptology - CRYPTO 2008: vol. 5157. Springer, 2008: 451-468. [37] CHAN T H, SHI E, SONG D. Optimal lower bound for differentially private multiparty aggregation [C]//European Symposium on Algorithms. Springer, 2012: 277-288. [38] BITTAU A, ERLINGSSON Ú, MANIATIS P, et al. Prochlo: Strong privacy for analytics in the crowd[C]//Proceedings of the 26th symposium on operating systems principles. 2017: 441-459. [39] CHEU A, SMITH A, ULLMAN J, et al. Distributed differential privacy via shuffling[C]//Advances in Cryptology–EUROCRYPT 2019: 38th Annual International Conference on the Theory and Ap- plications of Cryptographic Techniques, Darmstadt, Germany, May 19–23, 2019, Proceedings, Part I 38. 2019: 375-403. [40] ERLINGSSON Ú, FELDMAN V, MIRONOV I, et al. Amplification by shuffling: From local to cen- tral differential privacy via anonymity[C]//Proceedings of the Thirtieth Annual ACM-SIAM Sym- posium on Discrete Algorithms. 2019: 2468-2479. [41] BALLE B, BELL J, GASCÓN A, et al. The privacy blanket of the shuffle model[C]//Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18–22, 2019, Proceedings, Part II 39. 2019: 638-667. [42] FELDMAN V, MCMILLAN A, TALWAR K. Hiding among the clones: A simple and nearly op- timal analysis of privacy amplification by shuffling[C]//2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). 2022: 954-964. [43] WANG T, DING B, XU M, et al. Improving utility and security of the shuffler-based differential privacy[J]. arXiv preprint arXiv:1908.11515, 2019. [44] CHEU A, ZHILYAEV M. Differentially private histograms in the shuffle model from fake users[C] //2022 IEEE Symposium on Security and Privacy (SP). 2022: 440-457. [45] LI X, LIU W, FENG H, et al. Privacy enhancement via dummy points in the shuffle model[J]. arXiv preprint arXiv:2009.13738, 2020. [46] SCOTT M, CORMODE G, MAPLE C. Aggregation and Transformation of Vector-Valued Messages in the Shuffle Model of Differential Privacy[J]. IEEE Transactions on Information Forensics and Security, 2022, 17: 612-627. [47] LI X, LI B, VENKATESAN R. Secure data aggregation with differential privacy for mobile crowd- sensing[J]. IEEE Transactions on Dependable and Secure Computing, 2019, 17(4): 839-853. [48] WANG T, CHEN M, LI Y, et al. A hybrid approach to privacy-preserving federated learning based on differential privacy and secure multi-party computation[J]. IEEE Transactions on Services Com- puting, 2021, 14(2): 283-297. [49] SAMPIGETHAYA K, POOVENDRAN R. A survey on mix networks and their secure applications [J]. Proceedings of the IEEE, 2006, 94(12): 2142-2181. [50] CHAUM D. The dining cryptographers problem[J]. J. cryptology, 1988, 1: 65-75. [51] NEFF C A. A verifiable secret shuffle and its application to e-voting[C]//Proceedings of the 8th ACM conference on Computer and Communications Security. 2001: 116-125. [52] DEAN J, GHEMAWAT S. MapReduce: Simplified data processing on large clusters[J]. 2004. [53] DAVIDSON A, OR A. Optimizing shuffle performance in spark[J]. University of California, Berkeley-Department of Electrical Engineering and Computer Sciences, Tech. Rep, 2013. [54] CHAN T H H, CHUNG K M, MAGGS B, et al. Foundations of differentially oblivious algorithms [J]. ACM Journal of the ACM (JACM), 2022, 69(4): 1-49. [55] GORDON D, KATZ J, LIANG M, et al. Spreading the privacy blanket: Differentially oblivious shuffling for differential privacy[C]//Applied Cryptography and Network Security: 20th International Conference, ACNS 2022, Rome, Italy, June 20–23, 2022, Proceedings. 2022: 501-520. [56] ZHOU M, SHI E. The Power of the Differentially Oblivious Shuffle in Distributed Privacy Mecha- nisms[J]. Cryptology ePrint Archive, 2022. [57] ELGAMAL T. A public key cryptosystem and a signature scheme based on discrete logarithms[J]. IEEE transactions on information theory, 1985, 31(4): 469-472. [58] VAN OORSCHOT P C, MENEZES A J, VANSTONE S A. Handbook of applied cryptography[M]. CRC press, 1996. [59] GOODRICH M T, MITZENMACHER M. Invertible bloom lookup tables[C]//2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton). 2011: 792-799. [60] DWORK C, ROTH A, et al. The algorithmic foundations of differential privacy[J]. Foundations andTrends® in Theoretical Computer Science, 2014, 9(3–4): 211-407. [61] KASIVISWANATHAN S P, LEE H K, NISSIM K, et al. What can we learn privately?[J]. SIAM Journal on Computing, 2011, 40(3): 793-826. [62] CHEU A, SMITH A, ULLMAN J. Manipulation attacks in local differential privacy[C]//2021 IEEE Symposium on Security and Privacy (SP). 2021: 883-900. [63] CAO X, JIA J, GONG N Z. Data poisoning attacks to local differential privacy protocols[C]//30th {USENIX} Security Symposium ({USENIX} Security 21). 2021. [64] WU Y, CAO X, JIA J, et al. Poisoning Attacks to Local Differential Privacy Protocols for {Key- Value} Data[C]//31st USENIX Security Symposium (USENIX Security 22). 2022: 519-536. [65] WANG N, ZHENG W, WANG Z, et al. Collecting and analyzing key-value data under shuffled differential privacy[J]. Frontiers of Computer Science, 2023, 17(2): 172606. [66] BELL J H, BONAWITZ K A, GASCÓN A, et al. Secure single-server aggregation with (poly) log- arithmic overhead[C]//Proceedings of the 2020 ACM SIGSAC Conference on Computer and Com- munications Security. 2020: 1253-1269. [67] ACHARYA J, SUN Z, ZHANG H. Hadamard response: Estimating distributions privately, efficiently, and with little communication[C]//The 22nd International Conference on Artificial Intelligence and Statistics. 2019: 1120-1129 ﹀
中图分类号：	TN91
馆藏号：	59503
开放日期：	2023-12-23

附件下载