Geometric Data Perturbation for
Privacy-preserving Data Classification
Keke Chen and Ling Liu
This project investigates a data-perturbation approach, based on random
geometric transformations, for privacy-preserving data classification. The
goal of this perturbation approach is two-fold: preserving the utility of the
data for classification modeling, and preserving the privacy of the data. To
achieve the first goal, we observe that many classification models rely on
the geometric properties of datasets, which a geometric transformation can
preserve. We prove that three types of well-known classifiers deliver the
same (or very similar) performance over the geometrically perturbed dataset
as over the original dataset. As a result, this perturbation approach
guarantees almost no loss of accuracy for these three popular classification
methods. To reach the second goal, we propose a multi-column privacy model
that addresses the problem of evaluating privacy quality for multidimensional
perturbation, and we develop an attack-resilient perturbation optimization
method. Using the proposed privacy metric, we analyze three types of inference
attacks: naive estimation, ICA-based reconstruction, and distribution-based
attacks. Based on this attack analysis, we develop a randomized optimization
method to optimize the perturbation. Our initial experiments show that this
approach can provide a high privacy guarantee while preserving accuracy for
the discussed classifiers. Further related geometric transformations will be
investigated to meet the requirements of different privacy-preserving mining
tasks and models.
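To make the utility argument concrete, the following minimal Python sketch illustrates why a geometric transformation can leave distance-based classification unaffected. It assumes, for illustration only, that the perturbation is a random orthogonal transformation (a rotation, possibly combined with a reflection) followed by a random translation, y = R x + t. The function names random_orthogonal, perturb, and distance are hypothetical and are not taken from the project's Matlab code; the sketch does not cover the privacy model, the attack analysis, or the perturbation optimization described above.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a d x d matrix with orthonormal rows by applying
    Gram-Schmidt to random Gaussian vectors (illustrative helper)."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components of v along the rows accepted so far.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample on an (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def perturb(points, R, t):
    """Apply y = R x + t to every point (each point is a list of d floats)."""
    return [
        [sum(r * xj for r, xj in zip(row, x)) + ti for row, ti in zip(R, t)]
        for x in points
    ]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
d = 3
X = [[rng.uniform(0, 1) for _ in range(d)] for _ in range(5)]
R = random_orthogonal(d, rng)
t = [rng.uniform(-1, 1) for _ in range(d)]
Y = perturb(X, R, t)

# Largest change in any pairwise distance caused by the perturbation.
max_gap = max(
    abs(distance(X[i], X[j]) - distance(Y[i], Y[j]))
    for i in range(len(X))
    for j in range(i + 1, len(X))
)
print(max_gap < 1e-9)
```

Because the pairwise distances match up to floating-point error, any classifier that depends only on those distances sees the same geometry in the perturbed data as in the original, even though the individual attribute values have changed.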
Find the Matlab code.
Representative papers: