Hosting data query services in public clouds is attractive for its scalability and significant cost savings. However, data owners are also concerned about data privacy because they lose control of the infrastructure. This demonstration shows a prototype for efficient and confidential range query services built on top of the random space perturbation (RASP) method. The RASP approach provides a privacy guarantee that is practical for cloud-based computing, while enabling much faster query processing than the encryption-based approach. The demonstration lets users understand the technical merits of the RASP approach more intuitively through interactive exploration of the visual interface.
Download: version 1.0
- Zohreh Alavi, Lu Zhou, James Powers, and Keke Chen, "RASP-QS: Efficient and Confidential Query Services in the Cloud", International Conference on Very Large Data Bases (VLDB), Demonstration Session, 2014
- Huiqi Xu, Shumin Guo, and Keke Chen, "Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation", IEEE TKDE, Volume 26, Issue 2, 2014 (accepted Nov. 2012)
- Keke Chen, Ramakanth Kavuluru, and Shumin Guo, "RASP: Efficient Multidimensional Range Query on Attack-Resilient Encrypted Databases", ACM Conference on Data and Application Security and Privacy (CODASPY), 2011
You need to install the following items before you can run the demo system.
- The latest version of Java
- Python 2.7 and numpy
After that, simply extract the downloaded zip file. The first time you run the demo, use the “Config” button to set the main paths: the Python installation directory, the Python code directory, and the working directory. The default settings normally work on Windows. On Linux, you will need to change them.
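Before pressing “Config”, it can help to check which of the default paths exist on your machine. The following is a minimal sketch, not part of the demo package: the three default values are the ones quoted on this page, while the setting names (`python_program`, `python_code_dir`, `working_dir`) and the helper `find_missing_paths` are hypothetical names chosen for illustration.

```python
import os

# Default settings quoted on this page; the key names are hypothetical.
DEFAULT_CONFIG = {
    "python_program": "C:/Python27/python",
    "python_code_dir": "./python/",
    "working_dir": "./tmp",
}


def find_missing_paths(config):
    """Return the names of settings whose path does not exist on disk."""
    missing = []
    for name, path in sorted(config.items()):
        # On Windows the interpreter is usually installed as python.exe.
        candidates = [path, path + ".exe"] if name == "python_program" else [path]
        if not any(os.path.exists(c) for c in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_paths(DEFAULT_CONFIG):
        print("Check the '%s' setting: %s" % (name, DEFAULT_CONFIG[name]))
```

Run it from the directory where you extracted the zip file; any setting it reports is one you probably need to change in the “Config” dialog.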
The current version of the demo system is a simple client-only visual interface to show
- how the data is perturbed,
- how the query is transformed,
- and the result of two-stage query processing.
The visualization part uses our previously developed VISTA system for visual exploration of multidimensional datasets.
The target datasets are multi-dimensional numeric data. Each row in the data file should contain one comma-separated data record. We have included three sample datasets in the demo package:
- adult.d5.20k, which is derived from the Adult dataset in the UCI database (http://archive.ics.uci.edu/ml/datasets/Adult). It contains only 5 dimensions.
- uniform.d5.100k and normal.d5.100k are two synthetic datasets, with multi-dimensional uniform and normal distributions, respectively. These two datasets are dense, and thus the labeled items might be overlapped by other items. Interactively explore the visualization to find the labeled items. Both have 5 dimensions.
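If you want to prepare your own dataset, the expected layout (one comma-separated numeric record per line) can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `load_records` is a hypothetical helper and the two sample rows are made up, not taken from the packaged datasets.

```python
import csv
import io


def load_records(lines):
    """Parse comma-separated numeric records, one record per line."""
    records = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        records.append([float(value) for value in row])
    return records


# Tiny made-up sample with 5 dimensions, standing in for a real data file.
sample = io.StringIO(u"39,13,0,0,40\n50,13,0,0,13\n")
records = load_records(sample)
print(len(records), len(records[0]))
```

To check a real file, pass an open file object to `load_records` instead of the `StringIO` sample. A `ValueError` from `float` points to a non-numeric field.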
At the top of the window are the dataset and query setup controls. From this area, you can select a dataset, generate perturbation parameters, perturb data, generate queries, and transform queries. When the datasets and query results are ready, the results are visualized in the two bottom panels.
To test queries, you can generate random queries and optionally tune them manually with the “Query” controls. Once the original query is composed, you can transform it and inspect the transformed query, which consists of the minimum bounding rectangle (MBR) and one query matrix for each bound (the lower and upper bounds of each dimension).
We do not include the index-supported query processing in this demo. Therefore, for large datasets it will take some time to process the query. The query result on the “Original Data” panel is highlighted with red points. Correspondingly, on the “RASP Perturbed Data” panel you can find these red points, but in the perturbed space. The “RASP” visualization also includes the blue points, which are the result of the first stage of RASP query processing.
Figure 1. The UI overview.
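The relationship between the blue and red points can be pictured with a small two-stage filter. This is an illustrative sketch only, not the demo's processing code: the first stage keeps every point inside a bounding box, the second stage applies an exact condition, and the points that pass the first stage but fail the second correspond to the blue points. The function names, the sample points, and the triangular condition are assumptions made up for the example, and the scan is linear because the demo does not use an index.

```python
def in_box(point, lower, upper):
    """True if the point lies inside the axis-aligned box [lower, upper]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def two_stage_query(points, mbr_lower, mbr_upper, exact_condition):
    """Stage 1 keeps points inside the bounding box; stage 2 applies the exact test."""
    stage1 = [p for p in points if in_box(p, mbr_lower, mbr_upper)]
    final = [p for p in stage1 if exact_condition(p)]
    return stage1, final


# Illustrative 2-D data; the "exact" region here is a triangle inside the box.
points = [(1, 1), (4, 4), (2, 5), (9, 9)]
stage1, final = two_stage_query(
    points, (0, 0), (5, 5), lambda p: p[0] + p[1] <= 6
)
# First-stage hits that the second stage filters out.
blue_only = [p for p in stage1 if p not in final]
```

The bounding box is a cheap over-approximation of the true query region, so the first stage can return extra records but never misses a matching one.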
After you choose a dataset, it is visualized automatically. By default, the Python program is set to “C:/Python27/python”, the Python code is in the subdirectory “./python/” of the current directory, and the working directory is set to “./tmp”. Change these settings if your system is set up differently, especially on Linux. Once the dataset is loaded, additional buttons are enabled for the next steps.
Figure 2. Visualization of the original dataset. The five blue dots control the per-dimension parameters. Press any one of them to change the weight of that dimension.
Click “Perturb Dataset” to generate the perturbation matrix A, perturb the original data, and visualize the perturbed data. Click “Show RASP Parameters” to see the matrix A.
Figure 3. Perturb data, visualize the perturbed data, and check the matrix “A” for perturbation.
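The perturbation step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the demo's actual code: it assumes each k-dimensional record x is extended to (x, 1, v) with a random positive value v and then multiplied by a (k+2) x (k+2) matrix A, following the construction in the RASP papers listed above, and it leaves out the order-preserving encoding those papers apply first. The helper names and the noise range are hypothetical.

```python
import random


def random_matrix(n, rng):
    """Random n x n matrix; a real key would also be checked for invertibility."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


def perturb(record, A, rng):
    """Extend the record to (x, 1, v) with random v > 0, then multiply by A."""
    v = rng.uniform(0.1, 1.0)  # positive noise component (range chosen arbitrarily)
    extended = list(record) + [1.0, v]
    return [sum(a * z for a, z in zip(row, extended)) for row in A]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
k = 5                    # the sample datasets have 5 dimensions
A = random_matrix(k + 2, rng)
perturbed = perturb([0.2, 0.4, 0.1, 0.9, 0.5], A, rng)
print(len(perturbed))
```

Because v is drawn afresh for every record, perturbing the same record twice gives two different points in the perturbed space.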
In the “Query” control section, first choose the span of the random query. Because the multidimensional space is sparse, larger spans give you a better chance of including some records in the query result. Spans above 30% typically work well for higher-dimensional datasets. Click “Get Random Query” to generate one random query for the original dataset. The result appears in the combo box. You can further tune each dimension by selecting it from the dropdown list, typing in values, and clicking “OK”.
Figure 4. Generate a random range query for the original dataset
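The effect of the span setting can be explored with a short sketch. It assumes that the span is the fraction of each dimension's value range covered by the query interval, which is an interpretation for illustration rather than a description of the demo's code. `random_range_query` and `matches` are hypothetical helpers and the data is made up.

```python
import random


def random_range_query(records, span, rng):
    """For each dimension pick a random interval covering `span` of its value range."""
    query = []
    for column in zip(*records):
        lo, hi = min(column), max(column)
        width = span * (hi - lo)
        start = rng.uniform(lo, hi - width)
        query.append((start, start + width))
    return query


def matches(record, query):
    """True if every value falls inside its dimension's [lower, upper] interval."""
    return all(lower <= x <= upper for x, (lower, upper) in zip(record, query))


rng = random.Random(7)
# Made-up uniform data in 5 dimensions, only to exercise the functions.
records = [[rng.random() for _ in range(5)] for _ in range(1000)]
query = random_range_query(records, 0.3, rng)
hits = [r for r in records if matches(r, query)]
print(len(hits))
```

For uniformly distributed data, a 30% span in each of 5 dimensions covers about 0.3^5, or roughly 0.24%, of the data space, so only a handful of 1000 records are expected to match. This is why small spans often return empty results.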
After you finish editing the range, click “Run Original Query”. The query result is highlighted as red points in the bottom-left panel. If you cannot see any, the result set is probably empty, or it contains only a few points that are hidden behind other points. In the latter case, tune the visualization to bring them into view.
Figure 5. Visualization of the query results on the original dataset.
Click “Transform Query” to transform the query into the perturbed space. You can then click the next button to inspect the details of the transformed query. The “Run RASP Query” button should now be enabled. Click it and the results are highlighted in blue (first-stage results that are not in the final result) and in red (the final result). Tune the visualization parameters to get the clearest view of the result.
Figure 6. Visualization of the RASP query result in the perturbed space.
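How one bound of a range query becomes a query matrix can be sketched as follows. This is a simplified illustration, not the demo's code: it assumes a perturbed record y = A(x, 1, v) with v > 0, as in the RASP papers listed above, and omits their order-preserving encoding. Under that assumption, the condition x[i] <= a is equivalent to (x[i] - a) * v <= 0, which can be written as y^T Θ y <= 0 with Θ = (A^-1)^T u w^T A^-1. All function names and the fixed example matrix are hypothetical.

```python
def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (plain Python, small matrices only)."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]  # zero here would mean M is singular
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(M, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def transform_condition(A_inv, dim, bound, is_upper):
    """Build Theta so that y^T Theta y <= 0 encodes x[dim] <= bound (or >= bound)."""
    n = len(A_inv)
    sign = 1.0 if is_upper else -1.0
    u = [0.0] * n
    u[dim] = sign              # picks x[dim]
    u[n - 2] = -sign * bound   # multiplies the constant 1
    w = [0.0] * n
    w[n - 1] = 1.0             # picks the positive noise v
    # Theta = (A^-1)^T u w^T A^-1, assembled from the row vectors u^T A^-1 and w^T A^-1.
    left = [sum(u[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    right = [sum(w[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    return [[l * r for r in right] for l in left]


def satisfies(y, theta):
    """Evaluate the quadratic condition y^T Theta y <= 0."""
    return sum(yi * t for yi, t in zip(y, matvec(theta, y))) <= 0.0


# Fixed, well-conditioned stand-in for the secret matrix A (k = 2, so A is 4 x 4).
A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
A_inv = inverse(A)

x, v = [0.3, 0.7], 0.5        # original record and its positive noise value
y = matvec(A, x + [1.0, v])   # perturbed record

theta = transform_condition(A_inv, 0, 0.5, True)  # encodes x[0] <= 0.5
print(satisfies(y, theta))
```

A range query over k dimensions yields 2k such matrices, one for each lower and upper bound, and a perturbed record is in the final result only if it satisfies all of them. The sketch does not compute the bounding box used in the first stage.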