#+TITLE: The Cloud Vista Demo System 
#+AUTHOR: Shumin Guo
#+EMAIL: guo.18@wright.edu
#+DATE: 2012-11-20
#+OPTIONS: toc:nil
#+OPTIONS: skip:t
#+OPTIONS: author:nil
#+OPTIONS: date:nil
#+OPTIONS: email:nil
#+STYLE: <link rel="stylesheet" type="text/css" href="cloudvistadoc.css" />

* Introduction

Analysis of big data has become an important problem for many business
and scientific applications, among which clustering and visualizing
clusters in big data raise some unique challenges. This demonstration
presents the CloudVista prototype system to address the problems with
big data caused by using existing data reduction approaches. It
promotes a whole-big-data visualization approach that preserves the
details of clustering structure. The prototype system has several
merits: (1) its visualization model is naturally parallel, which
guarantees scalability; (2) the visual frame structure minimizes the
data transferred between the cloud and the client; (3) the RandGen
algorithm is used to achieve a good balance between interactivity and
batch processing; (4) the approach is also designed to minimize the
financial cost of interactive exploration in the cloud. The
demonstration will highlight the problems with existing approaches and
show the advantages of the CloudVista approach. The viewers will have
the chance to play with the CloudVista prototype system and compare
the visualization results generated with different approaches.

* The demo system

** Sample exploration video
[[http://www.youtube.com/watch?v=IxXx-fFC8Rk][Demo Video]]

** Client-cloud demo system
   Download demo system [[file:cloudvista-1.0.zip][cloudvista-1.0.zip]].

** Related publications
   [[http://www.cs.wright.edu/~keke.chen/papers/cloudvista_demo.pdf][CloudVista: Interactive and Economical Visual Cluster Analysis for Big
   Data in the Cloud]] Demo, VLDB, Istanbul Turkey, 2012. 

   [[http://www.cs.wright.edu/~keke.chen/papers/cloudvis-ssdbm.pdf][CloudVista: visual cluster exploration for extreme scale data in the
   cloud]] Full paper, SSDBM, Portland, Oregon, 2011.

* Installation Instructions

** Prerequisites
   Java version >= 1.6 

   Working Hadoop cluster
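
   Before installing, you may want to confirm the Java prerequisite. The
   following shell sketch (not part of the demo package) prints the
   installed version line; its exact format varies between JVM vendors:

```shell
#!/bin/sh
# Check that a Java runtime is available on the PATH and print its version line.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found: please install a JRE (version >= 1.6)"
fi
```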

** Hadoop Cluster Configuration. 
   The demo system has no special requirements for the Hadoop cluster;
   you can use the default settings for all Hadoop configuration
   parameters. For more configuration details, please refer to the
   official Hadoop [[http://hadoop.apache.org/][documentation]]. 

** Private Key Generation for Passwordless Login to the Hadoop Server. 
   The demo system uses the SSH protocol for communication between the
   client and the Hadoop server, so a private key file is required for
   login. Private key generation differs between Windows and Linux;
   please refer to the respective instructions in the following
   sections. 

- Generate a private key on a Windows system.
   We recommend using PuTTYgen from the open-source PuTTY suite, which
   can be downloaded from [[http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html][here]]. 
   
   Another nice reference for generating a private key with PuTTY is the
   AWS documentation at [[http://docs.amazonwebservices.com/AmazonEC2/gsg/2007-01-19/putty.html][this]] link. 

   After generating the private key, you need to upload/copy the public
   key to the Hadoop server and configure the demo system with the path
   to the private key in the local file system. 

- Generate a private key on a Linux system.
   We recommend OpenSSH for generating key files under Linux; it is
   installed by default on most Linux distributions. 
   
   You can use the following command to generate key-pair: 

#+BEGIN_SRC shell
   ssh-keygen
#+END_SRC

   Next, use the following command to copy your public key to the
   cloud/hadoop server. 

#+BEGIN_SRC shell
   ssh-copy-id <username>@<hostname>
#+END_SRC

   Make sure passwordless login is correctly configured by logging out
   of the server and logging in again. If the login succeeds without
   prompting for a password, the configuration is correct. 

   The next step is to tell the demo system where to find the private
   key file in the local file system. 

   Please see the "System configuration" figure in the "Configure from
   Main Window" section below. 

** Demo System Configuration. 

   Users can modify the configuration through the user interface; see
   the figure below. 

#+CAPTION: CloudVista Client main window.
#+ATTR_HTML: alt="cloud vista" title="cloudvista" align="center" width="100%"
     [[file:cloudvista.png][file:cloudvista.png]]


   The following configuration items are required to run the demo system:
   
   - *Local working directory*; the default value is the directory
     where the program is run. 

   - *Remote/Server working directory*; this item is required
     (non-optional). It stores a mirror of the local working directory
     and is used to sync from the server to the local machine. This
     directory is usually under the home directory of the user whose
     private key was generated. 

   - *Map/Reduce jar file* (RR.jar) on the server; put it into the
     root working directory of the server. 

   - *Remote/Server Hadoop HDFS directory*, where the dataset data is
     hosted for MapReduce jobs. 

   - *Path to the private key file* in the local file system. 

   - *The user name* used to log in to the cloud server. 

   - *The server host name* (DNS name) or *IP address*. 

   - *The server SSHD service port number* (default is 22). 
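
   As a concrete illustration, a fully filled-in configuration might
   look like the following. All values below are hypothetical examples
   for a user "alice", not shipped defaults:

#+BEGIN_EXAMPLE
Local working directory          /home/alice/cloudvista
Remote/Server working directory  /home/alice/cloudvista
Map/Reduce jar file              RR.jar
Remote HDFS directory            /user/alice/datasets
Private key file                 /home/alice/.ssh/id_rsa
User name                        alice
Server host name                 hadoop.example.org
SSHD port                        22
#+END_EXAMPLE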

* Running the demo

** Start program. 
To start the demo program, first unpack the software package with an
archive tool such as unzip. 

From the command line, run the startup script to start the program. 

For *windows*, start a command prompt window, cd to the directory
where you unpacked the software, and type the following command: 

#+BEGIN_EXAMPLE
> run.bat
#+END_EXAMPLE

For *linux*, start a terminal, change to the unpacked directory, and
run the following command. 

#+BEGIN_EXAMPLE
$ ./run.sh
#+END_EXAMPLE


* The user interface

The demo client window contains two major areas: visualization on the
left and exploration management on the right. 

** The Visualization Region

On the left is the visual frame window, in which data points are
colored according to density: in general, the lower the density of a
point, the closer to black it appears, and the higher the density, the
deeper red it appears. This makes it easy to identify interesting
clusters buried in high-dimensional datasets. 

Users can operate on the visual frames, for example zooming in and out
and moving the frames in different directions within the visual
window. Visual explorations are created in batch: a number of visual
frames are generated in one exploration. To inspect these frames one
by one, the user can use the auto-play mode, in which the visual
frames of the exploration are displayed in sequence. The visual frame
play control toolbar makes it easy to control the auto-play of visual
frames. 

** The Management Area
On the right side of the client window is the exploration manipulation
and management area. From this area, you can create new explorations
and edit, delete, and visualize existing explorations. These
operations are available through a handy popup menu opened by
right-clicking a node in the tree. 

The tree view shows the structure of the explorations that currently
exist on the client machine. The root of the tree is "Datasets"; the
next level contains all currently available datasets, and under each
dataset its explorations are kept. The deepest level of the tree
corresponds to subset explorations generated from an existing
exploration by selecting a subset region.

Below the tree structure is an area for editing existing explorations
or modifying parameters when creating new explorations. The "Save"
button is used to apply the changes. 

At the bottom of the management area lies the system status area,
which shows operations on the client.

* Configure from Main Window

After the program has started, you can configure the aforementioned
settings by clicking the configure button in the upper-right corner of
the window. 

After filling in all the fields, click the "OK" button. 
#+CAPTION: System configuration. 
#+ATTR_HTML: alt="configuration window" title="Configuration window." align="center" width="100%"
     [[file:settings.png][file:settings.png]]
* Manipulation Instructions

** Data set operations. 
   If the HDFS directory is correctly set, all available datasets
   (directories under the HDFS dataset root) are automatically synced
   to the local client. These datasets are read-only during all
   operations. 

   You can add a new dataset to the HDFS root directory; it will be
   synced to the local client automatically. 

   NOTE: The auto-sync feature is currently not available, so you need
   to manually create a directory for the dataset in the local working
   directory to make exploration work. For example, if you have a
   dataset named sampleDs, you will need to create the directory
   $EXPLORE_DIR/sampleDs. If the directory is not created, you will
   not be able to create explorations. 
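
   For example, assuming the local working directory is recorded in the
   environment variable $EXPLORE_DIR (the default path in the sketch
   below is illustrative, not a shipped default), the directory for the
   sampleDs dataset can be created as follows:

```shell
#!/bin/sh
# Create the per-dataset directory that the client expects to find
# in the local working directory.
EXPLORE_DIR="${EXPLORE_DIR:-$HOME/cloudvista}"  # local working directory (illustrative default)
mkdir -p "$EXPLORE_DIR/sampleDs"
ls -d "$EXPLORE_DIR/sampleDs"
```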

** Exploration Operations. 
   Explorations are organized in a tree; you can create, edit,
   visualize, delete, and visually explore explorations. 
   
   - To create a new exploration, right-click a dataset item and
     select "New" in the popup menu. You will be asked to enter the
     exploration name and to set up other exploration parameters below
     the exploration tree window. 

   - To edit an exploration, right-click the desired exploration item
     in the tree view and select "Edit"; you can then edit the
     parameters of the exploration. After the parameters are changed,
     click the "Save" button, or right-click the item and select
     "Save" from the popup menu. 

   - To visualize an exploration, right-click the desired exploration
     item in the exploration tree view and select "Visualize" from the
     popup menu. If the selected exploration has been built
     previously, it is loaded from the local cache, which is fast;
     otherwise, a Hadoop Map/Reduce job is started to build the
     exploration on the server. After the Hadoop job is done, the
     visual frame files are downloaded from the server to the client
     and loaded into the visualizer. 

   - To delete an exploration, right-click the desired exploration and
     select "Delete" from the popup menu. The item then disappears
     from the client window as well as from the local exploration
     directory, but the server keeps a backup of the exploration
     (current option), which can also be deleted if desired. 

   - To visually explore an exploration, first visualize it in the
     left window of the client program. You can move the visual frame
     left, right, up, and down, zoom the visual frame in and out, and
     play the visual frames automatically frame by frame. More
     importantly, you can select an interesting area for more detailed
     exploration (subset exploration; see the next section).
   
** Subset Exploration Operations. 
A subset exploration is created from an existing exploration by
selecting an interesting area; more details about the data clusters
can be viewed through subset explorations. 

When a sub-area of a visual frame is selected, the life cycle of a
subset exploration starts. You first need to specify the name of the
sub-exploration, and you can also set parameters as for a regular
exploration. All regular exploration operations are supported by a
subset exploration, except that it cannot be created directly; it is
always derived from an existing exploration. 

* Related Publications

Please cite our paper using the following BibTeX entry: 
#+BEGIN_EXAMPLE
@article{Xu:2012:CIE:2367502.2367529,
 author = {Xu, Huiqi and Li, Zhen and Guo, Shumin and Chen, Keke},
 title = {CloudVista: interactive and economical visual cluster
 analysis for big data in the cloud},
 journal = {Proc. VLDB Endow.},
 issue_date = {August 2012},
 volume = {5},
 number = {12},
 month = aug,
 year = {2012},
 issn = {2150-8097},
 pages = {1886--1889},
 numpages = {4},
 url = {http://dl.acm.org/citation.cfm?id=2367502.2367529},
 acmid = {2367529},
 publisher = {VLDB Endowment},
}
#+END_EXAMPLE

* Copyright Notice
This software is for research purposes only. Several third-party
software packages were used during the development of this demo; we
would like to thank the authors of these packages. 
