Data Deduplication
 
 

Thank you in advance for contributing to this research.
The goal of this project is to measure the potential for data deduplication across different users. We developed a Java-based application for this purpose. The application collects the following information about the files on your machine:

    - hash value of each file (not the file content itself)

    - file size
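The two collected fields can be illustrated with a short Java sketch. It is not the application's source code: the hash algorithm (SHA-256 here), the chunked reading, and the names FileFingerprint, fingerprint and pooledSavings are assumptions made for illustration. The second method shows one plausible way that pooled (hash, size) records from several users could be turned into a whole-file deduplication estimate; the project's actual metric may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the distributed application.
public class FileFingerprint {

    /** One collected record: a hash of the file and its size in bytes, never the content itself. */
    public static final class Record {
        public final String hash;
        public final long size;

        public Record(String hash, long size) {
            this.hash = hash;
            this.size = size;
        }

        /** Two files are treated as duplicates when both hash and size match. */
        String key() {
            return hash + ":" + size;
        }
    }

    /** Hashes a file in 64 KiB chunks so large files are never held in memory at once. */
    public static Record fingerprint(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256"); // assumed algorithm
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new Record(hex.toString(), Files.size(file));
    }

    /**
     * Hypothetical whole-file estimate: the fraction of pooled bytes that would be
     * saved if every distinct (hash, size) pair were stored only once across all users.
     */
    public static double pooledSavings(List<List<Record>> users) {
        Set<String> seen = new HashSet<>();
        long totalBytes = 0;
        long uniqueBytes = 0;
        for (List<Record> user : users) {
            for (Record r : user) {
                totalBytes += r.size;
                if (seen.add(r.key())) {
                    uniqueBytes += r.size; // first time this file is seen in the pool
                }
            }
        }
        return totalBytes == 0 ? 0.0 : 1.0 - (double) uniqueBytes / totalBytes;
    }
}
```

For example, if one user holds files A (100 bytes) and B (50 bytes) and a second user also holds A, the pool contains 250 bytes of which 150 are distinct, so pooledSavings returns 0.4. Each record carries only a hex digest and a byte count, so identical files held by different users match without the file bytes being part of the record.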


To run this application, please follow the instructions for your operating system:

    - Windows

    - OSX

    - Linux


Windows

  1. Install Java if it is not already installed. You can download it from one of the following links:

    - x86 (32 bits)

    - x64 (64 bits)

  2. Download the application from here and uncompress it.


  3. Double-click dedup.exe to open the application.

  4. Click the “Start” button to start the application.

  

It takes a few hours to run ;) (depending on your computer's specifications).

  5. Upload the result:

        - Go back to the deployment directory (the folder where you double-clicked dedup.exe).

        - Find outfile*.zip (* is a random number).

        - Upload the “outfile*.zip” file at the following address:

            http://sysnet.cs.ucr.edu/dedup/Upload_Result.php

  6. Done. Thank you so much. We greatly appreciate your help.


OSX

  1. Install Java if it is not already installed. Mac users should follow the instructions at the following URL:

    http://www.java.com

  2. Download the application from here and uncompress it.

  3. Open a terminal and go to the deployment folder.

  4. Run the following command:

    sudo java -jar deduplication.jar

  5. The application window appears.

Click “Start” to start the application.

It takes a few hours to run ;) (depending on your computer's specifications).

  6. Upload the result:

        - Go back to the uncompressed directory.

        - Find outfile*.zip (* is a random number).

        - Upload the “outfile*.zip” file at the following address:

            http://sysnet.cs.ucr.edu/dedup/Upload_Result.php

  7. Done. Thank you so much. We greatly appreciate your help.
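If the result file is hard to spot among the other files in the uncompressed directory, a small helper can list it. The following is a hypothetical Java sketch, not part of the distributed application; the class name FindResult is invented, and the only thing it relies on is the outfile*.zip naming described in the upload step. It works the same way for the Windows and Linux instructions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists result archives named outfile*.zip in one directory.
public class FindResult {

    /** Returns the files directly inside dir (not subfolders) whose names match outfile*.zip, sorted. */
    public static List<Path> findResults(Path dir) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "outfile*.zip")) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Searches the current directory unless a directory is given as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> results = findResults(dir);
        if (results.isEmpty()) {
            System.out.println("No outfile*.zip found in " + dir.toAbsolutePath());
        }
        for (Path p : results) {
            System.out.println(p.toAbsolutePath() + " (" + Files.size(p) + " bytes)");
        }
    }
}
```

Usage, assuming a JDK is available for javac: compile with javac FindResult.java, then run java FindResult followed by the path of the uncompressed directory. The printed absolute path is the file to upload.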


Linux

  1. Install Java if it is not already installed. You can download it from one of the following links:

    - x86 (32 bits)

    - x64 (64 bits)

  2. Download the application from here and uncompress it.



  3. Open a terminal and run the following command:

        java -jar deduplication.jar

Here, java must point to the JRE you installed, and deduplication.jar is one of the files inside the uncompressed directory.

  4. It takes a few hours to run ;) (depending on your computer's specifications).

  5. Upload the result:

        - Go back to the directory from which you ran the command.

        - Find outfile*.zip (* is a random number).

        - Upload the “outfile*.zip” file at the following address:

            http://sysnet.cs.ucr.edu/dedup/Upload_Result.php

  6. Done. Thank you so much. We greatly appreciate your help.
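Step 3 of the Linux instructions assumes that the java command points to the JRE you installed. If several Java versions are present, a tiny program like the following reports which installation java actually starts and whether that JVM is 32- or 64-bit. It is a hypothetical helper (the name WhichJava is invented), not something shipped with the application.

```java
// Hypothetical helper: prints which JRE the "java" command launched.
public class WhichJava {
    public static void main(String[] args) {
        // Installation directory of the running JRE.
        System.out.println("java.home    = " + System.getProperty("java.home"));
        // Version string of that JRE, e.g. "1.7.0_80" for a Java 7 update.
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Architecture the JVM runs as: "amd64" for a 64-bit JVM on Linux,
        // while 32-bit JVMs report values such as "i386" or "x86".
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
    }
}
```

Compile and run it from the same terminal you use for step 3: javac WhichJava.java, then java WhichJava. Note that javac is part of a JDK, not of the JRE alone.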


Note: You can download the source code from here and use it instead of the JAR file. No information about the content of your files is leaked.


If you have any questions or concerns, please email me at:

(makho001 [at] ucr.edu)