Effectiveness of Different Test Case Prioritization Methods Based on Coverage Criteria
CS 206 Project, Fall 2012
Handed Out: October 30, 2012
Due Date: November 29, 2012
Grade Weight: 15% of total course grade
Note: This project can be completed in groups of two.
Overview
In this project, you will compare the effectiveness of different kinds of test case prioritization
methods in exposing faults via multiple coverage criteria. You are provided with a set of benchmark
programs, a set of test cases for each benchmark, and a set of faults for each benchmark.
- You will collect the coverage information for each test case.
- For each benchmark program, you will use test case
prioritization methods to create test suites according to the different
coverage criteria.
- For each created test suite, you will evaluate its fault-exposing
potential by identifying the set of available faults that are exposed by
the test suite.
- Finally, you will report on your experimental results and observations.
Download Required Files
First, download the required files for this project here: benchmarks.tar.gz (download size: approximately 1.8 MB)
Next, uncompress the downloaded file by executing the command: "tar -xvzf benchmarks.tar.gz".
Benchmark Programs
You will be provided with a set of seven relatively small benchmark programs written in C. Each program is associated with a set of faults and a set of test cases:
Program Name  | # of Available Faults | # of Available Test Cases
tcas          | 41                    | 1590
totinfo       | 23                    | 1026
schedule      | 9                     | 2634
schedule2     | 9                     | 2679
printtokens   | 7                     | 4072
printtokens2  | 9                     | 4057
replace       | 31                    | 5542
General Directory Structure for Benchmarks
All benchmark programs are contained within a folder named "benchmarks". Within this folder, there is a subdirectory for each of the seven benchmark programs.
Within the subdirectory for a benchmark program, you will find the following folders and files:
- The benchmark program
The benchmark program is specified as a .c file, and possibly one or more associated .h files.
Example: Program tcas is located in the file "benchmarks/tcas/tcas.c" (there are no associated .h files for this program).
- The set of faults
The faults are specified as a set of "faulty versions" associated with each benchmark program (one fault per faulty version). Each faulty version is contained within a folder named with the letter "v" followed by the number of the faulty version. The program code for each faulty version is identical to that of the (original) benchmark program, except for a slight modification to the code that represents the fault.
Example: Program tcas is associated with 41 different faulty versions, contained in the folders "benchmarks/tcas/v1" through "benchmarks/tcas/v41".
- The set of test cases
The test cases are specified in the file "universe.txt" (one test case per line).
Example: The test cases for program tcas are contained in the file "benchmarks/tcas/universe.txt".
- Test case input file directories
Any other folders that may be present contain input files that
are used by the associated test cases. You don't need to do anything
with these folders except to make sure they are present in the current
working directory when running the benchmark program.
Compiling and Running Each Benchmark Program
All benchmark programs take some input values (either command-line parameters, the name of an input file, or both) and produce output that is written to the standard output stream (by default, the screen). The following table shows how to compile each program, how to execute the compiled program using the first test case specified in the associated "universe.txt" test case file, and the names of any input file directories required by the associated test cases. Note that the test cases assume that any necessary input file directories are contained in the current working directory when running the benchmark program.
Program Name | How to Compile | Example Command to Execute Program | Input File Directories
tcas | gcc -g -o tcas tcas.c | tcas 1258 1 0 897 174 7253 1 629 500 0 0 1 | (none)
totinfo | gcc -g -o totinfo totinfo.c -lm | totinfo < universe/jkADl.mat | universe
schedule | gcc -g -o schedule schedule.c | schedule 5 1 1 < input/dat027 | input
schedule2 | gcc -g -o schedule2 schedule2.c | schedule2 5 1 1 < input/dat027 | input
printtokens | gcc -g -o printtokens printtokens.c | printtokens < inputs/newtst122.tst | inputs
printtokens2 | gcc -g -o printtokens2 printtokens2.c | printtokens2 < inputs/newtst122.tst | inputs
replace | gcc -g -o replace replace.c -lm | replace '@|' 'E)m' < input/ruin.1373 | input, moni, temp-test
Coverage Criteria
We will consider two different coverage criteria in this project.
- Statement coverage. Statement coverage requires that every statement in the program has been executed at least once.
- Branch coverage. Branch coverage requires that, for each conditional branch in the program (e.g., those arising from if statements and loops), the branch condition has evaluated to true at least once and to false at least once during testing. A small example illustrating the difference between the two criteria appears after this list.
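To see how the two criteria differ, consider this small C function (a hypothetical illustration, not taken from the benchmarks):

    int abs_value(int a) {
        int r = a;
        if (a < 0)      /* conditional branch */
            r = -a;
        return r;
    }

A single test case with a = -5 executes every statement, so it achieves statement coverage. However, it never makes the condition a < 0 false, so branch coverage additionally requires a test case such as a = 3.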
Obtaining Coverage Information
For each test case associated with each benchmark program, you must collect the coverage information for each of the two coverage criteria listed above. The coverage information can be collected using gcov, a coverage tool distributed with GCC; refer to the gcov documentation (for example, the gcov section of the GCC manual or the gcov man page) for usage details.
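As an illustration, a typical gcov workflow for program tcas looks like the following (the instrumentation flags are standard GCC options; adapt the paths as needed):
- Compile with coverage instrumentation: gcc -g -fprofile-arcs -ftest-coverage -o tcas tcas.c
- Run the program on a single test case, e.g.: tcas 1258 1 0 897 174 7253 1 629 500 0 0 1
- Generate the report: "gcov tcas.c" for statement (line) coverage, or "gcov -b tcas.c" to also report branch coverage. The annotated results are written to "tcas.c.gcov".
- Execution counts accumulate in the .gcda data file across runs, so remove it (e.g., "rm tcas.gcda") between test cases to obtain per-test-case coverage.

If it helps as a starting point, here is a minimal C sketch that extracts the executed line numbers from an annotated .gcov file; the file name is a hypothetical example, and the branch lines produced by "gcov -b" would need separate handling:

    /* Print the line numbers executed in one run, as recorded in a
     * .gcov file. Each source line appears as "count: lineno: code",
     * where count is "-" for non-executable lines and "#####" for
     * lines that were never executed. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
        char line[1024];
        FILE *f;

        if (argc != 2) { fprintf(stderr, "usage: %s file.gcov\n", argv[0]); return 1; }
        f = fopen(argv[1], "r");               /* e.g., tcas.c.gcov */
        if (!f) { perror("fopen"); return 1; }
        while (fgets(line, sizeof line, f)) {
            char *colon = strchr(line, ':');
            if (!colon) continue;              /* skips "branch ..." lines */
            *colon = '\0';
            /* strtol yields 0 for "-" and "#####", i.e., not executed. */
            if (strtol(line, NULL, 10) > 0)
                printf("%ld\n", strtol(colon + 1, NULL, 10));
        }
        fclose(f);
        return 0;
    }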
Test Case Prioritization
Test case prioritization techniques schedule test cases in an execution order according to some criterion. Here are the prioritization methods you will use:
- Random prioritization. Randomly order the test cases in a test suite.
- Total Coverage prioritization. Prioritize test cases according to the total number of statements/branches they cover, by sorting them in descending order of their total statement/branch coverage.
- Additional Coverage prioritization. Iteratively perform the following two steps until all statements/branches are covered by at least one selected test case: (i) select a test case that yields the greatest additional statement/branch coverage; and (ii) adjust the coverage information of the remaining test cases to indicate their coverage of statements/branches not yet covered. (A sketch of both coverage-based methods appears after this list.)
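As a concrete starting point, here is a minimal C sketch of the two coverage-based methods. It assumes the coverage information has already been collected into a hypothetical 0/1 matrix cov[t][e] (test case t covers entity e); the names, array bounds, and the step that builds the matrix from gcov output are all assumptions for illustration:

    /* Sketch only: prioritization over a hypothetical coverage matrix.
     * cov[t][e] == 1 means test case t covers entity e, where an entity
     * is a statement or a branch, depending on the criterion. */
    #include <string.h>

    #define MAX_TESTS 8192      /* >= 5542 test cases (replace) */
    #define MAX_ENTITIES 4096   /* generous bound for these programs */

    static unsigned char cov[MAX_TESTS][MAX_ENTITIES];
    static int num_tests, num_entities;

    /* Total prioritization: sort the test indices in descending order
     * of this value (e.g., with qsort) to obtain the execution order. */
    static int total_covered(int t) {
        int e, n = 0;
        for (e = 0; e < num_entities; e++)
            n += cov[t][e];
        return n;
    }

    /* Additional prioritization: greedily pick the test covering the
     * most not-yet-covered entities until no test adds new coverage.
     * The picked tests form a coverage-adequate suite; returns its size. */
    static int additional_prioritize(int order[]) {
        static unsigned char covered[MAX_ENTITIES], used[MAX_TESTS];
        int picked = 0;

        memset(covered, 0, sizeof covered);
        memset(used, 0, sizeof used);
        for (;;) {
            int best = -1, best_gain = 0, t, e;
            for (t = 0; t < num_tests; t++) {
                int gain = 0;
                if (used[t]) continue;
                for (e = 0; e < num_entities; e++)
                    if (cov[t][e] && !covered[e]) gain++;
                if (gain > best_gain) { best_gain = gain; best = t; }
            }
            if (best < 0) break;   /* every coverable entity is covered */
            used[best] = 1;
            order[picked++] = best;
            for (e = 0; e < num_entities; e++)
                if (cov[best][e]) covered[e] = 1;
        }
        return picked;
    }

    /* For the random and total orders, an adequate suite is the
     * shortest prefix of the order covering everything the full
     * test pool covers. Returns the prefix length. */
    static int adequate_prefix(const int order[]) {
        static unsigned char covered[MAX_ENTITIES];
        int remaining = 0, i, e;

        memset(covered, 0, sizeof covered);
        for (e = 0; e < num_entities; e++)   /* count coverable entities */
            for (i = 0; i < num_tests; i++)
                if (cov[i][e]) { remaining++; break; }
        for (i = 0; i < num_tests && remaining > 0; i++)
            for (e = 0; e < num_entities; e++)
                if (cov[order[i]][e] && !covered[e]) {
                    covered[e] = 1;
                    remaining--;
                }
        return i;
    }

Random prioritization is then just a shuffle of the test indices, truncated with adequate_prefix in the same way as the total order.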
If you want to read more about the above methods, refer to the following papers.
References:
- S. Elbaum, A. G. Malishevsky, and G. Rothermel, Test Case Prioritization: A Family of Empirical Studies. IEEE Transactions on Software Engineering, 2002.
- B. Jiang, Z. Zhang, W. K. Chan, and T. H. Tse, Adaptive Random Test Case Prioritization. IEEE/ACM International Conference on Automated Software Engineering, 2009 (ASE'09).
- G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, Test Case Prioritization: An Empirical Study. IEEE International Conference on Software Maintenance, 1999 (ICSM'99).
Detailed Requirements
The following tasks will need to be performed to complete this project.
- Create the following test suites
- For each benchmark program, create six test suites using the combinations
of two coverage criteria (statements and branches) and three prioritization methods
(random, total, and additional).
- Each test suite should be "adequate" with respect to the given coverage
criterion. That is, the test suite should cover all of the entities that are covered
by the entire set of available test cases. The test suite you generate will be
significantly smaller than the set of all available test cases. For example, even
though there may be over 2000 test cases available for a benchmark program, a test
suite may only need 10 or 15 test cases in order to become adequate for some particular
coverage criterion.
- Evaluate the fault-exposing potential of each test suite
- Compile each benchmark program, and compile each of the faulty versions associated with each benchmark program.
- For each created test suite for each benchmark program, identify the set of faults that are exposed by the test suite. A fault is considered "exposed" by a test suite if at least one test case in the suite causes the faulty version to produce output different from that of the (original) benchmark program. For example, even though there may be over 20 total faults available for a benchmark program, it may turn out that a particular test suite exposes only 2 or 3 of them. (A small sketch of this output comparison appears after the list of tasks.)
- Report your experimental results and observations
- Create a data table listing your experimental results. The
table should show, for each created test suite associated with each
benchmark program, the size of the test suite (number of test cases),
and the number of faults exposed by the test suite.
- Write about any interesting observations you can make from
your experimental results. How small are your test suites as compared to
the original number of available test cases? How do the suite sizes
change according to different coverage criteria? How many faults are
exposed by your test suites as compared to the total number of available
faults? Which coverage criteria seem to be the least/most effective at
being able to expose faults? What other conclusions can you draw from
your observations?
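For the fault-exposure step above, the output comparison can be as simple as running the original and the faulty binary on the same test case and comparing their outputs. Below is a minimal C sketch; the binary names, output file names, and argument handling are hypothetical and would need to be adapted to each benchmark and to the format of its universe.txt lines:

    /* Sketch only: does one test case expose the fault in a faulty
     * version? Assumes the original and faulty binaries were built as
     * ./tcas.orig and ./tcas.v1 (hypothetical names), and that
     * test_args is one line taken from universe.txt. */
    #include <stdio.h>
    #include <stdlib.h>

    static int exposes(const char *test_args) {
        char cmd[4096];

        /* Run the original program and capture its output. */
        snprintf(cmd, sizeof cmd, "./tcas.orig %s > out.orig 2>&1", test_args);
        system(cmd);

        /* Run the faulty version on the same test case. */
        snprintf(cmd, sizeof cmd, "./tcas.v1 %s > out.fault 2>&1", test_args);
        system(cmd);

        /* cmp -s exits nonzero exactly when the files differ,
         * i.e., when this test case exposes the fault. */
        return system("cmp -s out.orig out.fault") != 0;
    }

A fault counts as exposed by a suite as soon as exposes() returns nonzero for any test case in the suite; remember to run from a directory that contains the benchmark's input file directories.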
Implementation and Submission
Programming Languages to Use
We would prefer that you use either C/C++ or Java to implement your solution to this project. However, if you choose to use another language (e.g., a scripting language), that is OK too. Whatever language you choose, please also submit, along with your programs, a README file describing how to compile and run them.
Items to Submit
Please submit the following items for this project to ltan003@cs.ucr.edu (put all your files into one directory and compress it into a zip/tgz/tar.gz file; please do not include executables in your submission):
- The report you have prepared regarding your experimental results and observations, according to the above detailed requirements.
- The test suites you have created for each benchmark program.
- The programs you wrote to obtain your experimental results, along with a README file describing how to compile and run each program.