Software Testing strategies, their types and Levels
ranapoonam1
60 slides
May 03, 2024
Testing Strategies By: Dr. Poonam Panwar, Associate Professor, MMICT & BM, MMDU, Mullana, Ambala, Haryana
Verification and Validation Before getting into the various forms and strategies of testing, we must understand the process of verifying and validating software code. Verification and validation is the generic name given to the checking processes which ensure that the software conforms to its specification and meets the needs of the customer. The system should be verified and validated at each stage of the software development process using documents produced in earlier stages. Verification and validation thus starts with requirements reviews and continues through design and code reviews to product testing.
Verification and validation are sometimes confused, but they are different activities. The difference between the two can be summarised as follows: Validation: Are we building the right product? Verification: Are we building the product right? Verification involves checking that the program conforms to its specification. Validation involves checking that the program as implemented meets the expectations of the customer. Requirements validation techniques, such as prototyping , help in this respect. However, flaws and deficiencies in the requirements can sometimes be discovered only when the system implementation is complete.
To satisfy the objectives of the verification and validation process, both static and dynamic techniques of system checking and analysis should be used. Static techniques are concerned with the analysis and checking of system representations such as the requirements document, design diagrams and the program source code. Dynamic techniques, or tests, involve exercising an implementation. Static techniques include program inspections, analysis and formal verification. Some theorists have suggested that these techniques should completely replace dynamic techniques in the verification and validation process and that testing is not necessary. This is not a useful point of view and could be 'considered harmful': static techniques can only check the correspondence between a program and its specification (verification); they cannot demonstrate that the software is operationally useful. Although static verification techniques are becoming more widely used, program testing is still the predominant verification and validation technique. Testing involves exercising the program using data like the real data processed by the program. The existence of program defects or inadequacies is inferred from unexpected system outputs. Testing may be carried out during the implementation phase to verify that the software behaves as intended by its designer; a later testing phase checks conformance with the requirements and assesses the reliability of the system.
Testing strategies A testing strategy is a general approach to the testing process rather than a method of devising particular system or component tests. Different testing strategies may be adopted depending on the type of system to be tested and the development process used. There are two different strategies available: Top-Down Testing and Bottom-Up Testing.
Top-Down Testing In Top-Down Testing, high levels of a system are tested before testing the detailed components. The application is represented as a single abstract component with sub-components represented by stubs. Stubs have the same interface as the component but very limited functionality. After the top-level component has been tested, its sub-components are implemented and tested in the same way. This process continues recursively until the bottom-level components are implemented; the whole system may then be completely tested. Top-down testing should be used with top-down program development so that a system component is tested as soon as it is coded. Coding and testing are a single activity with no separate component or module testing phase. If top-down testing is used, unnoticed design errors may be detected at an early stage in the testing process. As these errors are usually structural errors, early detection means that extensive re-design and re-implementation may be avoided. Top-down testing has the further advantage that a prototype system is available at a very early stage, which itself is a psychological boost. Validation can begin early in the testing process, as a demonstrable system can be made available to the users.
Bottom-Up Testing Bottom-Up Testing is the opposite of Top-Down. It involves testing the modules at the lower levels in the hierarchy, and then working up the hierarchy of modules until the final module is tested. This type of testing is appropriate for object-oriented systems in that individual objects may be tested using their own test drivers. They are then integrated and the object collection is tested.
Testing Testing is the process of running a program with the intention of finding errors. Testing should be carried out systematically; it should never be done by intuition alone. Most testing is top-down, module-by-module testing. There are two main approaches to testing: 1. Black Box Testing 2. White Box Testing
Black Box Testing
Black Box Testing This treats the system as one that cannot be seen in detail. The structure of the program is not taken into account; the tests are performed based on what the program does. This is sometimes called Functional Testing. The functional requirements have been agreed with the customer, so the customer can see that the program performs as requested at the specification stage. It is difficult to know just how much of the program code has been tested in this case, but it is the kind of testing that is very popular with the customer. The approach to testing should generally be as shown in the following algorithm:
1. Repeat
2. Code and unit test a component.
3. Add the component to the existing combination.
4. Test and debug the new combination.
5. Until all the components have been added.
6. Deliver the system for acceptance tests.
Black Box Testing Black Box testing involves: Equivalence partitioning, Boundary value analysis, Robustness testing, Cause-effect graphing
Equivalence Partitioning In this method the input domain data is divided into different equivalence data classes. This method is typically used to reduce the total number of test cases to a finite set of testable test cases while still covering maximum requirements. In short, it is the process of taking all possible test cases and placing them into classes; one test value is picked from each class while testing.
Example If you are testing an input box accepting numbers from 1 to 1000, there is no use in writing a thousand test cases for all 1000 valid input numbers plus other test cases for invalid data. Using the equivalence partitioning method, the test cases can be divided into three sets of input data, called classes. Each test case is a representative of its class. So in the above example we can divide our test cases into three equivalence classes covering valid and invalid inputs.
Test cases for an input box accepting numbers between 1 and 1000 using Equivalence Partitioning:
1. One input data class with all valid inputs: pick a single value from the range 1 to 1000 as a valid test case. If you select other values between 1 and 1000, the result is going to be the same, so one test case for valid input data should be sufficient.
2. An input data class with all values below the lower limit, i.e. any value below 1, as an invalid input data test case.
3. Input data with any value greater than 1000, representing the third (invalid) input class.
So using equivalence partitioning you have categorized all possible test cases into three classes. Test cases with other values from any class should give you the same result. We have selected one representative from every input class to design our test cases. Test case values are selected in such a way that the largest number of attributes of the equivalence class can be exercised. Equivalence partitioning uses the fewest test cases to cover the maximum requirements.
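The three classes above can be sketched in code. A minimal sketch, where the `is_valid` predicate and the representative values are illustrative assumptions rather than anything prescribed by the slide:

```python
# Equivalence partitioning for an input box accepting 1..1000:
# one representative value per class stands in for the whole class.
def is_valid(n):
    return 1 <= n <= 1000

# One representative per equivalence class (values chosen arbitrarily
# from within each class)
representatives = {
    "valid (1..1000)":   500,
    "below lower limit": 0,
    "above upper limit": 1001,
}

assert is_valid(representatives["valid (1..1000)"])
assert not is_valid(representatives["below lower limit"])
assert not is_valid(representatives["above upper limit"])
```

Any other value drawn from the same class (say 42 instead of 500) would exercise the same behaviour, which is exactly why one representative per class suffices.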
Boundary value analysis It is widely recognized that input values at the extreme ends of the input domain cause more errors in a system: more application errors occur at the boundaries of the input domain. The boundary value analysis testing technique is used to identify errors at boundaries rather than those that exist in the center of the input domain. Boundary value analysis is the next step after equivalence partitioning for designing test cases, where test cases are selected at the edges of the equivalence classes.
Test cases for an input box accepting numbers between 1 and 1000 using Boundary Value Analysis:
1. Test cases with test data exactly at the input boundaries of the input domain, i.e. values 1 and 1000 in our case.
2. Test data with values just below the extreme edges of the input domain, i.e. values 0 and 999.
3. Test data with values just above the extreme edges of the input domain, i.e. values 2 and 1001.
Boundary value analysis is often considered part of stress and negative testing.
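The boundary values listed above can be generated mechanically. A small sketch, assuming the usual "at, just below, just above each boundary" convention from the slide:

```python
# Generate boundary-value test data for a numeric range [lower, upper]:
# values at each boundary, plus one step below and one step above.
def boundary_values(lower, upper):
    return sorted({lower - 1, lower, lower + 1,
                   upper - 1, upper, upper + 1})

# The 1..1000 input box from the slide
assert boundary_values(1, 1000) == [0, 1, 2, 999, 1000, 1001]
```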
Things to take care of There is no hard-and-fast rule to test only one value from each equivalence class you created for the input domains. You can select multiple valid and invalid values from each equivalence class according to your needs and prior judgment. E.g. if you placed the input values 1 to 1000 in the valid data equivalence class, you can select test case values like 1, 11, 100, 950, etc. The same applies to the other test cases with invalid data classes. This is a very basic and simple example for understanding the boundary value analysis and equivalence partitioning concepts.
Robustness testing Robustness testing is any quality assurance methodology focused on testing the robustness of software. Robustness testing has also been used to describe the process of verifying the robustness (i.e. correctness) of test cases in a test process. ANSI and IEEE have defined robustness as the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions. The term "robustness testing" was first used by the Ballista project at Carnegie Mellon University, which performed dependability testing of operating systems based on the data types of the POSIX API, producing complete system crashes in some systems. The term was also used by OUSPG and VTT researchers taking part in the PROTOS project in the context of software security testing. Eventually the term fuzzing (which security people use for mostly non-intelligent and random robustness testing) extended to also cover model-based robustness testing.
Interface robustness testing means bombarding the public interface of the application/system/API with valid and exceptional inputs. The success criterion is in most cases "if it does not crash or hang, then it is robust", hence no oracle is needed for the testing. Example: Fuzz used a simple method (randomly generated strings) to test the robustness of Unix console applications. They repeated their original experiment (1990) in 1995 and also applied the method to X-Window applications. The results were distressing: originally approximately 40% of the applications tested could be crashed with this method, and many of the reported robustness errors remained even after five years. In 2000 they conducted a third experiment with Windows 2000 applications; the method was similar, with randomly generated mouse and keyboard events supplied to the programs. The source code of the testing tools can be downloaded from Fuzz's homepage.
Interface robustness testing Contd… Ballista: in Ballista the robustness of POSIX API implementations was tested. They conducted a great number of experiments and compared 15 Unix versions; later the test suite was implemented for Windows systems as well. Part of the POSIX test suite can be downloaded from their website. Over the years many publications have appeared concerning Ballista; a good introduction is the brochure and the "Software Robustness Evaluation" slides. Ballista suggested quite a few good ideas and techniques and carried out many well-documented experiments, so it is worth a look. JCrasher: JCrasher is a tool that generates robustness tests from Java byte code in the form of JUnit tests. Its novel approaches are implemented in a nice tool, which can even be downloaded as an Eclipse plug-in. PROTOS (Security Testing of Protocol Implementations): the PROTOS project analyzes the robustness and security aspects of protocols. Among their papers they published a test suite for WAP. The project was split into a research project (PROTOS Genome) and a commercial tool called Codenomicon DEFENSICS.
Dependability benchmarking The aim is to develop a public benchmark specification which focuses on evaluating the dependability of the system. This is a much broader field than robustness; it covers the other attributes of dependability, such as availability and maintainability. The common method is to create a workload which resembles the normal operation of the system under benchmark, then define a fault load which contains typical faults (hardware, software, operator, etc.) and the exact time period when they should be injected. The specification also includes which dependability measures should be collected. DBench: the goal of the EU project DBench was to produce guidelines for developing dependability benchmarks. Along with the general guidelines and background research, they also developed concrete benchmarks: OLTP benchmarks, with 4 configurations of Oracle DBMS compared in the paper "Benchmarking the Dependability of Different OLTP Systems" (DOI: 10.1109/DSN.2003.1209940); webserver benchmarks, comparing the Abyss and Apache webservers with a fault load simulating typical programmer errors ("Dependability Benchmarking of Web-Servers", DOI: 10.1007/978-3-540-30138-7_25); and OS benchmarks, which benchmarked 6 Windows and 4 Linux versions using the PostMark file benchmark as a load ("Benchmarking the dependability of Windows and Linux using PostMark workloads", DOI: 10.1109/ISSRE.2005.13). IBM Autonomic Computing Benchmark: similar to DBench-OLTP, but uses SPECjAppServer2004 as a workload and focuses on the resiliency of the system to various disturbances.
Cause-effect Graphing A cause-effect graph is a black box testing technique that graphically illustrates the relationship between a given outcome and all the factors that influence the outcome. It is also known as an Ishikawa diagram, after its inventor Kaoru Ishikawa, or a fishbone diagram because of the way it looks.
Cause Effect - Flow Diagram
Circumstances under which a Cause-Effect Diagram is used:
- To identify the possible root causes, the reasons for a specific effect, problem, or outcome.
- To relate the interactions of the system among the factors affecting a particular process or effect.
- To analyze existing problems so that corrective action can be taken at the earliest opportunity.
Benefits
- Helps us determine the root causes of a problem or quality characteristic using a structured approach.
- Uses an orderly, easy-to-read format to diagram cause-and-effect relationships.
- Indicates possible causes of variation in a process.
- Identifies areas where data should be collected for further study.
- Encourages team participation and utilizes the team's knowledge of the process.
- Increases knowledge of the process by helping everyone learn more about the factors at work and how they relate.
Steps for drawing a Cause-Effect Diagram
Step 1: Identify and define the effect.
Step 2: Fill in the effect box and draw the spine.
Step 3: Identify the main causes contributing to the effect being studied.
Step 4: For each major branch, identify other specific factors which may be causes of the effect.
Step 5: Categorize relative causes and provide detailed levels of causes.
Cause-effect graph example "Print message" is a piece of software that reads two characters and, depending on their values, prints messages. The first character must be an "A" or a "B". The second character must be a digit. If the first character is an "A" or "B" and the second character is a digit, the file must be updated. If the first character is incorrect (not an "A" or "B"), message X must be printed. If the second character is incorrect (not a digit), message Y must be printed.
Contd…
Causes:
1 - first character is "A"
2 - first character is "B"
3 - second character is a digit
Effects:
70 - the file is updated
71 - message X is printed
72 - message Y is printed
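The causes and effects above can be written as straight-line code and checked directly. This is a sketch; the function name and the return convention (a list of effect numbers) are assumptions:

```python
# The "Print message" example: causes 1-3 map to effects 70-72
# exactly as listed on the slide.
def print_message(first, second):
    effects = []
    cause1 = first == "A"
    cause2 = first == "B"
    cause3 = second.isdigit()
    if (cause1 or cause2) and cause3:
        effects.append(70)   # effect 70: the file is updated
    if not (cause1 or cause2):
        effects.append(71)   # effect 71: message X is printed
    if not cause3:
        effects.append(72)   # effect 72: message Y is printed
    return effects

assert print_message("A", "5") == [70]       # both causes satisfied
assert print_message("C", "5") == [71]       # bad first character
assert print_message("A", "x") == [72]       # bad second character
assert print_message("C", "x") == [71, 72]   # both bad
```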
Syntax Testing in Software Syntax testing is a widely used software testing technique. It is performed as part of white box testing, using tools or manually, depending on the nature of the project. Because syntax testing is a white box technique, it is usually done by developers, though it can also be done by skilled white box testers. Since developers perform this testing, they should be responsible for running a syntax check before releasing their code to the QA team.
Syntax Testing in Software Contd… In this testing, we test the syntax of the programming language. As the syntax of every programming language is different, the criteria for doing syntax testing also differ between languages.
Syntax Testing Example – PHP For example, when doing syntax testing of PHP code we check whether the syntax is proper by checking the starting and ending tags. The starting tag of PHP is <?php and the ending tag is ?>, so we check whether the starting tag (<?php) and the ending tag (?>) are correct. This syntax testing of PHP is done by developers and testers. In the same way we can test the syntax of ASP, which is shown next.
Syntax Testing Example – ASP The previous example showed syntax testing of PHP; now we move to ASP. The starting tag of classic ASP is <% and the ending tag is %>, so here we test whether the starting and ending tag syntax of ASP is correct. Note that the syntax of each language is different, so testing varies according to the programming language used. In general, in this testing we check whether delimiters such as ?, ., >, and < are all in the appropriate places. Syntax testing is usually done by the development team.
Finite State Testing Software is a finite state machine. With static memory allocation, or with limited dynamic allocation, nothing is infinite. Even if you add disk or network storage, we do not have infinite electrons, much less infinite memory. So software systems are, in reality, finite state machines.
Black box (FSM) testing Theoretical issues aside, why do we care about testing finite state machines?
Abstraction: designs can often be best understood as finite-state machines:
1. String processing/searching
2. Protocols: communication, cache coherence, etc.
3. The control component of any discrete system
Automatic abstraction:
1. Tools that take systems and produce (coarse) finite-state abstractions
Very Simple FSM Model An FSM is a tuple ⟨S, Σ, T, I⟩ where:
- S is a set of states
- Σ is the input alphabet
- T is the transition relation, T ⊆ S × Σ × S
- I ∈ S is the initial state
Further assume the machine is deterministic: T is a (partial) function S × Σ → S. Given an input from Σ, the machine either outputs 0 (if there is no transition) or outputs 1 and takes the transition to s'.
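The model above can be sketched as a small simulator. The class name and the example transition table are illustrative assumptions; the output convention (0 when no transition exists, 1 plus a state change otherwise) follows the slide:

```python
# Minimal deterministic FSM per the <S, Sigma, T, I> model: T is a
# partial function from (state, symbol) to the next state.
class FSM:
    def __init__(self, transitions, initial):
        self.transitions = transitions   # dict: (state, symbol) -> state
        self.state = initial

    def step(self, symbol):
        nxt = self.transitions.get((self.state, symbol))
        if nxt is None:
            return 0          # no transition defined: output 0, stay put
        self.state = nxt      # take the transition, output 1
        return 1

# A tiny illustrative machine (states and alphabet are assumptions)
m = FSM({("s0", "a"): "s1", ("s1", "b"): "s0"}, "s0")
assert m.step("a") == 1 and m.state == "s1"
assert m.step("a") == 0 and m.state == "s1"   # no (s1, "a") transition
assert m.step("b") == 1 and m.state == "s0"
```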
How do we test finite state machines? Let's say we have:
- A known FSM A: we know all of its states and transitions.
- An unknown FSM B (same alphabet): we can only perform experiments on it.
How do we tell if A = B? This is known as the conformance testing or equivalence testing problem.
White Box Testing
White box testing techniques White-box testing (also known as clear box testing, glass box testing, transparent box testing, and structural testing) is a method of testing software that tests internal structures or workings of an application, as opposed to its functionality (i.e. black-box testing). In white-box testing an internal perspective of the system, as well as programming skills, are used to design test cases. The tester chooses inputs to exercise paths through the code and determine the appropriate outputs. This is analogous to testing nodes in a circuit, e.g. in-circuit testing (ICT). While white-box testing can be applied at the unit, integration and system levels of the software testing process, it is usually done at the unit level. It can test paths within a unit, paths between units during integration, and between subsystems during a system–level test. Though this method of test design can uncover many errors or problems, it might not detect unimplemented parts of the specification or missing requirements.
White-box test design techniques include: Statement Coverage Branch Coverage Path Coverage Control flow Graphing Data flow testing Decision coverage/ Condition Coverage Mutation testing Automated code coverage analysis
Statement coverage The statement coverage strategy aims to design test cases so that every statement in a program is executed at least once. The principal idea governing the statement coverage strategy is that unless a statement is executed, it is very hard to determine whether an error exists in that statement; it is very difficult to observe whether it causes a failure due to an illegal memory access, a wrong result computation, etc. However, executing a statement once and observing that it behaves properly for that input value is no guarantee that it will behave correctly for all input values. In the following, the design of test cases using the statement coverage strategy is shown.
Example: Consider Euclid's GCD computation algorithm, with statements numbered:

int compute_gcd(int x, int y)
{
1   while (x != y) {
2       if (x > y)
3           x = x - y;
4       else y = y - x;
5   }
6   return x;
}

By choosing the test set {(x=3, y=3), (x=4, y=3), (x=3, y=4)}, we can exercise the program such that all statements are executed at least once.
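The same algorithm and test set, transcribed into runnable Python (a direct transliteration of the C fragment above):

```python
# Euclid's GCD by repeated subtraction, as on the slide.
def compute_gcd(x, y):
    while x != y:       # statement 1
        if x > y:       # statement 2
            x = x - y   # statement 3
        else:
            y = y - x   # statement 4
    return x            # statement 6

# The slide's statement-coverage test set: together these three
# executions run every statement at least once.
assert compute_gcd(3, 3) == 3   # loop body never entered
assert compute_gcd(4, 3) == 1   # exercises the x > y branch
assert compute_gcd(3, 4) == 1   # exercises the else branch
```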
Branch Coverage In the branch coverage-based testing strategy, test cases are designed to make each branch condition to assume true and false values in turn. Branch testing is also known as edge testing as in this testing scheme, each edge of a program’s control flow graph is traversed at least once. It is obvious that branch testing guarantees statement coverage and thus is a stronger testing strategy compared to the statement coverage-based testing. For Euclid’s GCD computation algorithm , the test cases for branch coverage can be {(x=3, y=3), (x=3, y=2), (x=4, y=3), (x=3, y=4)}.
Path Coverage The path coverage-based testing strategy requires us to design test cases such that all linearly independent paths in the program are executed at least once. A linearly independent path can be defined in terms of the control flow graph (CFG) of a program.
Control Flow Graph (CFG) A control flow graph describes how control flows through the program. In order to draw the control flow graph of a program, all the statements of the program must first be numbered. The numbered statements serve as nodes of the control flow graph (as shown in fig. 10.3). An edge from one node to another exists if the execution of the statement representing the first node can result in the transfer of control to the other node. The CFG for any program can be easily drawn by knowing how to represent the sequence, selection, and iteration types of statements in the CFG; after all, a program is made up of these types of statements. Fig. 10.3 summarizes how the CFG for these three types of statements can be drawn. It is important to note that for iteration constructs such as the while construct, the loop condition is tested only at the beginning of the loop, and therefore the control flow from the last statement of the loop is always to the top of the loop. Using these basic ideas, the CFG of Euclid's GCD computation algorithm can be drawn as shown in fig. 10.4.
CFG for (a) sequence, (b) selection, and (c) iteration type of constructs
Control flow diagram
Path A path through a program is a node and edge sequence from the starting node to a terminal node of the control flow graph of a program. There can be more than one terminal node in a program. Writing test cases to cover all the paths of a typical program is impractical. For this reason, the path-coverage testing does not require coverage of all paths but only coverage of linearly independent paths.
Linearly independent path A linearly independent path is any path through the program that introduces at least one new edge not included in any other linearly independent path. If a path has one new node compared to all other linearly independent paths, then the path is also linearly independent, because any path with a new node automatically has a new edge. Thus, a path that is a subpath of another path is not considered a linearly independent path. To understand the path coverage-based testing strategy, it is necessary to understand the control flow graph (CFG) of a program.
Cyclomatic complexity For more complicated programs it is not easy to determine the number of independent paths of the program. McCabe's cyclomatic complexity defines an upper bound for the number of linearly independent paths through a program, and it is very simple to compute. Thus, McCabe's cyclomatic complexity metric provides a practical way of determining the maximum number of linearly independent paths in a program. Though McCabe's metric does not directly identify the linearly independent paths, it tells us approximately how many paths to look for. There are three different ways to compute the cyclomatic complexity; the answers computed by the three methods are guaranteed to agree.
Method 1 Given a control flow graph G of a program, the cyclomatic complexity V(G) can be computed as: V(G) = E – N + 2 where N is the number of nodes of the control flow graph and E is the number of edges in the control flow graph. For the CFG of example shown in fig. 10.4, E=7 and N=6. Therefore, the cyclomatic complexity = 7-6+2 = 3.
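Method 1 can be coded directly. In this sketch the edge list is an assumed reconstruction of the GCD example's CFG (chosen so that E = 7 and N = 6, matching the figures quoted above; the actual fig. 10.4 is not reproduced here):

```python
# McCabe's Method 1: V(G) = E - N + 2 for a connected control flow
# graph, computed from an explicit edge list.
def cyclomatic_complexity(edges):
    nodes = {n for edge in edges for n in edge}
    return len(edges) - len(nodes) + 2

# An assumed CFG with the slide's counts: 7 edges over 6 nodes.
gcd_cfg = [
    (1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 1), (1, 6),
]
assert cyclomatic_complexity(gcd_cfg) == 3   # 7 - 6 + 2 = 3
```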
Method 2 An alternative way of computing the cyclomatic complexity of a program from an inspection of its control flow graph is: V(G) = total number of bounded areas + 1. In the program's control flow graph G, any region enclosed by nodes and edges can be called a bounded area. This is an easy way to determine McCabe's cyclomatic complexity. But what if the graph G is not planar, i.e. however you draw the graph, two or more edges intersect? It can be shown that structured programs always yield planar graphs, but the presence of GOTOs can easily add intersecting edges. Therefore, for non-structured programs this way of computing McCabe's cyclomatic complexity cannot be used. The number of bounded areas increases with the number of decision paths and loops; therefore, McCabe's metric provides a quantitative measure of testing difficulty and, ultimately, reliability. For the CFG example shown in fig. 10.4, a visual examination shows 2 bounded areas; the cyclomatic complexity computed with this method is therefore also 2 + 1 = 3. This method provides a very easy way of computing the cyclomatic complexity of a CFG from a visual examination. On the other hand, the first method is more amenable to automation: it can easily be coded into a program which determines the cyclomatic complexities of arbitrary CFGs.
Method 3 The cyclomatic complexity of a program can also be easily computed by counting the number of decision statements in the program. If N is the number of decision statements of a program, then McCabe's metric is equal to N+1.
Condition coverage In this structural testing, test cases are designed to make each component of a composite conditional expression assume both true and false values in turn. For example, in the conditional expression ((c1 and c2) or c3), the components c1, c2 and c3 are each made to assume both true and false values. Branch testing is probably the simplest condition testing strategy, where only the compound conditions appearing in the different branch statements are made to assume true and false values. Thus, condition testing is a stronger testing strategy than branch testing, and branch testing is a stronger testing strategy than statement coverage-based testing. For a composite conditional expression of n components, condition coverage requires 2ⁿ test cases; the number of test cases increases exponentially with the number of component conditions. Therefore, a condition coverage-based testing technique is practical only if n (the number of conditions) is small.
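The 2ⁿ growth can be demonstrated by enumerating all truth-value combinations. A minimal sketch for the ((c1 and c2) or c3) example (the helper name is an assumption):

```python
from itertools import product

def condition_coverage_cases(n):
    """All 2**n truth-value combinations for n atomic conditions."""
    return list(product([True, False], repeat=n))

# For ((c1 and c2) or c3): n = 3 components -> 2**3 = 8 test cases
cases = condition_coverage_cases(3)
assert len(cases) == 8

# Evaluating the composite condition over all combinations shows both
# outcomes are reached, so full condition coverage subsumes branch coverage
# for this expression.
outcomes = [(c1 and c2) or c3 for (c1, c2, c3) in cases]
assert True in outcomes and False in outcomes
```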
Data flow-based testing Data flow-based testing selects test paths of a program according to the locations of the definitions and uses of different variables in the program. For a statement numbered S, let
DEF(S) = {X | statement S contains a definition of X}
USES(S) = {X | statement S contains a use of X}
For the statement S: a = b + c; we have DEF(S) = {a} and USES(S) = {b, c}. The definition of variable X at statement S is said to be live at statement S1 if there exists a path from statement S to statement S1 which does not contain any definition of X. The definition-use chain (or DU chain) of a variable X is of the form [X, S, S1], where S and S1 are statement numbers such that X ∈ DEF(S), X ∈ USES(S1), and the definition of X in statement S is live at statement S1. One simple data flow testing strategy is to require that every DU chain be covered at least once. Data flow testing strategies are useful for selecting test paths of a program containing nested if and loop statements.
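DEF and USES can be computed mechanically for simple assignments. The helper below is a hypothetical illustration that only handles statements of the form `target = expr`, not a general data-flow analyzer:

```python
import re

def def_uses(stmt):
    """DEF and USES sets for a simple assignment 'target = expr'."""
    target, expr = stmt.split("=", 1)
    defs = {target.strip()}
    # every identifier appearing on the right-hand side is a use
    uses = set(re.findall(r"[a-zA-Z_]\w*", expr))
    return defs, uses

# For S: a = b + c  ->  DEF(S) = {a}, USES(S) = {b, c}
d, u = def_uses("a = b + c")
assert d == {"a"}
assert u == {"b", "c"}
```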
Mutation testing In mutation testing, the software is first tested using an initial test suite built up from the different white box testing strategies. After the initial testing is complete, mutation testing is taken up. The idea behind mutation testing is to make a few arbitrary changes to a program at a time. Each time the program is changed, it is called a mutated program, and the change effected is called a mutant. A mutated program is tested against the full test suite of the program. If there exists at least one test case in the test suite for which a mutant gives an incorrect result, then the mutant is said to be dead. If a mutant remains alive even after all the test cases have been exhausted, the test data is enhanced to kill the mutant. The process of generating and killing mutants can be automated by predefining a set of primitive changes that can be applied to the program. These primitive changes can be alterations such as changing an arithmetic operator, changing the value of a constant, changing a data type, etc. A major disadvantage of the mutation-based testing approach is that it is computationally very expensive, since a large number of possible mutants can be generated. Since mutation testing generates a large number of mutants and requires us to check each mutant against the full test suite, it is not suitable for manual testing. Mutation testing should be used in conjunction with a testing tool that can run all the test cases automatically.
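A toy sketch of the kill/alive idea described above; the function, the mutant, and the test suite are all illustrative assumptions:

```python
# Mutation testing in miniature: mutate '+' to '-' (a primitive
# operator change) and check whether the test suite kills the mutant.
def add(a, b):
    return a + b

def mutant(a, b):        # mutant: the '+' operator changed to '-'
    return a - b

# Test suite: (arguments, expected result) pairs
test_suite = [((2, 3), 5), ((0, 0), 0)]

def killed(fn):
    """A mutant is dead if at least one test case gives a wrong result."""
    return any(fn(*args) != expected for args, expected in test_suite)

assert not killed(add)   # the original passes every test
assert killed(mutant)    # (2, 3) -> -1 != 5, so the mutant is dead
```

Note that the (0, 0) case alone would leave this mutant alive (0 + 0 == 0 - 0), which is exactly the situation where the test data must be enhanced.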
Automated code coverage analysis Code coverage measures how much of your code is exercised by tests. If you have 90% code coverage, it means 10% of the code is not covered under test. You might think 90% is good, but look at it from a different angle: what is stopping you from getting 100% code coverage? A good example is this:

if (customer.IsOldCustomer())
{
}
else
{
}

In the above code there are two paths/branches. If you are always hitting the "yes" branch, then you are not covering the else part, and this will be shown in the code coverage results. This is good, because now you know what is not covered and can write a test to cover the else part. Without code coverage you are just sitting on a time bomb. NCover is a good tool to measure code coverage.
Automated code coverage analysis Contd… Code coverage has been explained above, so this slide covers tooling. Some tools to determine code coverage are:
- JTest: a proprietary tool built over JUnit (it generates unit tests as well).
- Cobertura: an open source code coverage tool that can easily be coupled with JUnit tests to generate reports.
- Emma: another coverage tool; this one has been used for a slightly different purpose than unit testing, namely generating coverage reports when a web application is accessed by end users. Coupled with web-testing tools (e.g. Canoo), it can give you very useful coverage reports which tell you how much code is exercised during typical end-user usage.
These tools can be used to:
- Review that developers have written good unit tests
- Ensure that all code is traversed during black-box testing