EMPIRICAL INVESTIGATION
Investigative Principles, Techniques
Adapted from SENG412
ESE —Software Metrics
ESE 12.2
Roadmap
>Software engineering investigation
>Investigation principles
>Investigation techniques
>Formal experiments: Planning
>Formal experiments: Principles
>Formal experiments: Types
>Formal experiments: Selection
>Guidelines for empirical research
Empirical Investigation
>Fill the gap between research and practice
by:
—Developing methods for studying SE practice
—Building a body of knowledge of SE practice
—Validating research before deployment in
industrial settings
Definition: Empirical -based on, concerned with, or verifiable by
observation or experience rather than theory or pure logic.
SE Investigation
>What is software engineering investigation?
—Applying “scientific” principles and techniques to
investigate properties of software and software-related tools and techniques.
SE Investigation: Examples
>Experiment to confirm rules-of-thumb
—Should the LOC in a module be less than 200?
—Should the number of branches in any functional decomposition be less
than 7?
>Experiment to explore relationships
—How does the project team experience with the application affect the
quality of the code?
—How does the requirements quality affect the productivity of the
designer?
—How does the design structure affect maintainability of the code?
>Experiment to initiate novel practices
—Would it be better to start OO design with UML?
—Would the use of SRE improve software quality?
SE Investigation: Why?
>To improve (process and/or product)
>To evaluate (process and/or product)
>To prove a theory or hypothesis
>To disprove a theory or hypothesis
>To understand (a scenario, a situation)
>To compare (entities, properties, etc.)
SE Investigation: What?
>Person’s performance
>Tool’s performance
>Person’s perceptions
>Tool’s usability
>Document’s understandability
>Program’s complexity etc.
SE Investigation: Where & When?
>in the field
>in the lab
>in the classroom
>Anytime, depending on what questions you are asking
SE Investigation: How?
>Hypothesis/question generation
>Data collection
>Data evaluation
>Data interpretation
>Feed results back into the iterative process
SE Investigation: Choice of Data Sources
>Data sources come from industrial settings
—This may include people, program code, etc.
>Usually
—Surveys
—Case studies: hypothesis generation
—Experiments: hypothesis testing
Where Do Data Come From?
>First Degree Contact
—Direct access to participants
>Example:
—Brainstorming
—Interviews
—Questionnaires
—System illustration
—Work diaries
—Think-aloud protocols
—Participant observation
Where Do Data Come From?
>Second Degree Contact
—Access to work environment during work time, but not
necessarily participants
>Example:
—Instrumenting systems
—Real time monitoring
Where Do Data Come From?
>Third Degree Contact
—Access to work artifacts, such as source code, documentation
>Example:
—Problem report analysis
—Documentation analysis
—Analysis of tool logs
—Off-line monitoring
Practical Considerations
>Hidden Aspects of Performing Studies
—Negotiations with industrial partners
—Obtaining ethics approval and informed consent from
participants
—Adapting “ideal” research designs to fit with reality
—Dealing with the unexpected
—Staffing of project
Investigation Principles
>There are 4 main principles of investigation:
—Stating the hypothesis: What should be investigated?
—Selecting investigation technique: conducting surveys, case
studies, formal experiments
—Maintaining control over variables: dependent and independent
variables
—Making the investigation meaningful: verifying theories, evaluating the accuracy of models, validating measurement results.
SE Investigation Techniques
>Three ways to investigate:
—Formal experiment
—Case study
—Survey
SE Investigation Techniques
>Case Studies
—Compare one situation with another
—Identify key factors that may affect an
activity’s outcome and then document them
—It involves a sequence of steps:
–conception, hypothesis setting, design,
preparation, execution, analysis, dissemination,
and decision making
SE Investigation Techniques
>Case Study Types
—Sister projects: each is typical and has
similar values for the independent variables
—Baseline: compare single project to
organizational norm
—Random selection: partition single project
into parts
SE Investigation Techniques
>Formal Experiment
—A controlled investigation of an activity, by identifying,
manipulating and documenting key factors of that
activity.
—Controls variables
—Uses methods to reduce bias and eliminate
confounding factors
—Often replicated
SE Investigation Techniques
>Surveys
—A retrospective study of a situation to try to document
relationships and outcomes
–Done after an event has occurred.
—Record data
–to determine how project participants reacted to a particular
method, tool, or technique
–to determine trends or relationships
—Also used to capture information related to products or
projects
—Can be used to document the size of components,
number of faults, effort expended
Survey: Application
>Poll a set of data from an event to determine the reaction of a population to a particular method, tool, or technique, e.g., the use of an OO language over procedural languages
—Size of code
—Effort involved
—Number of faults, failures
—Project duration
>Limitations
—No control over the situation – variables are not manipulated
Case-study or Experiment?
How do we decide whether to conduct an experiment or perform a case study?
Control is the key factor
Selecting an Evaluation technique
>Key Selection Factors
—Level of control over the variables
—Degree to which the task can be isolated from the rest of the
development process
—Degree to which we can replicate the basic situation
>Formal experiments: research in the small
>Case studies: research in the typical
>Surveys: research in the large
Examples /1
>Experiment: research in the small
—You have heard about Software Reliability
Engineering (SRE) and its advantages and may want
to investigate whether to use SRE in your company.
You may design a controlled (dummy) project and
apply the SRE technique to it.
—You may want to experiment with the various phases of its application (defining the operational profile, developing test cases, and deciding upon the adequacy of a test run) and document the results for further investigation.
Examples /2
>Case study: research in the typical
—You may have used software reliability
engineering (SRE) for the first time in a
project in your company.
—After the project is completed, you may
perform a case-study to capture the effort
involved (budget, personnel), the number of
failures investigated, and the project duration.
Examples /3
>Survey: research in the large
—After you have used SRE in many projects in your
company, you may conduct a survey to capture the
effort involved (budget, personnel), the number of
failures investigated, and the project duration for all
the projects.
—Then, you may compare these figures with those
from projects using conventional software test
techniques to see if SRE could lead to an overall improvement in practice.
What to Investigate –Hypothesis
>The first step is deciding what to investigate.
>The goal for the research can be expressed as a
hypothesis in quantifiable terms that is to be tested.
>The test result (the collected data) will confirm or refute
the hypothesis.
>Example:
—Can Software Reliability Engineering (SRE) help us to achieve
an overall improvement in software development practice in
our company?
Hypothesis (cont’d)
>Other Examples:
—Can integrated development and testing tools improve our
productivity?
—Does Cleanroom software development produce better-quality software than conventional development methods?
—Does code produced using Agile software development have a
lower number of defects per KLOC than code produced using
the conventional methods?
Control /1
>What variables may affect truth of a hypothesis?
>How do they affect it?
>Variable:
—Independent/state (values are set by the experiment or initial conditions)
—Dependent/response (values are affected by changes in other variables)
>Example: Effect of “programming language” on the
“quality” of resulting code.
—Programming language is an independent variable and quality is a dependent variable.
Control /2
>A common mistake: ignoring other variables that may
affect the values of a dependent variable.
>Example:
—Suppose you want to determine whether a change in
programming language (independent variable) can affect the
productivity (dependent variable) of your project. For instance,
you currently use FORTRAN and you want to investigate the
effects of changing to Ada. The values of all other variables
should stay the same (e.g., application experience,
programming environment, type of problem, etc.)
—Without this you cannot be sure that the difference in
productivity is attributable to the change in language.
>But the list of other variables may grow beyond control!
Control /3
>How to identify the dependent and independent
variables
>Example: given the independent (state) variables {A, B, C}, causal ordering determines the remaining variables in successive levels: {A, B, C} → {D} → {F} → {Z}, so D, F, and Z are dependent (response) variables (see the sketch below).
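As an illustration only, a small sketch of mechanising causal ordering; the dependency map used here (A, B determine D; C, D determine F; F determines Z) is an assumption chosen to reproduce the levels above.

from graphlib import TopologicalSorter  # Python 3.9+

# hypothetical dependency map: variable -> the variables it depends on
depends_on = {
    "A": set(), "B": set(), "C": set(),   # no parents: independent (state) variables
    "D": {"A", "B"},                      # assumed: D is determined by A and B
    "F": {"C", "D"},                      # assumed: F is determined by C and D
    "Z": {"F"},                           # assumed: Z is determined by F
}

ts = TopologicalSorter(depends_on)
ts.prepare()
levels = []
while ts.is_active():
    ready = sorted(ts.get_ready())        # all variables whose parents are already resolved
    levels.append(set(ready))
    ts.done(*ready)

print(levels)                             # [{'A', 'B', 'C'}, {'D'}, {'F'}, {'Z'}]
print("independent:", levels[0])
print("dependent:", set().union(*levels[1:]))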
Analysis is the key, so don’t gather anything until you know how you will use it.
Formal Experiments: Planning
>1. Conception
—Defining the goal of investigation
—State objectives
>2. Design
—Translating objectives into formal hypotheses
—Generating quantifiable (and manageable) hypotheses to be
tested
—Defining experimental objects or units
—Identifying experimental subjects
—Identifying the response variable(s)
Formal Experiments: Planning
>3. Preparation
—Getting ready to start, e.g., purchasing tools, hardware, training
personnel, etc.
>4. Execution:
—Follow the plan and apply treatments consistently for comparison of results
>5. Review and analysis
—Review the results for soundness and validity
>6. Dissemination & decision making
—Documenting conclusions
—Support decision for future development, maintenance of
software (tools, methods, etc)
—Others can use the results to suggest changes to the development environment
—Experiments support such decisions because variations arise from controlled changes
Formal Experiments: Principles
>1. Replication
—An experiment carried out under identical conditions should be repeatable.
–Provides an estimate of the experimental error and tells us how much confidence we can attach to a result
—It should not introduce confounded results
–i.e., being unable to separate the effects of two or more variables
–E.g., when comparing code quality, if we cannot tell whether a difference is due to the tool or to the programmer’s experience
>2. Randomization
—The random assignment of subjects to groups of treatments, so
that we can assume independence (and thus validity) of results
—Tries to keep treatment results from being biased
–E.g., results can be affected by time, place, and some unknown characteristic of the participants (a random-assignment sketch follows below)
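For illustration, a minimal sketch of randomly assigning subjects to treatment groups; the subject and treatment names are hypothetical.

import random

# hypothetical subjects and treatments (names are placeholders)
subjects = [f"dev{i:02d}" for i in range(1, 13)]
treatments = ["tool_A", "tool_B", "no_tool"]

random.seed(42)            # fixed seed so the assignment itself can be replicated
random.shuffle(subjects)   # random order removes selection bias

# deal subjects round-robin into equally sized treatment groups
assignment = {t: subjects[i::len(treatments)] for i, t in enumerate(treatments)}
for treatment, group in assignment.items():
    print(treatment, group)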
Formal Experiments: Principles
>3. Local control
—How much control you have over placement of
subjects in experimental units and the organizational
units
—Ensures a valid test of significance by reducing the magnitude of the experimental error
—Local control is usually discussed in terms of two characteristics of the design, i.e., blocking and balancing of units
Formal Experiments: Principles
—Blocking: allocating experimental units to blocks or
groups so the units within a block are relatively
homogeneous.
–The blocks are designed so that the experimental design
captures the anticipated variation in the blocks by grouping
like varieties, so that the variation does not contribute to the
experimental error.
—Balancing: is the blocking and assigning of
treatments so that an equal number of subjects is
assigned to each treatment.
–Balancing is desirable because it simplifies the statistical
analysis.
Example: Blocking & Balancing
>You are investigating the comparative effects of three design techniques on
the quality of the resulting code.
>The experiment involves teaching the techniques to 12 developers and
measuring the number of defects found per 1000 LOC to assess the code
quality.
>It may be the case that the twelve developers graduated from three
universities. It is possible that the universities trained the developers in very
different ways, so that being from a particular university can affect the way
in which the design technique is understood or used.
>To eliminate this possibility, three blocks can be defined so that the first
block contains all developers from university X, the second block from
university Y, and the third block from university Z. Then, the treatments are
assigned at random to the developers from each block. If the first block has
six developers, two are assigned to design method A, two to B, and two to
C.
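A minimal sketch of the blocked and balanced assignment described above, with hypothetical developer names: developers are blocked by university, and the three design methods are assigned at random, in equal numbers, within each block.

import random
from collections import defaultdict

random.seed(1)

# hypothetical developers and their universities (the blocking variable)
developers = {
    "dev01": "X", "dev02": "X", "dev03": "X", "dev04": "X", "dev05": "X", "dev06": "X",
    "dev07": "Y", "dev08": "Y", "dev09": "Y",
    "dev10": "Z", "dev11": "Z", "dev12": "Z",
}
methods = ["A", "B", "C"]            # the three design techniques (treatments)

# blocking: group developers by university
blocks = defaultdict(list)
for dev, university in developers.items():
    blocks[university].append(dev)

# balancing: within each block, assign the same number of developers to each method
assignment = {}
for university, devs in blocks.items():
    random.shuffle(devs)             # randomisation within the block
    for i, dev in enumerate(devs):
        assignment[dev] = methods[i % len(methods)]

for dev in sorted(assignment):
    print(dev, developers[dev], assignment[dev])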
Formal Experiments: Principles
>3. Local control
—Correlation:
—Determines the relationship
between two or more variables
—A scatter diagram or scatter
plot visually represents a
correlation
—Fitting response curves
—Linear and nonlinear
correlation.
—Nonlinear correlation is hard to measure and may stay hidden (see the sketch below).
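For illustration, a small sketch that computes a correlation coefficient and fits a linear response curve; the size and effort figures are made up.

from statistics import correlation, linear_regression  # Python 3.10+

# hypothetical data: module size (KLOC) vs. effort (person-days)
size   = [1.2, 2.5, 3.1, 4.0, 5.4, 6.8, 8.0]
effort = [5.0, 9.5, 12.0, 16.5, 21.0, 28.0, 31.5]

r = correlation(size, effort)            # Pearson's r: strength of the linear relationship
fit = linear_regression(size, effort)    # least-squares response curve: effort ~ slope*size + intercept

print(f"r = {r:.3f}")
print(f"effort ~ {fit.slope:.2f} * size + {fit.intercept:.2f}")
# A scatter plot of size vs. effort would visualise this; a strongly nonlinear
# relationship can still give a low r, which is why it "may stay hidden".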
Formal Experiments: Types
>Factorial design:
—Studies the effects of two or more factors
—Main effect: the change in response produced by a
change in the level of the factor
>Example
—Design a battery: plate material (3 levels) vs. temperature (3 levels), with n = 4 (see the sketch after this list)
—Two questions:
–What effects do material type and temperature have on the
life of the battery?
–Is there a choice of material that would give uniformly long
life regardless of temperature?
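Reading n = 4 as four replicates per material/temperature cell (an assumption, since the slide does not spell it out), a small sketch of laying out and randomising the 3 × 3 factorial design for the battery example; the level names and temperature values are also assumed.

import itertools
import random

materials = ["M1", "M2", "M3"]          # plate material: 3 levels (names assumed)
temperatures = [15, 70, 125]            # temperature: 3 levels (values assumed)
n = 4                                   # replicates per material/temperature cell

# full factorial: every material crossed with every temperature, n times each
runs = [(m, t, rep)
        for m, t in itertools.product(materials, temperatures)
        for rep in range(1, n + 1)]

random.seed(7)
random.shuffle(runs)                    # randomise run order to avoid time-related bias

print(len(runs), "runs")                # 3 * 3 * 4 = 36 runs
for material, temperature, rep in runs[:5]:
    print(material, temperature, rep)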
Formal Experiments: factorial designs
>Proper nested or crossed design may reduce the number of
cases to be tested.
Crossing: each level of each factor appears with each level of the other factor
Nesting: each level of one factor occurs entirely in conjunction with one level of another
Example – Nested design
>Effects of language
and experience on
productivity
>Factor A would be
the language
>Factor B would be the years of experience with a particular language
>Only 4 treatments
>Compared with 8 treatments for a crossed design
>WHY?
With a crossed design we would have 4 cells for each programming language (8 cells in total)
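A small sketch contrasting the crossed and nested layouts for this example; the two languages and the two experience bands per language are assumptions, since the slide does not list the levels.

import itertools

languages = ["Ada", "FORTRAN"]                       # factor A (assumed levels)
experience = {                                       # factor B, defined per language (assumed)
    "Ada": ["<5 yrs Ada", ">=5 yrs Ada"],
    "FORTRAN": ["<5 yrs FORTRAN", ">=5 yrs FORTRAN"],
}

# crossed: every language paired with every experience category, including the
# combinations that make no practical sense (e.g., Ada projects with FORTRAN experience)
all_experience = [e for levels in experience.values() for e in levels]
crossed = list(itertools.product(languages, all_experience))

# nested: each experience level occurs only within its own language
nested = [(lang, e) for lang in languages for e in experience[lang]]

print(len(crossed), "crossed treatments")   # 2 * 4 = 8
print(len(nested), "nested treatments")     # 2 * 2 = 4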
Formal Experiments: Types
>Advantages of factorial design
—Resources can be used more efficiently
—Coverage (completeness) of the target variables’ range of
variation
—Implicit replication
>Disadvantages of factorial design
—Higher costs of preparation, administration and analysis
—Number of combinations will grow rapidly
—Some of the combinations may be worthless
Formal Experiments: Selection
>Selecting the number of factors
or variables:
—Single variable
—Multiple variables
>Example: Measuring time to
code a program module with or
without using a reusable
repository
—Without considering the effects of
experience of programmers
—With considering the effects of
experience of programmers
Formal Experiments: Baselines
>Baseline is an “average” treatment of a variable in a
number of experiments (or case studies).
>It provides a measure to identify whether the value is
within an acceptable range.
>It may help check the validity of the measurement (see the sketch below).
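For illustration, a minimal sketch of using a baseline; the past defect densities are made up, and the mean ± 2 standard deviations rule is an assumed acceptance range, not one prescribed here.

from statistics import mean, stdev

# hypothetical defect densities (defects per KLOC) from past projects
past_projects = [4.1, 3.8, 5.0, 4.6, 4.3, 5.2, 3.9, 4.7]

baseline = mean(past_projects)
spread = stdev(past_projects)
low, high = baseline - 2 * spread, baseline + 2 * spread

new_value = 6.9                        # value measured on the new project (made up)
print(f"baseline = {baseline:.2f}, acceptable range = [{low:.2f}, {high:.2f}]")
print("within range" if low <= new_value <= high else "outside range - investigate")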
Planning Case Studies
>Similar to experiments, with a few differences
—compares one situation with another
—Sister projects: similar values for state variables to
be measured
—Baselines: collect data from various projects,
regardless of difference, calculate measures of
central tendency, dispersion
—Random selection: make use of single project
partitioned into parts
>DATA COLLECTION
Data Collection
>Good decisions cannot be made with bad data!
>What is good data?
—Correctness: data collected according to exact rules of metric
definition
–LOC -everything but comments
–Process duration -from beginning of specified activity to end of
another
—Accuracy: difference between the data and the actual value, e.g., time measured using different instruments
—Precision: number of decimal places needed to express the data, e.g., use of hours or days as the unit for the duration of a project activity
—Consistency: little difference in value when measured using a different device, by a different person, or repeatedly
Data Collection
>What is good data?
—Time Stamped: Association with particular activity/time period
to know when it was collected
–Track trends, compare activities
—Replication: replicate for different circumstances.
Types of Data
>Raw data -the initial values collected for software measures
>Refined data -data that have been processed and analyzed
Example
>Measure: Developer effort
—Weekly time sheets for each staff member
—For total effort on design -sum up relevant
time sheets
—Indirect measures:
–Average effort per developer
–Effort per design component, etc
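As a sketch only, deriving the direct and indirect effort measures above from weekly time sheets; the records and column layout are assumptions.

from collections import defaultdict

# hypothetical weekly time-sheet records: (developer, activity, component, hours)
time_sheets = [
    ("alice", "design", "UI", 12), ("alice", "design", "DB", 6), ("alice", "coding", "UI", 10),
    ("bob",   "design", "DB", 9),  ("bob",   "coding", "DB", 14),
    ("carol", "design", "UI", 7),  ("carol", "testing", "UI", 11),
]

# direct measure: total design effort = sum of the relevant time sheets
design_effort = sum(h for _, activity, _, h in time_sheets if activity == "design")

# indirect measures: average effort per developer, design effort per component
per_developer = defaultdict(int)
per_component = defaultdict(int)
for dev, activity, component, hours in time_sheets:
    per_developer[dev] += hours
    if activity == "design":
        per_component[component] += hours

print("total design effort:", design_effort, "hours")
print("average effort per developer:", sum(per_developer.values()) / len(per_developer), "hours")
print("design effort per component:", dict(per_component))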
Problem with problems
>No software developer consistently produces perfect software the first time
>It is important for developers to measure aspects of
software quality
>Useful for determining
—How many problems have been found with a product
—How effective the prevention, detection and removal processes are
—Whether a product is ready for release
—How the current version of the product compares to other
releases
Defining Data
>Metric terminology must be clear and detailed,
understood by all involved
—Fault: a human error results in a mistake in a software product
—Failure: departure of system from required behaviour
—Anomaly: deviation from normal
—Bugs: faults in code
—Crash: type of failure, system ceases to function
—etc
Categorizations
>Data is collected for
—Faults
—Failures
—Changes
Failures
>Focus on external problems of the system:
installation, chain of events, effect on user or
other system, cost.
>Location -code that uniquely identifies the
installation and platform on which the failure
was observed.
>Timing -real time of occurrence, execution time
up to the occurrence of failure.
Failures
>Cause
—type of trigger, type of source
—Often cross-referenced to fault and change reports.
>Severity
—how serious the failure’s end result
—Classification for safety-critical system:
–Catastrophic
–Critical
–Significant
–Minor
Failures
>Can also be measured in terms of
—Cost -how much effort and other resources it takes to diagnose and respond to the failure
—Count -number of failures in a stated time interval
—Cause and cost can only be completed after diagnosis
Failure Report
>Location: such as installation where failure was observed
>Timing: CPU time, clock time or some temporal measure
>Symptom: type of error message or indication of failure
>End result: description of failure, such as “operating
system crash”, “ services degraded”, “loss of data”,
“wrong output”, “no output”
>Mechanism: chain of events, including keyboard
commands and state data, leading to failure
>Cause: reference to possible fault(s) leading to failure
>Severity: reference to a well-defined scale, such as
“critical”, “major”, “minor”
>Cost: cost to fix plus cost of lost potential business
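As a sketch only, the failure-report fields above collected into a simple record type so that every report is complete and uniformly structured; the field names, types and sample values are assumptions, not a prescribed format.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class FailureReport:
    """One failure report; field meanings follow the list above."""
    location: str        # e.g., installation where the failure was observed
    timing: datetime     # clock time (CPU/execution time could be tracked separately)
    symptom: str         # error message or other indication of failure
    end_result: str      # e.g., "operating system crash", "wrong output"
    mechanism: str       # chain of events leading to the failure
    cause: str           # reference to the possible fault(s)
    severity: str        # e.g., "critical", "major", "minor"
    cost: float          # cost to fix plus cost of lost potential business

report = FailureReport(
    location="site-042/linux-x86_64", timing=datetime(2024, 3, 1, 14, 5),
    symptom="segmentation fault", end_result="services degraded",
    mechanism="bulk import followed by report generation", cause="FAULT-1187",
    severity="major", cost=3500.0,
)
print(report.severity, report.end_result)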
Faults
>Focus on the internals of system
>Only the developer can see it
>Location
—which product (identifier and version) or part of the
product contains the fault.
—Spec, code, database, manuals, plan and
procedures, report, standard/policy
—Requirements, functional, preliminary design,
detailed design, product design, interface, database,
implementation.
Faults
>Timing -when the fault is created, detected,
corrected.
>Symptom -what is observed during diagnosis
>End result -actual failure caused by fault,
should be cross-referenced to failure report.
>Cause
—human error that led to the fault
—Communication, conceptual, clerical
Faults
>Severity -impact on the user
>Cost -total cost to system provider
>Count -number of faults found in a product or
subsystem or during a given period of operation.
Fault Report
>Location: module or document name
>Timing: phases of development during which fault was
created, detected, and corrected
>Symptom: type of error message reported or activity
which revealed fault
>End result: failure caused by the fault
>Mechanism: how the fault was created, detected, and corrected
>Cause: type of human error that led to fault
>Severity: refer to severity of resulting or potential failure
>Cost: time or effort to locate and correct
Changes
>Once a failure is experienced and its cause
determined, the problem is fixed through one or
more changes
>The problem is fixed through one or more changes to any or all of the development products
>A Change report is usually prepared
—to report the changes and track the most affected products
Changes
>Cause of change may be
—Corrective -correcting a fault
—Adaptive -the system or its environment changes in some way so the product has to be upgraded
—Preventive -finding faults before they become failures
—Perfective -redo something to clarify the system
structure
>Count -number of changes made in a given
time or given system component
Change Report
>Location: identifier of document or module changed
>Timing: when change was made
>Symptom: type of change
>End result: success of the change, as evidenced by regression or other testing
>Mechanism: how and by whom the change was performed
>Cause: corrective, adaptive, preventive or perfective
>Severity: impact on rest of the system
>Cost: time and effort for change implementation and test
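A small sketch of the Count measure for changes: tallying change reports by cause and by product; the change records are made up.

from collections import Counter

# hypothetical change reports: (module, cause)
changes = [
    ("billing", "corrective"), ("billing", "corrective"), ("ui", "perfective"),
    ("parser", "adaptive"), ("billing", "preventive"), ("ui", "corrective"),
]

by_cause = Counter(cause for _, cause in changes)
by_module = Counter(module for module, _ in changes)

print("changes by cause:", dict(by_cause))        # e.g., {'corrective': 3, ...}
print("most affected product:", by_module.most_common(1)[0])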
Assignment
>Describe:
—1 How to collect data
—2 When to collect data
—3 How to store and extract data
How to collect data
>Requires human observation and reporting
>Manual data collection is prone to bias, error, omission and delay -> use a uniform data collection form
>Automatic data capture is desirable, and essential for some measures, e.g., recording the execution time
To ensure data are accurate and complete,
planning is essential.
How to collect data
•Planning involves:
•Decide what to measure based on GQM
analysis.
•Determine the level of granularity (individual
modules, subsystem, function etc)
•Ensure that product is under configuration
control (which version)
•Form design
How to collect data
•Planning involves (continue)
•Establish procedures for
•handling the forms,
•analyzing the data and
•reporting the results,
•setting a central collection point
How to collect data
>Keep procedure simple
>Avoid unnecessary recording
>Train staff in recording data and procedure
>Provide results to original providers promptly
>Validate all data collected at a central collection
point
Data collection forms
•Encourages collecting good, useful data
•Self-explanatory
•Should allow fixed-format data and free-format
comments
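For illustration only, one way a form with fixed-format fields plus a free-format comment field could be laid out as a simple template; the field names are assumptions.

import csv
import io

# fixed-format fields plus one free-format comment column (field names assumed)
FIELDNAMES = ["project", "module", "activity", "hours", "defects_found", "comments"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerow({
    "project": "P-17", "module": "parser", "activity": "inspection",
    "hours": 3, "defects_found": 2,
    "comments": "two naming-convention issues; see inspection log",  # free-format
})
print(buffer.getvalue())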
When to collect data
•Data collection planning should start when project planning begins
•Actual data collection takes place during many
phases of development.
•Can be collected at the beginning of the project
to establish initial values and collected again
later to reflect activities and resources being
studied.
When to collect data
•Data-collection activities should become part of
the regular development process.
•Should be mapped to the process model
When to collect data
•Eg:
—Data relating to project personnel (qualifications or
experience) can be collected at the start of the
project.
—Other data collection, such as effort, begins at project start and continues through operation and maintenance
—Count of the number of specification and design
faults can be collected as inspections are performed
—Data about changes made to enhance the product
can be collected as enhancements are performed
How to store and extract data
•Raw data should be stored in a database set up using a DBMS.
•The DBMS can define the data structure and support inserting, modifying, deleting and extracting refined data.
•Formats, ranges, valid values, etc. can be checked automatically as they are input (see the sketch below).
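A minimal sketch using SQLite from the Python standard library; the table and column names are assumptions. It shows raw data stored under a DBMS, format/range checks applied automatically on input, and refined data extracted by a query.

import sqlite3

conn = sqlite3.connect(":memory:")          # throw-away database for the sketch
conn.execute("""
    CREATE TABLE fault_report (
        module    TEXT NOT NULL,
        phase     TEXT NOT NULL CHECK (phase IN ('design', 'code', 'test')),
        severity  TEXT NOT NULL CHECK (severity IN ('critical', 'major', 'minor')),
        effort_h  REAL NOT NULL CHECK (effort_h >= 0)   -- range check on input
    )
""")

conn.execute("INSERT INTO fault_report VALUES ('parser', 'code', 'major', 2.5)")

try:
    # an invalid value is rejected automatically by the CHECK constraint
    conn.execute("INSERT INTO fault_report VALUES ('parser', 'code', 'huge', 1.0)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# extracting refined data: fault count and effort per module
for row in conn.execute("SELECT module, COUNT(*), SUM(effort_h) FROM fault_report GROUP BY module"):
    print(row)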
Chapter Reading & Homework
>Chapter 4, 5
>Q & A
Empirical Research Guidelines
>Contents
1. Experimental context
2. Experimental design
3. Data collection
4. Analysis
5. Presentation of results
6. Interpretation of results
1. Experimental Context
>Goals:
—Ensure that the objectives of the experiment
have been properly defined
—Ensure that the description of the experiment
provides enough details for the practitioners
1. Experimental Context
>Be sure to specify as much of the context as possible. In
particular, clearly define the entities, attributes and
measures that are capturing the contextual information.
>If a specific hypothesis is being tested, state it clearly
prior to performing the study, and discuss the theory from
which it is derived, so that its implications are apparent.
>If the study is exploratory, state clearly, prior to data analysis, what questions the investigation is intended to address and how it will address them.
2. Experimental Design
>Goal:
—Ensure that the design is appropriate for the
objectives of the experiment
—Ensure that the objective of the experiment can be
reached using the techniques specified in the design
2. Experimental Design /1
>Identify the population from which the subjects and objects
are drawn.
>Define the process by which the subjects and objects were
selected (inclusion/exclusion criteria).
>Define the process by which subjects and objects are
assigned to treatments.
>Restrict yourself to simple study designs or, at least, to
designs that are fully analyzed in the literature.
>Define the experimental unit.
2. Experimental Design /2
>For formal experiments, perform a pre-experiment or pre-calculation to identify or estimate the minimum required sample size (see the sketch after this list).
>Use appropriate levels of blinding.
>Avoid the use of controls unless you are sure the control situation can be unambiguously defined.
>Fully define all treatments (interventions).
>Justify the choice of outcome measures in terms of their
relevance to objectives of the empirical study.
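Illustrating the sample-size guideline above: a minimal sketch of a pre-calculation using statsmodels' power module; the effect size, significance level and power are assumed planning values for a two-group comparison.

from statsmodels.stats.power import TTestIndPower

# assumed planning values: medium effect size, 5% significance, 80% power
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative="two-sided")
print(f"minimum subjects per group: {n_per_group:.1f}")   # roughly 64 per group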
3. Data Collection Goal
>Ensure that the data collection process is well
defined
>Monitor the data collection and watch for
deviations from the experiment design
3. Data Collection
>Define all software measures fully, including the entity,
attribute, unit and counting rules.
>Describe any quality control method used to ensure
completeness and accuracy of data collection.
>For observational studies and experiments, record data
about subjects who drop out from the studies.
>For observational studies and experiments, record data
about other performance measures that may be
adversely affected by the treatment, even if they are not
the main focus of the study.
4. Analysis Goal
>Ensure that the collected data from the experiment is
analyzed correctly
>Monitor the data analysis and watch for deviations from
the experiment design
4. Analysis
>Specify any procedures used to control for multiple
testing.
>Consider using blind analysis (avoid “fishing for results”).
>Perform sensitivity analysis.
>Ensure that the data do not violate the assumptions of the tests used on them (see the sketch after this list).
>Apply appropriate quality control procedures to verify the
results.
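Illustrating the assumptions guideline above: a minimal sketch that checks normality before an independent-samples t-test; the samples are made up, and the Shapiro-Wilk/t-test combination is just one common example of assumption checking.

from scipy import stats

# made-up defect densities for two treatment groups
group_a = [4.1, 3.8, 5.0, 4.6, 4.3, 5.2, 3.9, 4.7]
group_b = [5.4, 6.1, 5.8, 6.5, 5.9, 6.3, 5.6, 6.0]

# the independent-samples t-test assumes roughly normal data in each group,
# so check that assumption first (Shapiro-Wilk test)
for name, sample in [("A", group_a), ("B", group_b)]:
    stat, p = stats.shapiro(sample)
    print(f"group {name}: Shapiro-Wilk p = {p:.3f}")   # very small p suggests non-normality

# if the assumption holds, run the t-test; otherwise switch to a non-parametric
# alternative such as the Mann-Whitney U test (stats.mannwhitneyu)
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.4f}")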
5. Presentation of Results Goal
>Ensure that the reader of the results can understand the objective, the process and the results of the experiment
5. Presentation of Results
>Describe or cite a reference for all procedures used.
Report or cite the statistical package used.
>Present quantitative results as well as significance levels.
Quantitative results should show the magnitude of effects
and the confidence limits.
>Present the raw data whenever possible. Otherwise,
confirm that they are available for review by the reviewers
and independent auditors.
>Provide appropriate descriptive statistics.
>Make appropriate use of graphics.
6. Interpretation of Results Goal
>Ensure that the conclusions are derived solely from the results of the experiment
6. Interpretation of Results
>Define the population to which inferential statistics and
predictive models apply.
>Differentiate between statistical significance and
practical importance.
>Specify any limitations of the study.