Reliability Audit Lab
VEM RAL
DFR – Fundamentals for Engineers
DFR – Design for Reliability
Topics that will be covered:
1. Need for DFR
2. DFR Process
3. Terminology
4. Weibull Plotting
5. System Reliability
6. DFR Testing
7. Accelerated Testing
1. Need for DFR
What Customers Care About:
1. Product Life: useful life before wear-out.
2. Minimum Downtime: maximum MTBF.
3. Endurance: number of operations, robustness to environmental changes.
4. Stable Performance: no degradation in CTQs.
5. On-Time Startup: ease of system startup.
Reliable Product Vision
Failure Mode Identification (Pre-Launch): Identify & “eliminate” inherent failure modes before launch. (Minimize Excursions!)
Failure Rate: Start with lower “running rate”, then aggressively “grow” reliability. (Reduce Warranty Costs)
Resources/Costs: Reduce overall costs by employing DFR from the beginning.
Take control of our product quality and aggressively drive to our goals
[Figure: three panels comparing DFR with No DFR over time. Panel 1: # Failure Modes vs. Time. Panel 2: Failure Rate vs. Time, with Goal and Release marked. Panel 3: Resources/Costs vs. Time, with Release marked and 50% and 5% annotations.]
Develop Reliability metrics
Verification
• Execute Reliability Test strategy
• Continue Growth Testing
• Accelerated Tests
• Demonstration Testing
• Agency / Compliance Testing
Production / Field
• Establish audit program
• FRACAS system using ‘Clarify’
• Correlate field data & test results
System Model
• Construct functional block diagrams
• ID critical comps. & failure potential
• Define Reliability model
• Allocate reliability targets
Design
• Apply robust design tools
• DFSS tools
• Generate life predictions
• Begin Growth Testing
• Field data analysis
Legacy Product DFR Process . . .
1. Review Historical Data
• Review historical reliability & field failure data
• Review field RMAs
• Review customer environments & applications
2. Analyze Field & In-house Endurance Test Data
• Develop product Fault Tree Analysis
• Identify and pareto observed failure modes
3. Develop Reliability Profile & Goals
• Develop P-Diagrams & System Block Diagram
• Generate Reliability Weibull plots for operational endurance
• Allocate reliability goals to key subsystems
• Identify reliability gaps between existing product & goals for each subsystem
4. Develop & Execute Reliability Growth Plan
• Determine root cause for all identified failures
• Redesign process or parts to address failure mode pareto
• Validate reliability improvement through accelerated life testing & field betas
5. Institute Reliability Validation Program
• Implement process firewalls & sensors to hold design robustness
• Develop and implement long-term reliability validation audit
Design For Reliability Program Summary
Keys to DFR:
• Customer reliability expectations & needs must be fully understood
• Reliability must be viewed from a “systems engineering” perspective
• Product must be designed for the intended use environment
• Reliability must be statistically verified (or risk must be accepted)
• Field data collection is imperative (environment, usage, failures)
• Manufacturing & supplier reliability “X’s” must be actively managed
DFR needs to be part of the entire product development cycle
What do we mean by
1. Reliability
2. Failure
3. Failure Rate
4. Hazard Rate
5. MTTF / MTBF
1. Reliability [R(t)]: The probability that an item will perform its intended function without failure under stated conditions for a specified period of time.
2. Failure: The termination of the ability of the product to perform its intended function.
3. Failure Rate [FR]: The ratio of the number of failures within a sample to the cumulative operating time.
4. Hazard Rate [h(t)]: The instantaneous rate of failure of an item, given that it has survived until that time; sometimes called the instantaneous failure rate.
Failure Rate Calculation Example
EXAMPLE: A sample of 1000 meters is tested for a week, and two of them fail (assume they fail at the end of the week). What is the Failure Rate?
$$\text{Failure Rate} = \frac{\text{failures}}{\text{hours}} = \frac{2}{1000 \times 24 \times 7} = \frac{2}{168{,}000}\ \text{failures/hour} = 1.19 \times 10^{-5}\ \text{failures/hr}$$
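The same arithmetic can be sketched in Python. This is a minimal illustration, assuming every unit (including the two that fail at the end of the week) accumulates the full week of operating time; `failure_rate` is a hypothetical helper name, not a library function.

```python
def failure_rate(failures, units, hours_per_unit):
    """Failures per cumulative operating hour.

    Assumes every unit runs for the full test period.
    """
    cumulative_hours = units * hours_per_unit
    return failures / cumulative_hours


# 1000 meters tested for one week (7 days * 24 hours), 2 failures
rate = failure_rate(failures=2, units=1000, hours_per_unit=7 * 24)
print(f"{rate:.3g} failures/hr")
```

If units failed partway through the week, their actual operating hours would have to be summed instead of assuming the full period.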
Probability Density Function (PDF):
The Probability Density Function (PDF) is the distribution f(t) of times to failure. The value of f(t) is a probability density, not a probability: the probability of the product failing in a short interval from t to t + Δt is approximately f(t)Δt.
[Figure: Probability Density Function f(t) vs. time.]
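Common Distributions
Probability Distribution: Probability Density Function, f(t); Variate Range, t
• Exponential: $f(t) = \lambda e^{-\lambda t}$; $0 \le t < \infty$
• Weibull: $f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$; $0 \le t < \infty$
• Normal: $f(t) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(t-\mu)^2}{2\sigma^2}}$; $-\infty < t < \infty$
• Log Normal: $f(t) = \frac{1}{\sigma t \sqrt{2\pi}} e^{-\frac{(\ln t-\mu)^2}{2\sigma^2}}$; $0 < t < \infty$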
Cumulative Distribution Function (CDF):
The Cumulative Distribution Function (CDF) represents the probability that the product fails at some time prior to t. It is the integral of the PDF evaluated from 0 to t.
$$\mathrm{CDF} = F(t) = \int_0^t f(t)\,dt$$
[Figure: Probability Density Function f(t) vs. time, with the Cumulative Distribution Function indicated at time t.]
Reliability Function R(t)
The reliability of a product is the probability that it does not fail before time t. It is therefore
the complement of the CDF:
$$R(t) = 1 - F(t) = 1 - \int_0^t f(t)\,dt$$
or
$$R(t) = \int_t^{\infty} f(t)\,dt$$
[Figure: Probability Density Function f(t) vs. time, with R(t) = 1 - F(t) indicated at time t.]
Typical characteristics:
• when t = 0, R(t) = 1
• when t → ∞, R(t) → 0
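The relationship between f(t), F(t) and R(t) can be checked numerically. The sketch below integrates the exponential PDF from the Common Distributions slide with a simple trapezoidal rule; the hazard rate of 0.001 failures per hour and the 500-hour evaluation time are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import math


def pdf_exponential(t, lam):
    """Exponential PDF: f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def cdf_by_integration(pdf, t, steps=10_000):
    """Trapezoidal approximation of F(t), the integral of the PDF from 0 to t."""
    dt = t / steps
    total = 0.5 * (pdf(0.0) + pdf(t))
    for i in range(1, steps):
        total += pdf(i * dt)
    return total * dt


lam = 0.001  # assumed constant hazard rate, failures per hour
t = 500.0    # assumed evaluation time, hours

F = cdf_by_integration(lambda x: pdf_exponential(x, lam), t)
R = 1.0 - F
print(f"F({t:.0f}) = {F:.4f}, R({t:.0f}) = {R:.4f}")
```

The integrated F(t) should agree with the closed form 1 - exp(-λt) to within the integration error, and R(t) follows as its complement.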
Hazard Function h(t)
The hazard function is defined as the limit of the failure rate as Δt approaches zero. In other words, the hazard function or instantaneous failure rate is obtained as
$$h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t \, R(t)}$$
The hazard function or hazard rate h(t) is the conditional probability of failure in the interval t to (t + Δt), given that there was no failure up to t, divided by Δt. It is expressed as
$$h(t) = \frac{f(t)}{R(t)}$$
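A quick numerical check shows that the limit form and the ratio f(t)/R(t) agree. This sketch uses the Weibull PDF from the Common Distributions slide; the parameter values (β = 2, η = 1000, t = 800) and the step size are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def weibull_pdf(t, beta, eta):
    """Weibull PDF: (beta/eta) * (t/eta)^(beta-1) * exp(-(t/eta)^beta)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def weibull_reliability(t, beta, eta):
    """Weibull reliability: R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def hazard_from_limit(reliability, t, dt=1e-4):
    """Finite-difference form of [R(t) - R(t + dt)] / [dt * R(t)]."""
    return (reliability(t) - reliability(t + dt)) / (dt * reliability(t))


beta, eta, t = 2.0, 1000.0, 800.0  # assumed illustrative values

h_ratio = weibull_pdf(t, beta, eta) / weibull_reliability(t, beta, eta)
h_limit = hazard_from_limit(lambda x: weibull_reliability(x, beta, eta), t)
print(h_ratio, h_limit)
```

The two values differ only by the finite-difference error, which shrinks as the step `dt` is reduced.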
Hazard Functions
As shown, the hazard rate is a function of time.
What shape does the hazard rate follow over time?
The general answer is the bathtub-shaped function.
The sample will experience a high failure rate at the beginning of the
operation time due to weak or substandard components, manufacturing
imperfections, design errors and installation defects. This period of
decreasing failure rate is referred to as the “infant mortality region”.
This region is undesirable for both the manufacturer and the consumer,
as it causes unnecessary repair cost for the manufacturer
and an interruption of product usage for the consumer.
The early failures can be minimized by improving the burn-in period of
systems or components before shipments are made, by improving the
manufacturing process and by improving the quality control of the products.
At the end of the early failure-rate region, the failure rate will eventually
reach a constant value. During this constant failure-rate region the failures
do not follow a predictable pattern but occur at random due to the changes
in the applied load.
The randomness of material flaws or manufacturing flaws will also lead to
failures during the constant failure rate region.
The third and final region of the failure-rate curve is the wear-out region.
The beginning of the wear-out region is noticed when the failure rate rises
significantly above the constant failure-rate value and the
failures are no longer attributed to randomness but are due to the age and
wear of the components.
To minimize the effect of the wear-out region, one must use periodic
preventive maintenance or consider replacement of the product.
Product's Hazard Rate vs. Time: “The Bathtub Curve”
[Figure: Hazard Rate, h(t), vs. Time, in three regions:
• Infant Mortality (Manufacturing Defects): h(t) decreasing
• Random Failure / Useful Life (Random Failures): h(t) constant
• Wear-out (Wear-out Failures): h(t) increasing]
Mean Time To Failure [MTTF] -
One of the measures of a system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF). When the system is non-repairable, we refer to the expected time to failure as the MTTF. When the system is repairable, we refer to the expected time between two successive failures as the MTBF.
Now let us consider n identical non-repairable systems and observe the time to failure for each. Assume that the observed times to failure are $t_1, t_2, \ldots, t_n$.
The estimated mean time to failure, MTTF, is
$$\mathrm{MTTF} = \frac{1}{n} \sum_{i=1}^{n} t_i$$
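The estimator above is just the sample mean of the observed times to failure. A minimal sketch follows; the five failure times are made-up illustrative numbers, not data from the source, and `estimate_mttf` is a hypothetical helper name.

```python
def estimate_mttf(times_to_failure):
    """Sample-mean estimate of MTTF from n observed times to failure."""
    if not times_to_failure:
        raise ValueError("at least one time to failure is required")
    return sum(times_to_failure) / len(times_to_failure)


# Hypothetical times to failure (hours) for n = 5 non-repairable units
observed_hours = [1200.0, 950.0, 1430.0, 1100.0, 1320.0]
mttf = estimate_mttf(observed_hours)
print(f"Estimated MTTF = {mttf:.0f} hours")
```

This simple average only applies when every unit has been run to failure; units still running at the end of a test would need to be handled as censored observations.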
Useful Life Metrics: Mean Time Between Failures (MTBF)
Mean Time Between Failures [MTBF] - For a repairable item, the ratio of the cumulative operating time to the number of failures for that item.
(also Mean Cycles Between Failures, MCBF, etc.)
EXAMPLE: A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.
$$\mathrm{MTBF} = \frac{\text{Total operating time}}{\text{\# of failures}} = \frac{45{,}000}{6} = 7{,}500\ \text{hours}$$
MTBF or MTTF is a widely-used metric during the Useful Life period, when the hazard rate is constant.
The Exponential Distribution
If the hazard rate is constant over time, then the product follows the exponential
distribution. This is often used for electronic components.
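$$h(t) = \lambda = \text{constant}$$
$$\text{MTBF (mean time between failures)} = \frac{1}{\lambda}$$
$$f(t) = \lambda e^{-\lambda t}$$
$$F(t) = 1 - e^{-\lambda t}$$
$$R(t) = e^{-\lambda t}$$
At MTBF:
$$R(t) = e^{-\lambda t} = e^{-\lambda \cdot \frac{1}{\lambda}} = e^{-1} = 36.8\%$$
Appropriate tool if failure rate is known to be constant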
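The 36.8% result does not depend on the particular value of λ. The sketch below evaluates the exponential reliability at t = MTBF for a few arbitrary, assumed hazard rates; the function names are hypothetical.

```python
import math


def exp_reliability(t, lam):
    """Exponential reliability: R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def exp_cdf(t, lam):
    """Exponential CDF: F(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)


# Arbitrary, assumed hazard rates (failures per hour)
results = []
for lam in (1e-3, 1 / 7500, 0.5):
    mtbf = 1.0 / lam
    results.append(exp_reliability(mtbf, lam))
    print(f"lambda = {lam:.6g}/hr, MTBF = {mtbf:.6g} hr, R(MTBF) = {results[-1]:.3f}")
```

In other words, under a constant hazard rate roughly 63% of units are expected to have failed by the time the MTBF is reached.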
Useful Life Metrics: Reliability
A mathematical model for reliability during Useful Life
Reliability can be described by the single-parameter exponential distribution when the Hazard Rate, λ, is constant (i.e. the “Useful Life” portion of the bathtub curve):
$$R = e^{-\frac{t}{\mathrm{MTBF}}} = e^{-FR \cdot t}$$
Where: t = Mission length (uptime or cycles in question)
EXAMPLE: If MTBF for a motor is 7,500 hours, the probability of operating for 30 days without failure is ...
$$R = e^{-\frac{30 \times 24\ \text{hours}}{7500\ \text{hours}}} = 0.908 = 90.8\%$$
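The motor example can be reproduced in a few lines. This is a sketch under the constant-hazard-rate assumption stated above; `mtbf` and `mission_reliability` are hypothetical helper names.

```python
import math


def mtbf(total_operating_hours, failures):
    """MTBF = cumulative operating time / number of failures."""
    return total_operating_hours / failures


def mission_reliability(mission_hours, mtbf_hours):
    """R = exp(-t / MTBF); valid only while the hazard rate is constant."""
    return math.exp(-mission_hours / mtbf_hours)


motor_mtbf = mtbf(45_000, 6)                          # motor from the MTBF example
r_30_days = mission_reliability(30 * 24, motor_mtbf)  # 30-day mission
print(f"MTBF = {motor_mtbf:.0f} hours, R(30 days) = {r_30_days:.3f}")
```

Outside the Useful Life region (infant mortality or wear-out) this single-parameter model no longer applies, and a time-dependent model is needed.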
Weibull Probability Distribution
• Originally proposed by the Swedish engineer Waloddi Weibull (1887-1979) in the early 1950s
• Statistically represented fatigue failures
• Weibull probability density function (PDF, distribution of values):
$$f(t) = \frac{\beta t^{\beta-1}}{\eta^{\beta}} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$
t = Mission length (time, cycles, etc.)
β = Weibull Shape Parameter, “Slope”
η = Weibull Scale Parameter, “Characteristic Life”
Equation valid for minimum life = 0
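The Weibull Distribution
This powerful and versatile reliability function is capable of modeling most real-life systems because the time dependency of the failure rate can be adjusted.
$$R(t) = 1 - F(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$
$$f(t) = \frac{\beta t^{\beta-1}}{\eta^{\beta}} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$
$$h(t) = \frac{\beta}{\eta^{\beta}} t^{\beta-1}$$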
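The three Weibull functions can be written directly from these formulas. The sketch below checks that h(t) = f(t)/R(t) for a few shape parameters and that β = 1 gives a constant hazard of 1/η; the evaluation time of 800 and η = 1000 are assumed values, and the function names are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t, beta, eta):
    """f(t) = (beta * t^(beta-1) / eta^beta) * exp(-(t/eta)^beta)."""
    return (beta * t ** (beta - 1) / eta ** beta) * weibull_reliability(t, beta, eta)


def weibull_hazard(t, beta, eta):
    """h(t) = (beta / eta^beta) * t^(beta-1)."""
    return (beta / eta ** beta) * t ** (beta - 1)


eta = 1000.0  # assumed characteristic life
t = 800.0     # assumed evaluation time

for beta in (0.5, 1.0, 3.44):
    h = weibull_hazard(t, beta, eta)
    ratio = weibull_pdf(t, beta, eta) / weibull_reliability(t, beta, eta)
    print(f"beta = {beta}: h(t) = {h:.6f}, f(t)/R(t) = {ratio:.6f}")

# Fraction failed at the characteristic life, independent of beta
failed_at_eta = 1.0 - weibull_reliability(eta, 3.44, eta)
```

Whatever the shape parameter, the fraction failed at t = η is 1 - exp(-1), which is why η is quoted as the 63.2% life.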
Weibull PDF
• Exponential when β = 1.0
• Approximately normal when β = 3.44
• Time-dependent hazard rate
$$f(t) = \frac{\beta t^{\beta-1}}{\eta^{\beta}} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$
[Figure: Weibull PDF f(t) vs. time (0 to 2000) for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]
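Weibull Hazard Function
$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}$$
$$h(t) = \frac{\frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]}{1 - \left\{1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]\right\}}$$
$$h(t) = \frac{\beta}{\eta^{\beta}} t^{\beta-1}$$
[Figure: h(t) vs. Time (0 to 2500) for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]
β < 1: Highest failure rate early - “Infant Mortality”
β > 1: Highest failure rate later - “Wear-Out”
β = 1: Constant failure rate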
Weibull Reliability Function
$$R(t) = 1 - F(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$
[Figure: R(t) vs. Time (0 to 2500) for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]
Reliability is the probability that the part survives to time t.
Summary of Useful Definitions - Weibull Analysis
Beta (β): The slope of the Weibull CDF when plotted on Weibull paper.
B-life: A common way to express values of the cumulative distribution function. B10 refers to the time at which 10% of the parts are expected to have failed.
CDF: The Cumulative Distribution Function expresses the time-dependent probability that a failure occurs at some time before time t.
Eta (η): The characteristic life, or time at which 63.2% of the parts are expected to have failed. Also expressed as the B63.2 life. On Weibull paper, this is the time at which the CDF line crosses the 63.2% ordinate, where ln(ln(1 / (1 - CDF(t)))) = 0.
PDF: The Probability Density Function expresses the expected distribution of failures over time.
Weibull plot: A plot where the x-axis is scaled as ln(time) and the y-axis is scaled as ln(ln(1 / (1 - CDF(t)))). The Weibull CDF plotted on Weibull paper will be a straight line of slope β and y-intercept -β ln(η).
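Taking logarithms of the Weibull CDF twice gives ln(ln(1 / (1 - F(t)))) = β ln(t) - β ln(η), which is why the CDF plots as a straight line on Weibull paper. The sketch below demonstrates this by transforming exact CDF points and fitting a least-squares line, then computes a B10 life. The parameters β = 2 and η = 1000 and the four sample times are assumptions for illustration; real test data would need plotting positions (for example median ranks) rather than exact CDF values, and all function names are hypothetical.

```python
import math


def weibull_cdf(t, beta, eta):
    """F(t) = 1 - exp(-(t/eta)^beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_plot_point(t, cdf_value):
    """Map (t, F) to Weibull-paper coordinates (ln t, ln ln(1 / (1 - F)))."""
    return math.log(t), math.log(math.log(1.0 / (1.0 - cdf_value)))


def fit_line(points):
    """Ordinary least-squares slope and intercept through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def b_life(percent, beta, eta):
    """Time by which the given percentage of parts is expected to have failed."""
    return eta * (-math.log(1.0 - percent / 100.0)) ** (1.0 / beta)


true_beta, true_eta = 2.0, 1000.0      # assumed parameters
times = [200.0, 400.0, 800.0, 1600.0]  # assumed sample times

points = [weibull_plot_point(t, weibull_cdf(t, true_beta, true_eta)) for t in times]
slope, intercept = fit_line(points)
beta_hat = slope                         # slope of the line is beta
eta_hat = math.exp(-intercept / slope)   # intercept is -beta * ln(eta)
b10 = b_life(10, true_beta, true_eta)
print(f"beta = {beta_hat:.3f}, eta = {eta_hat:.1f}, B10 = {b10:.1f}")
```

Because the points here come from an exact Weibull CDF, the fit recovers the assumed parameters; with field or test data the scatter around the line is what the confidence bounds on a Weibull plot describe.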
Weibull Analysis
What is a Weibull Plot?
• Log-log plot of probability of failure versus age for a product or component
• Nominal “best-fit” line, plus confidence intervals
• Easily generated, easily interpreted graphical read-out
• Comparison: test results for a redesigned product can be plotted against original product or against goals
[Figure: Weibull plot showing Observed Failures, the Weibull Best Fit line and Confidence on Fit bounds.]
Weibull Shape Parameter (β) and Scale Parameter (η) Defined
Scale and Shape are the Key Weibull Parameters
η is called the CHARACTERISTIC LIFE
For the Weibull distribution, the characteristic life is equal to the scale parameter, η. This is the time at which 63.2% of the product will have failed.
β is called the SLOPE
For the Weibull distribution, the slope describes the steepness of the Weibull best-fit line (see following slides for more details). β also has a relationship with the trend of the hazard rate, as shown on the “bathtub curves” on a subsequent slide.
β and the Bathtub Curve
β < 1
• Implies “infant mortality”
• If this occurs:
- Failed products “not to print”
- Manufacturing or assembly defects
- Burn-in can be helpful
• If a component survives infant mortality phase, likelihood of failure decreases with age.
β = 1
• Implies failures are “random”, individually unpredictable
• An old part is as good as a new part (burn-in not appropriate)
• If this occurs:
- Failures due to external stress, maintenance or human errors.
- Possible mixture of failure modes
1 < β < 4
• Implies mild wearout
• If this occurs:
- Low cycle fatigue
- Corrosion or Erosion
- Scheduled replacement may be cost effective
β > 4
• Implies rapid wearout
• If this occurs, suspect:
- Material properties
- Brittle materials like ceramics
• Not a bad thing if it happens after mission life has been exceeded.
5. DFR – System Reliability
System Reliability Evaluation
A system (or a product) is a collection of components arranged according
to a specific design in order to achieve desired functions with acceptable
performance and reliability measures.
Clearly, the type of components used, their qualities, and the design
configuration in which they are arranged have a direct effect on the
system performance and its reliability. For example, a designer may use a
smaller number of high-quality components and configure them in such
a way as to result in a highly reliable system, or a designer may use a larger
number of lower-quality components and configure them differently in
order to achieve the same level of reliability.
Once the system is configured, its reliability must be evaluated and
compared with an acceptable reliability level. If it does not meet the
required level, the system should be redesigned and its reliability should
be re-evaluated.
Reliability Block Diagram (RBD) Technique
The first step in evaluating a system's reliability is to construct a reliability
block diagram which is a graphical representation of the components of the
system and how they are connected.
The purpose of RBD technique is to represent failure and success criteria
pictorially and to use the resulting diagram to evaluate System Reliability.
Benefits
The pictorial representation means that models are easily understood and
therefore readily checked.
Block diagrams are used to identify the relationship between elements in the
system. The overall system reliability can then be calculated from the
reliabilities of the blocks using the laws of probability.
Block diagrams can be used for the evaluation of system availability
provided that both block repairs and block failures are independent
events, i.e. provided the time taken to repair a block is dependent only on
the block concerned and is independent of repair to any other block.
Elementary models
Before beginning the model construction, consideration should be given to
the best way of dividing the system into blocks. It is particularly
important that each block should be statistically independent of all
other blocks (i.e. no unit or component should be common to a number
of blocks).
The most elementary models are the following
Series
Active parallel
m-out-of-n
Standby models
Typical RBD configurations and related formulae
Simple Series and Parallel System
a) Series System
[Figure a: units A, B, C, …, Z connected in series.]
Figure a shows the units A, B, C, …, Z constituting a system. The interpretation can be stated as ‘any unit failing causes the system as a whole to fail’, and the system is referred to as an active series system. Under these conditions, the reliability R(s) of the system is given by
R(s) = Ra * Rb * Rc * … * Rz
b) Parallel System
[Figure b: units X and Y connected in parallel between input I and output O.]
Figure b shows the units X and Y operating in such a way that the system will survive as long as at least one of the units survives. This type of system is referred to as an active parallel system.
R(s) = 1 – (1 – Rx)(1 – Ry)
A Series / Parallel System
c) Series / Parallel System
[Figure c: two parallel branches between input I and output O; branch X is A1, B1, C1, …, Z1 in series and branch Y is A2, B2, C2, …, Z2 in series.]
When blocks such as X and Y themselves comprise sub-blocks in series, the block diagram takes the form illustrated in figure c.
Rx = Ra1 * Rb1 * Rc1 * … * Rz1
Ry = Ra2 * Rb2 * Rc2 * … * Rz2
Rs = 1 – (1 – Rx)(1 – Ry)
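The series and active parallel formulas compose naturally, so a series / parallel system can be evaluated by nesting two small functions. This is a sketch assuming statistically independent blocks; the block reliabilities are made-up illustrative values and the function names are hypothetical.

```python
from math import prod


def series(reliabilities):
    """Series system: every block must survive, so reliabilities multiply."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Active parallel system: fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical block reliabilities for two identical branches
branch_x = series([0.99, 0.98, 0.97])  # A1, B1, C1 in series
branch_y = series([0.99, 0.98, 0.97])  # A2, B2, C2 in series
system = parallel([branch_x, branch_y])
print(f"Rx = {branch_x:.4f}, Ry = {branch_y:.4f}, Rs = {system:.4f}")
```

The redundant branch raises the system reliability above that of either branch alone, which is the usual motivation for active parallel designs.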
m-out-of-n units
d) m-out-of-n System
[Figure d: three identical units X in parallel between input I and output O, with a 2/3 requirement.]
The figure represents instances where system success is assured whenever at least m of n identical units are in an operational state. Here m = 2, n = 3.
Rs = (Rx)^3 + 3*(Rx)^2*Fx, where Fx = 1 – Rx.
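The 2-out-of-3 expression is a special case of the binomial sum over the number of surviving units. The sketch below implements the general form for identical, independent units and compares it with the closed-form expression; the unit reliability of 0.9 is an assumed value and `m_out_of_n` is a hypothetical helper name.

```python
from math import comb


def m_out_of_n(m, n, r):
    """Probability that at least m of n identical, independent units survive."""
    return sum(comb(n, k) * r ** k * (1.0 - r) ** (n - k) for k in range(m, n + 1))


rx = 0.9        # assumed unit reliability
fx = 1.0 - rx

general = m_out_of_n(2, 3, rx)
closed_form = rx ** 3 + 3 * rx ** 2 * fx
print(f"general = {general:.4f}, closed form = {closed_form:.4f}")
```

Setting m = n reproduces the series formula for identical units, and m = 1 reproduces the active parallel formula.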
Reliability Testing - Why?
Reliability Testing allows us to:
• Provide a path to “grow” a product’s reliability by identifying weak
points in the design.
• Have confidence that our sample-based prediction will accurately
reflect the performance of the entire population.
• Determine if a product’s design is capable of performing its intended
function for the desired period of time.
• Identify failures caused by severe applications that exceed the ratings,
and recognize opportunities for the product to safely perform under
more diverse applications.
• Confirm the product’s performance in the field.
Reliability Testing - Measures
Reliability Testing answers questions like …
• What is my product’s Failure Rate?
• Which distribution does my data follow?
• What is the expected life?
• What does my hazard function look like?
• What failure modes are present?
• How “mature” is my product’s reliability?
These metrics and more can be obtained with the right reliability test
Four Major Categories of Reliability Testing
• Reliability Growth Tests (RGT)
• Reliability Demonstration Tests (RDT)
• Production Reliability Acceptance Tests (PRAT)
• Reliability Validation (RV)
- Normal Testing
- Accelerated Testing
Reliability Testing - Growth Testing
Scope: To determine a product’s physical limitations, functional capabilities and inherent failure mechanisms.
Used early & throughout the design process
• Emphasis is on discovering & “eliminating” failure modes
• Failures are welcome . . . they represent data sources
• Failures in development = fewer failures in field
• Used with a changing design to drive reliability growth
• Sample size is typically small
• Test Types: Normal or Accelerated Testing
• Can be very helpful early in process when done on competitor
products which are sufficiently similar to the new design.
Reliability Testing - Demonstration Testing
Scope: To demonstrate the product’s ability to fulfill reliability, availability & design requirements under realistic conditions.
Used at end of design stages to demonstrate compliance to specification
• Failures are no longer hoped for, because they jeopardize compliance (though
it’s still better to catch a problem before rather than after launch!)
• Management tool . . . provides means for verifying compliance
• Provide reliability measurement, typically performed on a static design
(subsequent design changes may invalidate the demonstrated reliability results)
• Sample size is typically larger, due to need for degree of confidence in results
and increased availability of samples.
Reliability Testing - Production Reliability Acceptance Testing (PRAT)
Scope: To ensure that variation in materials, parts, & processes related to the move from prototypes to full production does not affect product reliability.
Screens and Audits precipitate and detect hidden defects
• Provides feedback for continuous improvement in sourcing/manufacturing
• Performed during full production, verifies that predictions based on prototype results are valid in full production
• Sample size ranges from full (screen) to partial (audit)
• Test Types: Highly Accelerated Stress Screens/Audits (HASS/A), Environmental Stress Screening (ESS), Burn-in
Reliability Testing - Validation
Scope: To ensure that the product is performing reliably in the actual customer environment/application.
Reliability Validation tracks field data on Customer Dashboards
• Provides field feedback on the success of the design
• “Testing results” based on actual field data sources
• Helps to improve future design / redesign & prediction methods
• Requires effective data collection & corrective action process
• Sample size depends on the customer & product type
Reliability Tests are critical at all stages!
Reliability Testing … The Path
[Figure: flowchart of the reliability testing path for Legacy Products and NPI (New Products). Stages shown: Initial Design (set reliability goals, develop models, initial design, accelerated testing); NPI Pilot Readiness (mature design, pilot testing); Implementation (implement production or implement changes, reliability demonstration, audit programs); Post-Sales Service (establish service schedule, keep updated dashboards, ensure data collection, improve future design); Field Data Acquisition (complaint generated, create case, Clarify); Verification (reproduce failure, reliability verification); Product Redesign (revise goals, redefine models, product redesign). Transitions are labeled Growth Testing, Demonstration Testing, Acceptance Testing and Validation Testing.]
Accelerated Testing
Scope: Accelerated testing allows designers to make predictions about the life of a product by developing a model that correlates reliability under accelerated conditions to reliability under normal conditions.
BASIC CONCEPT
Results @ high stress + stress-life relationship = Results @ normal stress
[Figure: Time to Failure vs. Stress. We test at the elevated stress level in order to predict at the normal stress level.]
Model: The model is how we extrapolate back to normal stress levels.
Common Models:
• Arrhenius: Thermal
• Inverse Power Law: Non-Thermal
• Eyring: Combined
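As one illustration of a stress-life relationship, the Arrhenius model is commonly written as an acceleration factor AF = exp[(Ea / k)(1 / T_use - 1 / T_stress)], with temperatures in kelvin. The source names the model but does not give this formula or any parameter values, so everything in the sketch below is an assumption for illustration: the activation energy of 0.7 eV, the 25 °C use temperature and the 85 °C test temperature are placeholders, and the function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """AF = exp((Ea / k) * (1 / T_use - 1 / T_stress)), temperatures in kelvin."""
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


def life_at_use(life_at_stress_hours, acceleration_factor):
    """Extrapolate a life observed at elevated stress back to normal stress."""
    return life_at_stress_hours * acceleration_factor


# Assumed values: Ea = 0.7 eV, use at 25 C, test at 85 C
af = arrhenius_acceleration_factor(0.7, 25.0, 85.0)
print(f"Acceleration factor = {af:.1f}")
print(f"1,000 test hours correspond to about {life_at_use(1000.0, af):,.0f} use hours")
```

The result is very sensitive to the activation energy, which depends on the failure mechanism, so the extrapolation is only as good as the understanding of that mechanism.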
Accelerated Testing
Key steps in planning an accelerated test:
• Choose a stress to elevate: requires an understanding of the anticipated failure mechanism(s) - must be relevant (temp. & vibration usually apply)
• Determine the accelerating model: requires knowledge of the nature of the acceleration of this failure mechanism, as a function of the accelerating stress.
• Select elevated stress levels: requires a previous study of the product’s operating & destructive limits to ensure that the elevated stress level does not introduce new failure modes which would not occur at normal operating stress levels.
Applicability of technique depends on careful planning and execution
Parametric Reliability Models
One of the most important factors that influence the design process of a
product or a system is the reliability values of its components.
In order to estimate the reliability of the individual components or the entire
system, we may follow one or more of the following approaches.
➢Historical Data
➢Operational Life Testing
➢Burn-In Testing
➢Accelerated Life Testing
Approach 1 : Historical Data
The failure data for the components can be found in data banks such as
➢GIDEP (Government-Industry Data Exchange Program),
➢MIL-HDBK-217 (which includes failure data for components as well as
procedures for reliability prediction),
➢AT&T Reliability Manual and
➢Bell Communications Research Reliability Manual.
In such data banks and manuals, the failure data are collected from
different manufacturers and presented with a set of multiplying factors
that relate to different manufacturers' quality levels and environmental
conditions.