CS-438
COMPUTER SYSTEMS MODELING
Spring Semester 2023
Batch: 2019
(WEEK # 6 + 7: LECTURE # 11 -14)
FAKHRA AFTAB
LECTURER
DEPARTMENT OF COMPUTER & INFORMATION SYSTEMS ENGINEERING
NED UNIVERSITY OF ENGINEERING & TECHNOLOGY
RELIABILITY
Reliability R(t) of a system is defined as the probability that the system
will survive till time t.
Hence, if T is a random variable denoting the system's lifetime, then
$R(t) = P[T > t] = 1 - F_T(t)$
where $F_T(t)$ is the cumulative distribution function of T.
It should be noted that:
•R(0) = 1 (i.e. a system is expected to be operational when it is initially
put into operation)
•$\lim_{t \to \infty} R(t) = 0$ (i.e. nothing can operate forever)
Prepared by: Ms. Fakhra Aftab (Lecturer, CISD, NEDUET)
MATHEMATICAL EXPRESSION OF RELIABILITY
Let,
•$N_0$ = number of identical components under test at t = 0
•$N_s(t)$ = number of components which survived till time t
•$N_f(t)$ = number of components which failed till time t
Clearly,
$N_s(t) + N_f(t) = N_0$
Then, using fundamental definitions of reliability and probability, we get,
$R(t) = \frac{N_s(t)}{N_0} = 1 - \frac{N_f(t)}{N_0}$
Taking first derivative with respect to time,
$R'(t) = -\frac{N'_f(t)}{N_0}$
where $N'_f(t)$ represents the rate at which components fail.
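As a rough illustration of the counting definition above, the following sketch estimates R(t) from a list of observed failure times. The function name `empirical_reliability` and the sample failure times are made up for illustration and are not part of the lecture.

```python
def empirical_reliability(failure_times, n0, t):
    """Estimate R(t) = N_s(t) / N_0 from observed failure times.

    failure_times: times at which components failed (components that
                   never failed during the test are simply not listed)
    n0:            number of identical components under test at t = 0
    t:             time at which the reliability is evaluated
    """
    n_failed = sum(1 for ft in failure_times if ft <= t)   # N_f(t)
    n_survived = n0 - n_failed                              # N_s(t)
    return n_survived / n0


# Hypothetical test: 10 components, 4 of which failed at the times below.
failures = [120.0, 340.0, 560.0, 900.0]
print(empirical_reliability(failures, n0=10, t=0))      # nothing has failed yet
print(empirical_reliability(failures, n0=10, t=500))    # two components have failed
```

The estimate starts at 1 and can only decrease as t grows, which matches the two boundary properties of R(t) noted earlier.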
Recall the basic definition of reliability,
$R(t) = 1 - F_T(t)$
Now, taking the first derivative of both sides with respect to time, we get,
$R'(t) = -f_T(t)$
where $f_T(t)$ is the probability density function of T.
Reliability includes:
•correctness (ensuring the system services are as specified),
•precision (ensuring information is delivered at an appropriate level of detail),
and
•timeliness (ensuring that information is delivered when it is required).
HAZARD RATE
Let us now calculate the conditional probability that the system will not
survive an additional time duration x, given that it has already survived till
time t.
If we divide this probability by x and the interval x is shrunk to zero (x → 0),
we get the instantaneous failure rate or hazard rate h(t):
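Written out, the limit described above takes the following form. This is a sketch that assumes the lifetime T has distribution function $F_T(t)$ and density $f_T(t)$, with $R(t) = 1 - F_T(t)$ as defined earlier.

```latex
\begin{align*}
P[T \le t + x \mid T > t]
  &= \frac{P[t < T \le t + x]}{P[T > t]}
   = \frac{F_T(t + x) - F_T(t)}{R(t)} \\
h(t)
  &= \lim_{x \to 0} \frac{1}{x} \cdot \frac{F_T(t + x) - F_T(t)}{R(t)}
   = \frac{f_T(t)}{R(t)}
\end{align*}
```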
Calculate h(t) = ?
If T ∼ EXP(λ), then
$h(t) = \frac{f_T(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$
i.e. the failure rate is constant.
The cumulative hazard H(t) is given as:
$H(t) = \int_0^t h(u)\,du$
This gives
$R(t) = e^{-H(t)}$
If T ∼ EXP(λ), then $h(t) = \lambda$, $H(t) = \lambda t$ and $R(t) = e^{-\lambda t}$.
Clearly, the hazard rate for an exponentially distributed lifetime is
constant.
Task
The hazard rate of a certain component is given by:
1)What are the cumulative hazard function and the reliability function
of this component?
2)What is the probability that it survives until t = 2?
Practice Problem
The failure rate of a certain component is $h(t) = \lambda_0 t$, where $\lambda_0 > 0$ is a
constant. Determine the reliability R(t) of the component.
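One way to check an answer to this kind of problem is to integrate the hazard rate numerically and compare the result with the closed form obtained from the standard relation R(t) = exp(−H(t)), where H(t) is the integral of h from 0 to t. The sketch below assumes that relation; the value λ0 = 0.5, the evaluation time t = 2 and the helper names are arbitrary choices for illustration.

```python
import math


def cumulative_hazard(h, t, steps=10_000):
    """Approximate H(t), the integral of h(u) over [0, t], by the trapezoidal rule."""
    dt = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for i in range(1, steps):
        total += h(i * dt)
    return total * dt


def reliability_from_hazard(h, t):
    """R(t) = exp(-H(t)), the standard hazard/reliability relation."""
    return math.exp(-cumulative_hazard(h, t))


lam0 = 0.5                                   # assumed value of lambda_0


def hazard(t):
    """Linearly increasing hazard rate h(t) = lambda_0 * t."""
    return lam0 * t


t = 2.0
numeric = reliability_from_hazard(hazard, t)
closed_form = math.exp(-lam0 * t**2 / 2)     # from H(t) = lambda_0 * t^2 / 2
print(numeric, closed_form)
```

The two printed values agree to within floating-point error, which suggests R(t) = exp(−λ0 t²/2) for this hazard rate.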
MORTALITY CURVE
[Figure: mortality curve plotted against Age (Years), divided into Phase I, Phase II and Phase III]
RELIABILITY BLOCK DIAGRAMS (RBD)
1)Series Systems
•When every module (block) in the system must be operational for the
entire system to be functional, the blocks are said to be in series
interconnection.
•E.g. processor, memory and system bus form a series configuration in a
computer system.
Let us define an event $E_k$ = block k is operational.
Then, reliability of block k is $R_k = P(E_k)$. Also,
P[system is working] = P[all modules working] = $P[E_1 \cap E_2 \cap \cdots \cap E_n]$
Since block failures are independent, the reliability of a series
system is given by,
$R_s = P[E_1]\,P[E_2] \cdots P[E_n] = R_1 R_2 \cdots R_n$
For homogeneous modules (i.e. identical reliability),
$R_s = R^n$
Remarks
•Effect of Component Reliability in a Series System
In a series configuration, the component with the least reliability has
the biggest effect on the system's reliability.
Clearly,
$R_s \le \min(R_1, R_2, \cdots, R_n)$
•Effect of Number of Components in a Series System
The number of components is another concern in systems with
components connected reliability-wise in series.
As the number of components connected in series increases, the
system's reliability decreases.
Example 1
A module of a satellite monitoring system has 500 components in series.
The reliability of each component is 0.999.
•Find the reliability of the module.
•If the number of components is reduced to 200, what is the reliability?
Answers:
•0.60637
•0.81864
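The two answers above can be checked with a few lines of code. This is a minimal sketch of the series product formula; the function name `series_reliability` is an illustrative choice and independence of the components is assumed, as in the derivation.

```python
import math


def series_reliability(reliabilities):
    """R_s = R_1 * R_2 * ... * R_n for independent blocks in series."""
    return math.prod(reliabilities)


r_500 = series_reliability([0.999] * 500)   # module with 500 components
r_200 = series_reliability([0.999] * 200)   # module reduced to 200 components
print(round(r_500, 4), round(r_200, 4))
```

Even with highly reliable components, the product shrinks quickly as the number of series blocks grows.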
RELIABILITY BLOCK DIAGRAMS (RBD)
2) Parallel System
•A parallel system is a kind of configuration wherein functioning of at least
one system block is sufficient for the entire system to operate correctly.
•Application: where a high degree of operation reliability is required to
avoid any kind of human, economic, or data loss.
In order to derive an expression for reliability of a parallel system, we
observe that
P[system failing] = P[all modules failing] = $P[\bar{E}_1 \cap \bar{E}_2 \cap \cdots \cap \bar{E}_n]$
Since block failures are independent,
1 − P[system working] = $P[\bar{E}_1]\,P[\bar{E}_2] \cdots P[\bar{E}_n]$
so the reliability of a parallel system is
$R_p = 1 - (1 - R_1)(1 - R_2) \cdots (1 - R_n)$
For homogeneous modules (i.e. identical reliability),
$R_p = 1 - (1 - R)^n$
Reliability of a parallel system increases with the increase in number of
modules.
Remarks
•Effect of Component Reliability in a Parallel Configuration
The component with the highest reliability in a parallel configuration
has the biggest effect on the system's reliability, since the most reliable
component is the one that will most likely fail last.
•Effect of Number of Components in a Parallel System
For a parallel configuration, as the number of components/subsystems
increases, the system's reliability increases.
Example 2
A system has three parallel components, A, B, and C with reliabilities
0.95, 0.92, and 0.90, respectively.
•Find the reliability of the system.
•Determine the reliability if component C fails.
Answers:
•0.9996
•0.996
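The same check can be made for the parallel case. The sketch below assumes independent components and uses the complement-of-all-failing rule; the function name `parallel_reliability` is an illustrative choice.

```python
import math


def parallel_reliability(reliabilities):
    """R_p = 1 - (1 - R_1)(1 - R_2)...(1 - R_n) for independent blocks."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


r_abc = parallel_reliability([0.95, 0.92, 0.90])   # components A, B and C
r_ab = parallel_reliability([0.95, 0.92])          # only A and B remain
print(round(r_abc, 4), round(r_ab, 4))
```

Removing a component from a parallel group lowers the system reliability, but the result still exceeds the reliability of the best remaining component.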
3) SERIES-PARALLEL SYSTEM
Many systems use a mix of series and parallel configurations as
exemplified below:
$R_{ov} = 1 - (1 - R \cdot R)(1 - R \cdot R) = 1 - (1 - R^2)^2$
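Mixed configurations can be evaluated by reducing them block by block. The sketch below reads the expression above as two parallel branches, each made of two identical blocks in series, and compares the composed result with the closed form. The helper names and the value R = 0.9 are assumptions for illustration only.

```python
import math


def series(*blocks):
    """Reliability of independent blocks connected in series."""
    return math.prod(blocks)


def parallel(*blocks):
    """Reliability of independent blocks connected in parallel."""
    return 1.0 - math.prod(1.0 - b for b in blocks)


R = 0.9                                  # assumed block reliability
branch = series(R, R)                    # two blocks in series: R * R
r_ov = parallel(branch, branch)          # two such branches in parallel
closed_form = 1 - (1 - R**2) ** 2
print(r_ov, closed_form)
```

Larger diagrams can be handled the same way, by nesting `series` and `parallel` calls from the innermost group outward.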
Example 3
Consider the given series-parallel system & determine the overall
reliability.
Answer:
$R_{system} = 0.984995$
4) K-OUT-OF-N SYSTEM
•k out of n components need to be functional for the system to be
functional.
•Please note that parallel (k = 1) and series (k = n) systems are special
cases of k-out-of-n system.
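•The reliability of such a system is given by the binomial distribution:
$R_{n|k} = \sum_{i=k}^{n} \binom{n}{i} R^i (1 - R)^{n-i}$
where R is the reliability of each of the n identical, independent components.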
Example 4
Consider a system of 6 pumps of which at least 4 must function
properly for system success. Each pump has an 85% reliability for the
mission duration.
What is the probability of success of the system for the same mission
duration?
Answer:
$R_{6|4} = 0.9527$
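The binomial sum is easy to evaluate programmatically. The sketch below assumes identical, independent components and evaluates the sum for the six-pump example; the function name `k_out_of_n_reliability` is an illustrative choice.

```python
from math import comb


def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical, independent components work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r_pumps = k_out_of_n_reliability(k=4, n=6, r=0.85)   # at least 4 of 6 pumps
print(round(r_pumps, 4))

# Special cases: k = 1 reduces to a parallel system, k = n to a series system.
print(k_out_of_n_reliability(1, 3, 0.9), k_out_of_n_reliability(3, 3, 0.9))
```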
TRIPLE MODULAR REDUNDANCY (TMR)
•A TMR system, also known as a triplex system and a special case of k-
out-of-n system (k = 2, n = 3) is illustrated in the following diagram.
•The ‘V’ block is a majority voter which produces correct output as long
as 2 modules are working correctly. Such TMR systems are very
common across many scientific disciplines.
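Treating TMR as a 2-out-of-3 system gives a compact expression for its reliability. The derivation below is a sketch that assumes three identical, independent modules of reliability R and a perfectly reliable voter; neither assumption is stated explicitly in the slides.

```latex
\begin{align*}
R_{TMR} &= \sum_{i=2}^{3} \binom{3}{i} R^i (1 - R)^{3-i} \\
        &= 3R^2(1 - R) + R^3 \\
        &= 3R^2 - 2R^3
\end{align*}
```

Under these assumptions the triplex arrangement improves on a single module only when R > 0.5.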
Task
Q.1) Three subsystems are reliability-wise in series and make up a
system. Subsystem 1 has a reliability of 99.5%, subsystem 2 has a
reliability of 98.7% and subsystem 3 has a reliability of 97.3% for a
mission of 100 hours.
•What is the overall reliability of the system for a 100-hour mission?
•Now consider that these three sub-systems are arranged in parallel
configuration. Compute the overall reliability of the system.
Task
Q.2) Consider a system with three components. Units 1 and 2 are
connected in series and Unit 3 is connected in parallel with the first
two.
•What is the reliability of the system if $R_1$ = 99.5%, $R_2$ = 98.7% and $R_3$ = 97.3%?
[Figure: Units 1 and 2 in series, with Unit 3 in parallel with them]
Task
Consider the following network of six routers.
Each router can fail with probability p. Router failures are mutually
independent. Showing all steps, derive expressions for the probability that
the node:
a) A can successfully send packets to node B
b) B can successfully send packets to node C
c) A can successfully send packets to node C
SYSTEM AVAILABILITY
Availability is the probability that the system will be up and running and
able to deliver useful services to users at any given time.
Example:
•For the insulin pump system, the most important dependability
properties are:
•availability (it must work when required),
•reliability (it must deliver the correct dose of insulin), and
•safety (it must never deliver a dangerous dose of insulin).
•Security is not an issue as the pump will not maintain confidential
information.
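Availability (A) during an interval is calculated as the fraction of time a system is up.
Therefore,
$A = \frac{MTTF}{MTTF + MTTR}$
where MTTF is the mean time to failure and MTTR is the mean time to repair.
We may define unavailability (U) as:
$U = 1 - A = \frac{MTTR}{MTTF + MTTR}$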
Example Problem
A computer has an MTTF = 34 hr and MTTR = 2.5 hr.
a)Determine the availability.
b)If the MTTR is reduced to 1.5 hr, what MTTF can be tolerated
without decreasing the availability of the computer?
Answers:
a)0.9315
b)20.4 hrs
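Both parts of this example follow from the ratio A = MTTF / (MTTF + MTTR). The sketch below evaluates the ratio and then rearranges it to solve for MTTF at a fixed availability; the function names are illustrative choices.

```python
def availability(mttf, mttr):
    """A = MTTF / (MTTF + MTTR): long-run fraction of time the system is up."""
    return mttf / (mttf + mttr)


def mttf_for_availability(target, mttr):
    """Solve A = MTTF / (MTTF + MTTR) for MTTF, given a target A and an MTTR."""
    return target * mttr / (1.0 - target)


a = availability(mttf=34.0, mttr=2.5)               # part (a)
required_mttf = mttf_for_availability(a, mttr=1.5)  # part (b)
print(round(a, 4), round(required_mttf, 1))
```

Because availability depends only on the ratio MTTF/MTTR, cutting the repair time by a given factor allows the MTTF to drop by the same factor.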
FAULT, ERROR AND FAILURE
•Fault: an incorrect step, process, or data definition which causes the
program to perform in an undesirable manner. e.g. absence of a data
validation condition.
•Error: A system state that can lead to undesirable system behavior. e.g.
assignment of zero value to a variable that has to divide some other
variable in the next step.
•Failure: a situation in which the system does not deliver a service
according to its specification.
System faults do not always result in system errors and system errors do
not necessarily result in system failures.
The reasons for this are as follows:
1)Not all code in a program is executed.
2)Errors are transient.
3)The system may include fault detection and protection mechanisms.
Software Reliability vs Hardware Reliability
1) Software has no aging property (no parts to wear out).
2) There are different sources of improving reliability.
3) Copies of software systems are identical.
RELIABILITY METRICS
1) Probability of Failure on Demand (POFOD):
•The likelihood that the system will fail when a service request is made.
•Most appropriate for systems where services are demanded at relatively
long time intervals and there are serious consequences if service is not
delivered.
•It might be used to specify the reliability of protection systems such as
a pressure relief system in a chemical plant or an emergency shutdown
system in a power plant.
•A POFOD of 0.001 means that one out of a thousand service requests
may result in failure.
RELIABILITY METRICS (Cont’d)
2) Rate of Occurrence of Failures (ROCOF)
•This metric should be used where regular demands are made on
system services and where it is important that these services are
correctly delivered.
•A ROCOF of 2/100 means that two failures are likely to occur in each
100 operational time units.
•It might be used in the specification of a bank teller system that
processes customer transactions or in a hotel reservation system.
•Sometimes called the failure intensity.
RELIABILITY METRICS (Cont’d)
3) Mean time to failure (MTTF)
•Average time between observed system failures.
•Should be used in systems with long transactions.
•MTTF should be longer than the average transaction length.
•Examples of systems using this metric are word processor systems and
CAD systems.
•An MTTF of 500 means one failure can be expected every 500 time
units.
RELIABILITY METRICS (Cont’d)
4) Availability (AVAIL)
•This metric should be used in non-stop systems where users expect
the system to deliver a continuous service.
•Examples of such systems are telephone switching systems and railway
signaling systems.
•Availability of 0.998 means that the system is likely to be available for
998 of every 1,000 time units. It is defined as:
Availability = [MTTF/(MTTF + MTTR)] x 100%
RELIABILITY VALIDATION
Fig 2: The reliability measurement process
This process involves four stages:
1.Study existing systems of the same type to establish an operational
profile.
2.Construct a set of test data that reflects the operational profile.
RELIABILITY VALIDATION (Cont’d)
3. Test the system using these data and then count the number and
type of failures that occur.
4. After observing a statistically significant number of failures,
compute the appropriate reliability metric value. This approach is
sometimes called statistical testing.
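The final stage can be sketched as simple ratios over the observed test data. The functions below are hypothetical helpers, not part of any standard tool, and the counts reuse the illustrative figures quoted for each metric in the slides.

```python
def pofod(failed_requests, total_requests):
    """Probability of failure on demand: failed requests per service request."""
    return failed_requests / total_requests


def rocof(failures, operational_time):
    """Rate of occurrence of failures: failures per operational time unit."""
    return failures / operational_time


def mttf(operational_time, failures):
    """Mean time to failure: operational time units per observed failure."""
    return operational_time / failures


print(pofod(1, 1000))     # one failed request out of a thousand
print(rocof(2, 100))      # two failures in 100 operational time units
print(mttf(1000, 2))      # two failures observed over 1000 time units
```

As written, `mttf` is simply the reciprocal of `rocof` measured over the same observation period.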
Task
Three identical computers are networked together in parallel
configuration. Their failure rate is given by λ = 0.2 failures/year.
Calculate:
i)MTTF of each computer
ii)Reliability at the end of five years
Answers:
i)5 years
ii)0.747
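These answers can be reproduced by combining the exponential reliability function with the parallel formula. The sketch below assumes exponentially distributed lifetimes (constant failure rate) and independent failures; the variable names are illustrative.

```python
import math

lam = 0.2        # failure rate of each computer, failures per year
n = 3            # number of identical computers in parallel
t = 5.0          # mission time in years

mttf_single = 1.0 / lam                  # MTTF of one computer
r_single = math.exp(-lam * t)            # R(t) = exp(-lambda * t) for one computer
r_system = 1.0 - (1.0 - r_single) ** n   # parallel system of n identical computers

print(mttf_single, round(r_system, 3))
```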