Critical Systems
Loganathan R.
Prof. Loganathan R., CSE, HKBKCE
Objectives
•To explain what is meant by a critical system, where system failure can have severe human or economic consequences.
•To explain four dimensions of dependability: availability, reliability, safety and security.
•To explain that, to achieve dependability, you need to avoid mistakes, detect and remove errors, and limit damage caused by failure.
Critical Systems
•If system failure results in significant economic losses, physical damage or threats to human life, then the system is called a critical system. There are three types:
•Safety-critical systems
–Failure results in loss of life, injury or damage to the environment;
–Example: chemical plant protection system;
•Mission-critical systems
–Failure results in failure of some goal-directed activity;
–Example: spacecraft navigation system;
•Business-critical systems
–Failure results in high economic losses;
–Example: customer accounting system in a bank;
System dependability
•The most important emergent property of a critical system is its dependability. It covers the related system attributes of availability, reliability, safety and security.
•Importance of dependability
–Systems that are unreliable, unsafe or insecure are often rejected by their users, who may also refuse to buy other products from the same company.
–System failure costs may be very high (e.g. reactor control or aircraft navigation systems).
–Untrustworthy systems may cause information loss with a high consequent recovery cost.
Socio-technical critical system failures
•Hardware failure
–Hardware fails because of design and manufacturing errors or because components have reached the end of their natural life.
•Software failure
–Software fails due to errors in its specification, design or implementation.
•Human operator failure
–Human operators fail to operate the system correctly.
–Now perhaps the largest single cause of system failures.
A simple safety-critical system
•Example: a software-controlled insulin pump.
•Used by diabetics to simulate the function of the pancreas, which manufactures insulin, an essential hormone that metabolises blood glucose.
•Measures blood glucose (sugar) using a micro-sensor and computes the insulin dose required to metabolise the glucose.
Insulin pump organisation
[Figure: block diagram of the insulin pump. Components: needle assembly, sensor, pump, clock, controller, alarm, insulin reservoir, display 1, display 2, power supply.]
Dependability requirements
•The system shall be available to deliver insulin when required to do so.
•The system shall perform reliably and deliver the correct amount of insulin to counteract the current level of blood sugar.
•The essential safety requirement is that excessive doses of insulin should never be delivered, as this is potentially life-threatening.
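The safety requirement above can be pictured as a guard between the dose computation and the pump. The following is only a sketch: the limits `MAX_SINGLE_DOSE` and `MAX_DAILY_DOSE`, the `PumpState` record and the function names are hypothetical and chosen for illustration, not taken from any real pump specification.

```python
from dataclasses import dataclass

# Hypothetical limits, for illustration only.
MAX_SINGLE_DOSE = 4    # units allowed in one delivery (assumed)
MAX_DAILY_DOSE = 25    # units allowed in one day (assumed)


@dataclass
class PumpState:
    delivered_today: int = 0   # units already delivered today


def safe_dose(computed_dose: int, state: PumpState) -> int:
    """Clamp the computed dose so that neither assumed limit is exceeded."""
    if computed_dose <= 0:
        return 0
    remaining_today = max(MAX_DAILY_DOSE - state.delivered_today, 0)
    return min(computed_dose, MAX_SINGLE_DOSE, remaining_today)


def deliver(computed_dose: int, state: PumpState) -> int:
    """Deliver the clamped dose and record it in the daily total."""
    dose = safe_dose(computed_dose, state)
    state.delivered_today += dose
    return dose
```

The guard excludes an unsafe situation (an excessive dose) whatever the dose computation produces, which is why it is kept separate from that computation.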
System Dependability
•The dependability of a system equates to its trustworthiness.
•A dependable system is a system that is trusted by its users.
•Principal dimensions of dependability are:
–Availability: the probability that the system will be up and running and able to deliver services at any given time;
–Reliability: the probability that the system will correctly deliver services as expected by the user over a given period of time;
–Safety: a judgement of how likely it is that the system will cause damage to people or its environment;
–Security: a judgement of how likely it is that the system can resist accidental or deliberate intrusions;
Dimensions of dependability
[Figure: dependability and its four dimensions]
•Availability
–The ability of the system to deliver services when requested
•Reliability
–The ability of the system to deliver services as specified
•Safety
–The ability of the system to operate without catastrophic failure
•Security
–The ability of the system to protect itself against accidental or deliberate intrusion
Other dependability properties
•Repairability
–Reflects the extent to which the system can be repaired in the event of a failure;
•Maintainability
–Reflects the extent to which the system can be adapted to new requirements;
•Survivability
–Reflects the extent to which the system can deliver services while it is under hostile attack;
•Error tolerance
–Reflects the extent to which user input errors can be avoided and tolerated.
Dependability vs performance
•It is very difficult to tune systems to make them more dependable.
•High levels of dependability can usually be achieved only at the expense of performance, because dependable systems include extra, often redundant, code to perform the necessary checking.
•This also increases the cost.
Dependability costs
•Dependability costs tend to increase exponentially as increasing levels of dependability are required.
•There are two reasons for this:
–The use of more expensive development techniques and hardware that are required to achieve the higher levels of dependability;
–The increased testing and system validation that is required to convince the system client that the required levels of dependability have been achieved.
Costs of increasing dependability
[Figure: cost (vertical axis) plotted against dependability (horizontal axis: low, medium, high, very high, ultra-high).]
Availability and reliability
•Reliability
–The probability of failure-free system operation over a specified time in a given environment for a specific purpose
•Availability
–The probability that a system, at a point in time, will be operational and able to deliver the requested services
•Both of these attributes can be expressed quantitatively
Availability and reliability
•It is sometimes possible to include system availability under system reliability
–Obviously, if a system is unavailable it is not delivering the specified system services
•However, it is possible to have systems with low reliability that must be available. So long as system failures can be repaired quickly and do not damage data, low reliability may not be a problem
•Availability takes repair time into account
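One common way to express the role of repair time quantitatively is the steady-state formula availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. This formula and the figures below are not from the slides; the numbers are hypothetical and only illustrate the point about quick repair.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is operational."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system A: fails often (low reliability) but restarts in seconds.
frequent_but_quick = availability(mttf_hours=10, mttr_hours=0.01)

# Hypothetical system B: fails rarely (high reliability) but takes days to repair.
rare_but_slow = availability(mttf_hours=1000, mttr_hours=50)
```

Under these assumed figures the less reliable system A is the more available one, because its repair time is so short.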
Faults, errors and failures
•Failures are usually a result of system errors that are derived from faults in the system
•However, faults do not necessarily result in system errors
–The faulty system state may be transient and ‘corrected’ before an error arises
•Errors do not necessarily lead to system failures
–The error can be corrected by built-in error detection and recovery
–The failure can be protected against by built-in protection facilities. These may, for example, protect system resources from system errors
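The chain from fault to error to failure can be illustrated with a small sketch. The routine `one_pass_sort` contains a deliberate fault; all names here are hypothetical and exist only for this illustration.

```python
def one_pass_sort(items):
    """Deliberately faulty: a single bubble pass sorts only some inputs."""
    out = list(items)
    for i in range(len(out) - 1):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def is_sorted(items):
    """Check used for error detection: is the list in non-decreasing order?"""
    return all(a <= b for a, b in zip(items, items[1:]))


def robust_sort(items):
    """Built-in error detection and recovery around the faulty routine."""
    result = one_pass_sort(items)
    if not is_sorted(result):      # erroneous state detected
        result = sorted(items)     # recovery: fall back to a trusted routine
    return result
```

For the input `[1, 3, 2]` the fault does not produce an error, since one pass is enough. For `[3, 2, 1]` the faulty routine yields an erroneous result, but `robust_sort` detects and corrects it, so the caller never sees a failure.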
Reliability terminology
Term | Description
System failure | An event that occurs at some point in time when the system does not deliver a service as expected by its users.
System error | An erroneous system state that can lead to system behaviour that is unexpected by system users.
System fault | A characteristic of a software system that can lead to a system error. For example, failure to initialise a variable could lead to that variable having the wrong value when it is used.
Human error or mistake | Human behaviour that results in the introduction of faults into a system.
Reliability Improvement
•Three approaches to improve reliability
•Fault avoidance
–Development techniques are used that either minimise the possibility of mistakes or trap mistakes before they result in the introduction of system faults
•Fault detection and removal
–Verification and validation techniques are used that increase the probability of detecting and correcting errors before the system goes into service
•Fault tolerance
–Run-time techniques are used to ensure that system faults do not result in system errors and/or that system errors do not lead to system failures
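One well-known run-time technique for fault tolerance is to run redundant versions of a computation and take a majority vote. The slides do not name a specific technique, so this is an assumed example; the three `version_*` functions are hypothetical, and one is deliberately faulty.

```python
from collections import Counter


def vote(results):
    """Return the strict-majority value among redundant results."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among redundant results")
    return value


def version_a(x):
    return x * x


def version_b(x):
    return x ** 2


def version_c(x):
    return x * 2   # deliberate fault: correct only for x == 0 and x == 2


def tolerant_square(x):
    """Square x using three versions; a single faulty version is outvoted."""
    return vote([version(x) for version in (version_a, version_b, version_c)])
```

The fault in `version_c` still produces a wrong intermediate result, but the vote prevents it from becoming a system failure as long as the other two versions agree.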
Reliability modelling
•You can model a system as an input-output mapping where some inputs will result in erroneous outputs
•The reliability of the system is determined by the probability that a particular input will lie in the set of inputs that cause erroneous outputs: the lower this probability, the more reliable the system
•Different people will use the system in different ways, so this probability is not a static system attribute but depends on the system’s environment
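The dependence on how the system is used can be sketched with a toy model. The input domain, the set `ERRONEOUS_INPUTS` and both user profiles are hypothetical values chosen for illustration.

```python
# Hypothetical program whose inputs are the integers 0..9; the inputs in
# ERRONEOUS_INPUTS are assumed to produce erroneous outputs.
ERRONEOUS_INPUTS = {7, 8, 9}


def failure_probability(profile):
    """Probability that one input drawn from `profile` causes an erroneous output.

    `profile` maps each input to the probability that the user selects it.
    """
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile probabilities must sum to 1")
    return sum(p for x, p in profile.items() if x in ERRONEOUS_INPUTS)


# Two assumed users of the same program with different usage patterns.
user_1 = {0: 0.5, 1: 0.3, 2: 0.2}      # never selects an erroneous input
user_2 = {2: 0.4, 7: 0.35, 8: 0.25}    # often selects erroneous inputs
```

The program and its faults are identical for both users, yet `user_1` never observes a failure while `user_2` observes failures on most inputs, so the two would judge its reliability very differently.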
Input/output mapping
[Figure: a program maps the input set to the output set. The subset Ie of inputs causing erroneous outputs maps to the subset Oe of erroneous outputs.]
Reliability perception
[Figure: the set of possible inputs, containing a subset of erroneous inputs and the regions of inputs used by User 1, User 2 and User 3.]
Reliability improvement
•Removing X% of the faults in a system will not necessarily improve the reliability by X%. A study at IBM showed that removing 60% of product defects resulted in a 3% improvement in reliability
•Program defects may be in rarely executed sections of the code, so may never be encountered by users. Removing these does not affect the perceived reliability
•A program with known faults may therefore still be seen as reliable by its users
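The effect of rarely executed defects can be reproduced with a toy calculation. The trigger probabilities below are hypothetical, the faults are assumed to trigger independently, and the numbers are not related to the IBM study.

```python
def run_failure_probability(trigger_probabilities):
    """P(at least one fault is triggered in a run), assuming independence."""
    p_no_failure = 1.0
    for p in trigger_probabilities:
        p_no_failure *= 1.0 - p
    return 1.0 - p_no_failure


# Five hypothetical faults: two in frequently executed code, three in rare code.
faults = [0.03, 0.02, 0.0002, 0.0002, 0.0001]

before = run_failure_probability(faults)

# Remove the three rare faults (60% of all faults).
after_rare_removed = run_failure_probability(faults[:2])

# Remove only the single most frequently triggered fault (20% of all faults).
after_common_removed = run_failure_probability(faults[1:])

# Relative reduction in failure probability for each strategy.
gain_rare = (before - after_rare_removed) / before
gain_common = (before - after_common_removed) / before
```

With these assumed numbers, removing 60% of the faults reduces the failure probability by only about one percent, whereas removing the one frequently triggered fault reduces it by more than half.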
Safety
•Safety is a system property that reflects the system’s ability to operate without damaging people or the system’s environment
•For example, control and monitoring systems in aircraft
•It is increasingly important to consider software safety as more and more devices incorporate software-based control systems
•Safety requirements are exclusive requirements, i.e. they exclude undesirable situations rather than specify required system services
•Safety-critical software is of two types
Types of safety-critical software
•Primary safety-critical systems
–Embedded software systems whose failure can cause hardware malfunction which results in human injury or environmental damage.
•Secondary safety-critical systems
–Systems whose failure indirectly results in injury.
–E.g. a medical database holding details of drugs
•Discussion here focuses on primary safety-critical systems
Safety and reliability
•Safety and reliability are related but distinct
–In general, reliability and availability are necessary but not sufficient conditions for system safety
•Reliability is concerned with conformance to a given specification and delivery of service
•Safety is concerned with ensuring the system cannot cause damage, irrespective of whether or not it conforms to its specification
Unsafe reliable systems
•Reasons why reliable systems are not necessarily safe:
•Specification errors
–The specification does not describe the required behaviour in some critical situations
•Hardware failures generating spurious inputs
–Hard to anticipate in the specification
•Operator error
–Context-sensitive commands, i.e. issuing the right command at the wrong time
Safety terminology
Term | Definition
Accident (or mishap) | An unplanned event or sequence of events which results in human death or injury, damage to property or to the environment. A computer-controlled machine injuring its operator is an example of an accident.
Hazard | A condition with the potential for causing or contributing to an accident. A failure of the sensor that detects an obstacle in front of a machine is an example of a hazard.
Damage | A measure of the loss resulting from a mishap. Damage can range from many people killed as a result of an accident to minor injury or property damage.
Hazard severity | An assessment of the worst possible damage that could result from a particular hazard. Hazard severity can range from catastrophic where many people are killed to minor where only minor damage results.
Hazard probability | The probability of the events occurring which create a hazard. Probability values tend to be arbitrary but range from probable (say 1/100 chance of a hazard occurring) to implausible (no conceivable situations are likely where the hazard could occur).
Risk | This is a measure of the probability that the system will cause an accident. The risk is assessed by considering the hazard probability, the hazard severity and the probability that a hazard will result in an accident.
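The three inputs to risk assessment named in the table can be combined in many ways. The sketch below assumes one simple scheme: the accident probability is the hazard probability multiplied by the probability that the hazard leads to an accident, and hazards are ranked by severity first. The `Hazard` record, the severity scale and the ranking rule are all hypothetical.

```python
from dataclasses import dataclass

# Assumed ordinal scale for hazard severity.
SEVERITY_RANK = {"minor": 1, "serious": 2, "catastrophic": 3}


@dataclass(frozen=True)
class Hazard:
    name: str
    probability: float              # probability that the hazard arises
    p_accident_given_hazard: float  # probability that the hazard causes an accident
    severity: str                   # key of SEVERITY_RANK


def accident_probability(hazard: Hazard) -> float:
    """Probability that this hazard arises and then results in an accident."""
    return hazard.probability * hazard.p_accident_given_hazard


def rank_by_risk(hazards):
    """Order hazards from highest to lowest risk: severity, then accident probability."""
    return sorted(
        hazards,
        key=lambda h: (SEVERITY_RANK[h.severity], accident_probability(h)),
        reverse=True,
    )
```

Ranking by severity before probability is a design choice made for this sketch; a real assessment might weigh the two differently.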
Ways to achieve safety
•Hazard avoidance
–The system is designed so that hazards simply cannot arise.
–E.g. a cutting machine that starts only when two buttons are pressed at the same time
•Hazard detection and removal
–The system is designed so that hazards are detected and removed before they result in an accident.
–E.g. opening a relief valve on detection of overpressure in a chemical plant.
•Damage limitation
–The system includes protection features that minimise the damage that may result from an accident
–E.g. an automatic fire safety system in an aircraft.
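The first two examples above can be written as simple guard conditions. This is only a sketch: the function names and the threshold `RELIEF_THRESHOLD_KPA` are hypothetical.

```python
# Hypothetical pressure limit; a real plant would derive this from its design.
RELIEF_THRESHOLD_KPA = 800.0


def may_start_cutter(left_pressed: bool, right_pressed: bool) -> bool:
    """Hazard avoidance: the cutter may start only while both buttons are pressed."""
    return left_pressed and right_pressed


def relief_valve_should_open(pressure_kpa: float) -> bool:
    """Hazard detection and removal: open the valve once overpressure is detected."""
    return pressure_kpa >= RELIEF_THRESHOLD_KPA
```

The interlock makes the hazardous state unreachable in normal use, whereas the relief valve reacts to a hazard that has already arisen.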
Security
•Security is a system property that reflects the ability to protect itself from accidental or deliberate external attack.
•Security is becoming increasingly important as systems are networked so that external access to the system through the Internet is possible
•Security is an essential pre-requisite for availability, reliability and safety
•Examples: viruses, unauthorised use of services, unauthorised modification of data
Security terminology
Term | Definition
Exposure | Possible loss or harm in a computing system. This can be loss or damage to data or can be a loss of time and effort if recovery is necessary after a security breach.
Vulnerability | A weakness in a computer-based system that may be exploited to cause loss or harm.
Attack | An exploitation of a system vulnerability. Generally, this is from outside the system and is a deliberate attempt to cause some damage.
Threats | Circumstances that have potential to cause loss or harm. You can think of these as a system vulnerability that is subjected to an attack.
Control | A protective measure that reduces a system vulnerability. Encryption would be an example of a control that reduced a vulnerability of a weak access control system.
Damage from insecurity
•Denial of service
–The system is forced into a state where normal services become unavailable.
•Corruption of programs or data
–The components of the system may be altered in an unauthorised way, which affects system behaviour and hence its reliability and safety
•Disclosure of confidential information
–Information that is managed by the system may be exposed to people who are not authorised to read or use that information
Security assurance
•Vulnerability avoidance
–The system is designed so that vulnerabilities do not occur. For example, if there is no external network connection then external attack is impossible
•Attack detection and elimination
–The system is designed so that attacks on vulnerabilities are detected and neutralised before they result in an exposure. For example, virus checkers find and remove viruses before they infect a system
•Exposure limitation
–The consequences of a successful attack are minimised. For example, a backup policy allows damaged information to be restored
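Detection and exposure limitation can work together, as the following sketch suggests. It assumes a SHA-256 digest is used to detect unauthorised modification and an in-memory copy stands in for a backup. The `ProtectedRecord` class is hypothetical, and a real system would keep the backup and digest on separate, protected storage.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


class ProtectedRecord:
    """Data with a stored digest (detection) and a backup copy (limitation)."""

    def __init__(self, data: bytes):
        self.data = data
        self._backup = data             # stand-in for an off-line backup
        self._expected = digest(data)   # digest recorded when data was trusted

    def is_intact(self) -> bool:
        """Detect corruption by comparing the current digest with the stored one."""
        return digest(self.data) == self._expected

    def read(self) -> bytes:
        """Return the data, restoring it from backup if corruption is detected."""
        if not self.is_intact():
            self.data = self._backup
        return self.data
```

For example, if `record = ProtectedRecord(b"balance=100")` is later tampered with by assigning to `record.data`, the next `record.read()` detects the mismatch and returns the restored original.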