Basic Statistical Concepts and Methods

119,108 views 122 slides Nov 28, 2009

About This Presentation

Statistics is the science of dealing with numbers.
 It is used for collection, summarization, presentation and analysis of data.
Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).


Slide Content

Ahmed-Refat-ZU
Basic Statistical Concepts and Methods
Ahmed-Refat AG Refat
FOM-ZU

Definition of Statistics
Statistics is the science of dealing with numbers.
It is used for the collection, summarization, presentation and analysis of data.
Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).

Uses of medical statistics
Medical statistics are used in:
1- Planning, monitoring and evaluating community health care programs.
2- Epidemiological research studies.
3- Diagnosis of community health problems.
4- Comparison of health status and diseases between countries, and within one country over the years.
5- Forming standards for biological measurements such as weight and height.
6- Differentiating between diseased and normal groups.

Types of data
Any aspect of an individual that is measured is called a variable. Variables are either:
1- Quantitative, or
2- Qualitative.

1- Quantitative data
Quantitative data are numerical. There are two types:
A- Discrete data: usually whole numbers (no decimal fraction), such as the number of cases of a certain disease or the number of hospital beds.
B- Continuous data: measurements on a continuous scale (a decimal fraction can be present), e.g. height, weight, age.

2- Qualitative data
Qualitative data are non-numerical and are subdivided into two types:
A- Categorical data: purely descriptive, implying no ordering of any kind, such as sex or area of residence.
B- Ordinal data: data that imply some kind of ordering, such as:
- Level of education
- Socio-economic status
- Degree of severity of disease

Presentation of Data
The first step in statistical analysis is to present the data in a form that is easy to understand.
The two basic ways of presenting data are:
1- Tabular presentation
2- Graphical presentation

Tabulation
Some rules for the construction of tables:
1- The table must be self-explanatory.
2- Title: written at the top of the table to define precisely the content, the place and the time.
3- Clear headings for the columns and rows, with units of measurement.
4- The size of the table depends on the number of classes, usually between 2 and 10 rows or classes. The choice depends on the form of the data and the requirements of the distribution: too few classes may obscure some information, and too many will hardly differ from the raw data.

Types of tables
For qualitative data, draw a simple table, e.g. a list table: count the number of observations (frequencies) in each category.
For quantitative data, we have to form a frequency distribution table.
The two types are:
- List tables (two columns, one value for each measured variable)
- Frequency distribution tables

Types of tables: List
A table consisting of two columns, the first giving an identification of the observational unit and the second giving the value of the variable for that unit.
Example: the number of patients in each hospital department:
Department       Patients
Medicine         100
Surgery          80
ENT              28
Ophthalmology    30

Frequency Distribution tables
FDTs are used for the presentation of qualitative (and quantitative discrete) data by recording the number of observations in each category.
These counts are called frequencies.
No classes or intervals are needed.

Frequency Distribution tables
An FDT for quantitative continuous data consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within the interval of each class.

Frequency Distribution tables
EXAMPLE (1): Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.
Question: what type of data is this?

How to Construct a Frequency Distribution table
Four steps (title, table, number, %):
1- Put a title.
2- Draw columns and rows.
3- Enumerate the individuals in each category.
4- Calculate the relative frequency (%).

How to Construct a Frequency Distribution table
Four steps:
1- Put a title, e.g. "Distribution of the studied individuals according to their blood group".
2- Draw a table (columns and rows):
First column heading: the studied variable, "Blood Group"
Second column heading: "Frequency (Number)"
Third column heading: "Percentage (%)"

Frequency Distribution tables
3- Enumerate the individuals in each blood group: 6 individuals have blood group A, 6 have group B, 5 have group AB and 3 have group O.
Make sure that the total number of individuals in all blood groups is 20 (the size of the studied group).

Frequency Distribution tables
4- Calculate the relative frequency (%) of each blood group by dividing the frequency of that group by the total number of individuals and multiplying by 100.
For example, the percentage of group A = 6 / 20 x 100 = 30%, group AB = 5 / 20 x 100 = 25% and group O = 3 / 20 x 100 = 15%. The final table will be:
Blood Group    Frequency (Number)    Percentage (%)
A              6                     30
B              6                     30
AB             5                     25
O              3                     15
Total          20                    100
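The four steps can also be checked with a short script. This is only a sketch: the function name frequency_table is made up for illustration, and the data are the 20 blood groups listed in Example (1).

```python
from collections import Counter

# Blood groups of the 20 individuals in Example (1).
blood_groups = ["A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
                "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A"]


def frequency_table(observations):
    """Return {category: (frequency, percentage)} for qualitative data."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: (n, 100 * n / total) for category, n in counts.items()}


table = frequency_table(blood_groups)
for group in ("A", "B", "AB", "O"):
    n, pct = table[group]
    print(f"{group:<3} {n:>3} {pct:>6.1f}%")
```

The percentages should add up to 100 and the frequencies to the size of the studied group, which is a quick way to catch a counting mistake.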

Frequency Distribution tables
What is your conclusion?

Frequency Distribution tables
We can conclude from this table that blood groups A and B are the most common and that group O is the rarest (judging by the percentage of each group).
So presenting data in a table helps in deducing facts and simplifies the information compared with the raw data.

Frequency Distribution tables
EXAMPLE (3): The following data are systolic blood pressure measurements (mmHg) of 30 patients with hypertension. Present these data in a frequency table:
150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163, 172, 160.
Question: what type of data is this?

Frequency Distribution tables
Four steps:
1- Put a title, e.g. "Frequency distribution of blood pressure measurements (mmHg) among a group of hypertensive patients".
2- Draw a table (columns and rows):
First column heading: the studied variable, "Blood Pressure (mmHg)"
Second column heading: "Frequency (Number)"
Third column heading: "Percentage (%)"

Frequency Distribution tables
3- In the first column we have to classify blood pressure into categories or classes, because we have a large sample (N = 30) and the measured variable is of the continuous type (not discrete as in the previous examples).

Frequency Distribution tables: construction of classes
Calculate the range of the observations: subtract the lowest blood pressure value from the highest (the highest was 200 and the lowest was 150); the difference is 50.
Determine the number of classes and the width of the class intervals: let the class interval be 10, so we will have 50 / 10 = 5 classes.
Enumerate the frequency by the tally method.
Calculate the exact frequency and the relative frequency.

Frequency Distribution tables: construction of classes
Determine the number of classes you want to display: not too few (about 2) and not too many (more than 8). It is a matter of trial and judgment.
Let the class interval be 10 mmHg; we will have 5 classes.
If we choose 5 mmHg as the class interval (width), we will obtain 10 classes (too long a table).
We must maintain a constant width for all intervals.
Choose the upper and lower limits of the classes, starting with the lowest value, i.e. 150.
List the intervals in order, every 10 mmHg.
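The class construction can be sketched in code as follows. The helper class_frequencies is hypothetical, and it assumes that the highest reading (200) is counted in the last class, so that range / width gives exactly 5 classes as on the slide.

```python
# Systolic blood pressure (mmHg) of the 30 patients in Example (3).
pressures = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178,
             195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186,
             177, 174, 155, 164, 163, 172, 160]


def class_frequencies(values, width):
    """Tally values into equal-width classes starting at the lowest value.

    Assumption: the highest value is placed in the last class, so the
    number of classes is range // width (50 // 10 = 5 here).
    """
    lowest, highest = min(values), max(values)
    n_classes = (highest - lowest) // width
    counts = [0] * n_classes
    for v in values:
        index = min((v - lowest) // width, n_classes - 1)
        counts[index] += 1
    return [(lowest + i * width, counts[i]) for i in range(n_classes)]


classes = class_frequencies(pressures, width=10)
total = len(pressures)
for lower, freq in classes:
    # Each row: lower class limit, frequency, relative frequency (%).
    print(f"{lower}-  {freq:>2}  {100 * freq / total:5.1f}%")
```

As with the blood group table, the frequencies should add up to the sample size (30).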

2- Graphical Presentation
The diagram should:
- Be simple
- Be easy to understand
- Save a lot of words
- Be self-explanatory
- Have a clear title indicating its content
- Be fully labeled
The y axis (vertical) is usually used for frequency.

2- Graphical Presentation
Graphic presentations are used to illustrate and clarify information. Tables are essential in the presentation of scientific data, and diagrams are complementary: they summarize these tables in an easy, attractive and simple way.

Graphical Presentation
1- Bar chart
It is used for presenting discrete or qualitative data.
It represents the measured value (or %) by separated rectangles of constant width whose lengths are proportional to the frequency.
Types:
- Simple
- Multiple
- Component

Graphical Presentation
1- Bar chart: Simple
[Figure: simple bar chart "Mean maternal age of three studied groups". X axis: the studied groups (group I, group II, group III). Y axis: mean age in years.]

Graphical Presentation
1- Bar chart
Multiple bar chart: each observation has more than one value, represented by a group of bars. Examples: percentage of males and females in different countries, percentage of deaths from heart diseases in old and young age, mode of delivery (cesarean or vaginal) in different female age groups.

Graphical Presentation
1- Bar chart: Multiple
[Figure: multiple bar chart with grouped bars for males and females under two conditions, cancer and anemia.]

Graphical Presentation
1- Bar chart
Component bar chart: subdivision of a single bar to indicate the composition of the total, divided into sections according to their relative proportions.

Graphical Presentation
1- Bar chart
Component bar chart, example: two countries are compared in their socio-economic standard of living. Each bar represents one country and has a height of 100%. It is divided horizontally into 3 components (low, moderate and high socio-economic classes), and each class is shown in a different color or pattern.

Graphical Presentation
1- Bar chart: Component
[Figure: component bar chart "Comparison between Egypt and USA in socio-economic standard of living". X axis: Egypt, USA. Y axis: percentage of population (0% to 100%). Components: high, moderate, low.]

Graphical Presentation
2- Pie diagram
It consists of a circle whose area represents the total frequency (100%), divided into segments.
Each segment represents a proportional part of the total frequency.

Graphical Presentation
2- Pie diagram
[Figure: pie diagram "Percentage of causes of child death in Egypt": diarrhea 50%, chest infection 30%, congenital 10%, accident 10%.]

Graphical Presentation
3- Histogram
It is very similar to the bar chart, with the difference that the rectangles or bars are adjacent (without gaps).
It is used for presenting a class frequency table (continuous data).
Each bar represents a class: its height represents the frequency (number of cases) and its width represents the class interval.

Graphical Presentation
3- Histogram
[Figure: histogram "Distribution of studied group according to their height". X axis: height in cm, classes 100-, 110-, 120-, 130-, 140-, 150-. Y axis: number of individuals (0 to 30).]

Graphical Presentation
4- Frequency polygon
It is derived from a histogram by connecting the midpoints of the tops of the rectangles in the histogram.
The line connecting these midpoints is called the frequency polygon.
We can draw the polygon without the rectangles, which gives a simpler form of line graph.
A special type of frequency polygon is the Normal Distribution Curve.

Graphical Presentation
5- Scatter diagram
It is useful for representing the relationship between two numeric measurements, each observation being represented by a point corresponding to its value on each axis.

[Figure: scatter diagram "Correlation between NAG and albumin creatinine ratio in group of early diabetics". X axis: albumin creatinine ratio (0 to 0.35). Y axis: NAG (0 to 35).]
This scatter diagram shows a positive (direct) relationship between NAG and the albumin/creatinine ratio among diabetic patients.

In negative correlation the points are scattered in a downward direction, meaning that the relation between the two studied measurements is inverse, i.e. if one measure increases the other decreases, as shown in the following graph.
[Figure: scatter diagram "Correlation between Doppler velocimetry (RI) and baby birth weight". X axis: baby weight in kg (1.5 to 4.5). Y axis: RI (0 to 1).]

Graphical Presentation
6- Line graph
It is a diagram showing the relationship between two numeric variables (as in the scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve).
[Figure: line graph "Changes in body temperature of a patient after use of antibiotic". X axis: time in hours. Y axis: temperature (36 to 39.5).]

Normal Distribution Curve

Normal Distribution curve
The Normal Distribution Curve (NDC) is a graphical presentation (a frequency polygon) of any quantitative biologic variable.
It is the frequency polygon of a quantitative variable measured in a large number of individuals.
It is a form of presentation of the frequency distribution of biologic variables such as weight, height, hemoglobin level and blood pressure, or any continuous data.
It plays a major role in the techniques of statistical analysis.

Characteristics of the Normal Distribution curve
1- It is a bell-shaped, continuous curve.
2- It is symmetrical, i.e. it can be divided vertically into two equal halves.
3- The tails never touch the base line but extend to infinity in either direction.
4- The mean, median and mode values coincide.
5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.

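Areas under the normal curve
Mean ± 1 SD = about 68% of the total area, split equally between the two sides of the mean.
Mean ± 2 SD = about 95% of the total area, split equally between the two sides of the mean.
Mean ± 3 SD = about 99.7% of the total area, split equally between the two sides of the mean.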
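These areas can be verified numerically. The sketch below uses the standard relation between the normal curve and the error function, which is not on the slides; the function name area_within is made up for illustration.

```python
import math


def area_within(k):
    """Fraction of the area under a normal curve lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * area_within(k):.1f}% of the area")
```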

Skewed data
If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.

Causes of a Skewed Curve (Not Normally Distributed Data)
The curve may be skewed to the right or to the left.
This is because the data were collected from:
- a heterogeneous group, or
- a diseased or abnormal population.
Therefore the results obtained from these data cannot be applied or generalized to the whole population.

The NDC can be used to distinguish normal from abnormal measurements.
Example: suppose we have the NDC of hemoglobin levels for a population of normal adult males with mean ± SD = 11 ± 1.5.
We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he is normal or anemic.
If this reading lies within the area under the curve that covers 95% of normal values (i.e. mean ± 2 SD), he will be considered normal. If his reading is lower, he is anemic.

The normal range for hemoglobin in this example will be:
The upper hemoglobin level: 11 + 2 (1.5) = 14.
The lower hemoglobin level: 11 - 2 (1.5) = 8.
i.e. the normal range of hemoglobin for adult males is from 8 to 14.
Our sample (8.1) lies within this range; therefore this individual is normal, because his reading lies within the 95% of his population.
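The same check can be written as a small sketch. The function names are hypothetical; the numbers are those of the hemoglobin example (mean 11, SD 1.5, reading 8.1).

```python
def normal_range(mean, sd, k=2):
    """Return (lower, upper) limits of mean ± k SD."""
    return (mean - k * sd, mean + k * sd)


def is_within_normal_range(reading, mean, sd, k=2):
    """True if the reading lies inside mean ± k SD."""
    low, high = normal_range(mean, sd, k)
    return low <= reading <= high


low, high = normal_range(11, 1.5)
print(f"normal range: {low} to {high}")
print("reading 8.1 is normal:", is_within_normal_range(8.1, 11, 1.5))
```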

Data Summarization
To summarize data, we need one or two parameters that can describe the data:
- Measures of central tendency, which describe the center of the data.
- Measures of dispersion, which show how the data are scattered around its center.

Measures of central tendency
A variable usually has a point (center) around which the observed values lie. These averages are called measures of central tendency. The three most commonly used averages are:
•The arithmetic mean
•The median
•The mode

1- The arithmetic mean
The sum of the observations divided by the number of observations:
$$\bar{x} = \frac{\sum x}{n}$$
where:
$\bar{x}$ = the mean
∑ denotes "the sum of"
x = the values of the observations
n = the number of observations

1- The arithmetic mean
Example: in a study, the ages of 5 students were 12, 15, 10, 17 and 13.
Mean = sum of observations / number of observations
Then the mean = (12 + 15 + 10 + 17 + 13) / 5 = 67 / 5 = 13.4 years

Calculation of Mean for Frequency Distribution Data
In the case of frequency distribution data we calculate the mean by this equation:
$$\bar{x} = \frac{\sum f x}{n}$$
where f = the frequency of each value x.
For example: we want to calculate the mean incubation period of this group.

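Calculation of Mean for Frequency Distribution Data
[Table: frequency distribution of the incubation period in the studied group; the values are not recoverable from the slide.]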

Calculation of Mean for Frequency Distribution Data with class intervals
If the data are presented in a frequency table with class intervals, we calculate the mean by the same equation, $\bar{x} = \frac{\sum f x_1}{n}$, where $x_1$ denotes the midpoint of each class interval.
Example: calculate the mean blood pressure of the following group:
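The table for this example is not available here, so the sketch below rests on an assumption: it reuses the systolic blood pressure readings of Example (3), tallied into classes of width 10 with the top reading (200) counted in the last class. The function name grouped_mean is hypothetical.

```python
# (lower limit, upper limit, frequency) for each class.
# Assumption: classes tallied from the Example (3) readings, not the
# original table of this slide.
classes = [(150, 160, 6), (160, 170, 6), (170, 180, 8),
           (180, 190, 6), (190, 200, 4)]


def grouped_mean(classes):
    """Mean of grouped data: sum of (frequency x class midpoint) / n."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


print(f"mean blood pressure = {grouped_mean(classes):.1f} mmHg")
```

Because every reading in a class is replaced by the class midpoint, the result is an approximation of the mean of the raw readings.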

2- Median
It is the middle observation in a series of observations after arranging them in ascending or descending order.
The rank of the median is (n + 1)/2 if the number of observations is odd. If the number is even, the median is the mean of the two middle observations, those with ranks n/2 and (n/2) + 1.

2- Median
Calculate the median of the following data: 5, 6, 8, 9, 11 (n = 5, odd).
- The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3.
- The median is the third value when the data are arranged in ascending (or descending) order.
- So the median is 8 (the third value).

2- Median
If the number of observations is even, the median is calculated as follows, e.g. 5, 6, 8, 9 (n = 4):
- The rank n / 2 = 4 / 2 = 2 points to the second value. If the data are arranged in ascending order the second value is 6, and if arranged in descending order it is 8. Therefore the median is the mean of both observations, i.e. (6 + 8) / 2 = 7.

2- Median
For simplicity we can apply the same equation used for odd numbers, i.e. (n + 1) / 2. The median rank will be (4 + 1) / 2 = 2½, i.e. the median lies between the second and third values (6 and 8); take their mean = 7.
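The rank rule (n + 1) / 2 can be sketched in code. The function below is an illustration only; it handles the odd and even cases together by averaging the values on either side of the rank.

```python
import math


def median(values):
    """Median by the rank rule (n + 1) / 2 on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) / 2                      # 1-based; ends in .5 when n is even
    lower = ordered[int(rank) - 1]          # value at the rank rounded down
    upper = ordered[math.ceil(rank) - 1]    # value at the rank rounded up
    return (lower + upper) / 2


print(median([5, 6, 8, 9, 11]))  # odd number of observations
print(median([5, 6, 8, 9]))      # even number of observations
```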

3- Mode
The mode is the most frequently occurring value in the data.
Example: 5, 6, 7, 5, 10. The mode in this data is 5, since the number 5 is repeated twice.
Sometimes there is more than one mode, and sometimes there is no mode, especially in a small set of observations.

3- Mode
Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20 (bimodal).
Example: 300, 280, 130, 125, 240, 270. There is no mode.
Data may be unimodal, bimodal or have no mode.
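A minimal sketch of finding the mode or modes, using the three examples above. The function name modes is hypothetical; it returns an empty list when no value is repeated, which matches the "no mode" case on the slide.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value is repeated."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 6, 7, 5, 10]))                   # unimodal
print(modes([20, 18, 14, 20, 13, 14, 30, 19]))   # bimodal
print(modes([300, 280, 130, 125, 240, 270]))     # no mode
```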

Advantages and disadvantages of the measures of central tendency
- Mean: it is the preferred measure of central tendency since it takes into account each individual observation, but its main disadvantage is that it is affected by extreme values.

Advantages and disadvantages of the measures of central tendency
- Median: it is a useful descriptive measure if there are one or two extremely high or low values.
- Mode: it is seldom used.

Measures of Dispersion
A measure of dispersion describes the degree of variation (scatter, spread) of the data around its central value (dispersion = variation = spread = scatter). The measures are:
- Range (R)
- Variance (V)
- Standard Deviation (SD)
- Coefficient of Variation (CoV)

1- Range
It is the difference between the largest and smallest values.
It is the simplest measure of variation.
Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between these two.
Also, it tends to be larger when the size of the sample increases.

2- Variance
If we want the average of the differences between the mean and each observation in the data, we subtract each value from the mean, sum these differences and divide by the number of observations:
$$V = \frac{\sum (\bar{x} - x)}{n}$$

2- Variance
$$V = \frac{\sum (\bar{x} - x)}{n}$$
The value of this expression will always be zero, because the differences between each value and the mean have negative and positive signs that cancel out on algebraic summation.

2- Variance
To overcome this zero we square the difference between the mean and each value, so that the sign is always positive. Dividing by n - 1 (rather than n) for a sample, we get:
$$V = \frac{\sum (\bar{x} - x)^2}{n - 1}$$

3- Standard Deviation (SD)
The main disadvantage of the variance is that it is expressed in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD). Therefore:
$$SD = \sqrt{V} = \sqrt{\frac{\sum (\bar{x} - x)^2}{n - 1}}$$

4- Coefficient of variation (CoV)
The coefficient of variation expresses the standard deviation as a percentage of the sample mean:
$$C.V. = \frac{SD}{\bar{x}} \times 100$$
The C.V. is useful when we are interested in the relative size of the variability in the data.

Example
Calculate the mean, variance, SD and C.V. of the following measurements: 5, 7, 10, 12 and 16.
Mean = (5 + 7 + 10 + 12 + 16) / 5 = 50 / 5 = 10
Variance = (25 + 9 + 0 + 4 + 36) / (5 - 1) = 74 / 4 = 18.5
SD = √18.5 = 4.3
C.V. = 4.3 / 10 x 100 = 43%

Example
Another set of observations is 2, 2, 5, 10 and 11.
Mean = 30 / 5 = 6
SD = √((16 + 16 + 1 + 16 + 25) / (5 - 1)) = √(74 / 4) = 4.3
C.V. = 4.3 / 6 x 100 = 71.7%
Both sets of observations have the same SD but differ in C.V.: the data in the first set are more homogeneous relative to their mean (so the C.V. is lower), while the data in the second set are more heterogeneous (so the C.V. is higher).
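Both examples can be reproduced with a short sketch that follows the formulas above (variance with n - 1 in the denominator, SD as its square root, C.V. as SD / mean x 100). The function names are made up for illustration.

```python
import math


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((mean - x) ** 2 for x in values) / (n - 1)


def standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """SD expressed as a percentage of the mean."""
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean * 100


first = [5, 7, 10, 12, 16]
second = [2, 2, 5, 10, 11]
for data in (first, second):
    print(f"SD = {standard_deviation(data):.1f}, "
          f"C.V. = {coefficient_of_variation(data):.1f}%")
```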

Example
In a study where age was recorded, the observed values were 6, 8, 9, 7 and 6 (5 observations). Calculate the mean, SD, range, mode and median.
- The mean = sum of observations / their number = (6 + 8 + 9 + 7 + 6) / 5 = 36 / 5 = 7.2

Examples
The variance = sum of the squared differences (mean minus observation) / (number of observations - 1):
$$V = \frac{(7.2 - 6)^2 + (7.2 - 8)^2 + (7.2 - 9)^2 + (7.2 - 7)^2 + (7.2 - 6)^2}{5 - 1} = \frac{(1.2)^2 + (-0.8)^2 + (-1.8)^2 + (0.2)^2 + (1.2)^2}{4} = \frac{6.8}{4} = 1.7$$
- So the variance = 1.7

Examples
- The SD = √1.7 = 1.3
- Range = 9 - 6 = 3
- The mode is 6
- The median: first we arrange the data in ascending order, i.e. 6, 6, 7, 8, 9. The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3; therefore the median is the third value, i.e. median = 7.

Inferential statistics
Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.

Inferential statistics: Hypothesis Testing
In hypothesis testing we want to find out whether the observed variation among samples is explained by chance alone (i.e. random sampling variation) or is due to a real difference between groups.

Hypothesis Testing
It involves conducting a test of statistical significance, quantifying the chance that random sampling variation may account for the observed results.
In hypothesis testing we are asking, for example, whether the sample mean is consistent with a certain hypothesized value for the population mean.

Hypothesis Testing
The method of assessing a hypothesis is known as a significance test.
Significance testing is a method for assessing whether a result is likely to be due to chance or due to a real effect.

Hypothesis Testing: Steps
1- Formulate the hypothesis
2- Collect the data
3- Test your hypothesis
4- Accept or reject your hypothesis

Null and alternative hypotheses
In hypothesis testing, specific hypotheses (the null and the alternative hypothesis) are formulated and tested.
The null hypothesis H0 states: x1 = x2, or x1 - x2 = 0.
This means that there is no difference between x1 and x2.
The alternative hypothesis H1 states: x1 > x2 or x1 < x2.

Null and alternative hypotheses
The alternative hypothesis H1 states: x1 > x2 or x1 < x2.
This means that there is a difference between x1 and x2.
If we reject the null hypothesis, i.e. there is a difference between the two readings, then either x1 < x2 or x1 > x2. In other words, the null hypothesis is rejected because x1 is different from x2.

General principles of significance tests
1- Set up a null hypothesis and its alternative.
2- Find the value of the test statistic.
3- Refer the value of the test statistic to a known distribution which it would follow if the null hypothesis were true.

General principles of significance tests
4- Conclude that the data are consistent or inconsistent with the null hypothesis.
If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data are consistent with the null hypothesis, we say that we accept it, i.e. the difference is statistically insignificant.

General principles of significance tests
P < 0.05
In medicine, we usually consider differences significant if the probability (P) is less than 0.05. This means that if the null hypothesis is true, we will wrongly reject it fewer than 5 times in a hundred.

Tests of significance
The selection of the test of significance depends essentially on the type of data that we have.
1- Quantitative data (means and SD): t test, paired t test and ANOVA
2- Qualitative data: Chi-squared test and Z test

Tests of significance
Comparison of means:
1- Comparing two means of large samples using the normal distribution (z test, or SND: standard normal deviate)
If we have a large sample size, i.e. 60 or more, and it follows a normal distribution, then we use the z test.
z = (population mean - sample mean) / SD. If the absolute value of z is greater than 2, then there is a significant difference.

Tests of significance
This is because the normal range for any biological reading lies between the population mean ± 2 SD (this range includes 95% of the area under the normal distribution curve).

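Student's t-test
2- Comparing two means of small samples using the t-test:
If we have a small sample size (less than 60), we can use the t distribution instead of the normal distribution.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$$
where $\bar{x}_1$, $SD_1$ and $n_1$ are the mean, standard deviation and size of the first sample, and likewise for the second sample.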

t-test
The value of t is compared with the values in the table of the t distribution at the appropriate degrees of freedom. If the value of t is less than that in the table, then the difference between the samples is insignificant. If the t value is larger than that in the table, the difference is significant, i.e. the null hypothesis is rejected.
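The t formula from the slides can be sketched as a small function. The summary statistics below are hypothetical, chosen only to make the arithmetic easy to follow; the comparison with the t table is left to the reader.

```python
import math


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(SD1^2 / n1 + SD2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical summary statistics for two small samples.
t = t_statistic(mean1=12.0, sd1=2.0, n1=16, mean2=10.0, sd2=3.0, n2=36)
print(f"t = {t:.2f}")
```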

Paired t-test
3- Paired t-test:
If we are comparing repeated observations in the same individual, or differences between paired data, we use the paired t-test, where the analysis is carried out using the mean and standard deviation of the differences between each pair.
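A minimal sketch of the paired t statistic follows. The slides do not give the formula; the sketch assumes the usual form, t = mean of the differences / (SD of the differences / √n), and the before and after readings are hypothetical.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """t computed from the mean and SD of the pairwise differences."""
    if len(before) != len(after):
        raise ValueError("paired data must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)   # uses n - 1 in the denominator
    return mean_d / (sd_d / math.sqrt(n))


# Hypothetical repeated measurements on the same four individuals.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_paired = paired_t_statistic(before, after)
print(f"paired t = {t_paired:.2f}")
```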

ANOVA
4- Comparing several means:
Sometimes we need to compare more than two means. This can be done with several t-tests, which is not only tedious but can lead to spuriously significant results. Therefore we use what is called analysis of variance, or ANOVA.

ANOVA
4- Comparing several means:
There are two main types: one-way analysis of variance and two-way analysis of variance. One-way analysis of variance is appropriate when the subgroups to be compared are defined by just one factor, for example a comparison between the means of different socio-economic classes. Two-way analysis of variance is used when the subdivision is based upon more than one factor.

ANOVA
The main idea in the analysis of variance is that we take into account the variability within the groups and between the groups. The value of F is equal to the ratio between the between-groups mean square and the within-groups mean square:
F = between-groups MS / within-groups MS
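The F ratio for a one-way analysis can be sketched as below. The slides give only the ratio; the sketch assumes the usual definitions of the mean squares (sum of squares divided by k - 1 between groups and by N - k within groups), and the three groups are hypothetical.

```python
def one_way_anova_f(groups):
    """F = between-groups mean square / within-groups mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)                 # number of groups
    n_total = len(all_values)       # total number of observations

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical measurements in three groups.
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_value:.1f}")
```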

Chi-Squared Test
b- Qualitative variables:
1) Chi-squared test:
Qualitative data are arranged in a table formed of rows and columns. The categories of one variable define the rows and the categories of the other variable define the columns.

Chi-Squared Test
A chi-squared test is used to test whether there is an association between the row variable and the column variable or, in other words, whether the distribution of individuals among the categories of one variable is independent of their distribution among the categories of the other.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Chi-Squared Test
1) Chi-squared test:
Degrees of freedom = (rows - 1) x (columns - 1)
O = observed value in the table
E = expected value, calculated as follows:
E = Rt x Ct / GT, i.e. total of the row x total of the column / grand total
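The calculation of the expected values, the chi-squared statistic and the degrees of freedom can be sketched as follows. The 2 x 2 table of counts is hypothetical and only serves to show the arithmetic.

```python
def chi_squared(observed):
    """Return (chi-squared, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = total of row x total of column / grand total
            e = row_totals[i] * column_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Hypothetical 2 x 2 table of observed counts.
chi2, dof = chi_squared([[10, 20], [30, 40]])
print(f"chi-squared = {chi2:.3f}, degrees of freedom = {dof}")
```

The result is then compared with the tabulated chi-squared value at the same degrees of freedom.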

Chi-Squared Test
From the tables of χ² significance, at degrees of freedom = (3 rows - 1) x (3 columns - 1) = 2 x 2 = 4, the critical value at the 0.05 level is 9.49. Therefore we conclude that there is a significant relation between socioeconomic level and the degree of intelligence (because the calculated value of χ² is greater than that of the table).

Z Test
2) Z test for comparing two percentages:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
where p1 = percentage in the 1st group, p2 = percentage in the 2nd group, q1 = 100 - p1, q2 = 100 - p2, n1 = sample size of group 1 and n2 = sample size of group 2. The Z test is significant (at the 0.05 level) if the absolute value of the result is greater than 2.

Z Test
Example: the number of anemic patients in group 1, which includes 50 patients, is 5, and the number of anemic patients in group 2, which contains 60 patients, is 20. To find out whether groups 1 and 2 are statistically different in the prevalence of anemia, we calculate the z test.
p1 = 5 / 50 = 10%, p2 = 20 / 60 = 33%
q1 = 100 - 10 = 90, q2 = 100 - 33 = 67

Z Test
z = (10 - 33) / √(10 x 90 / 50 + 33 x 67 / 60)
z = -23 / √(18 + 36.85) = -23 / 7.4 = -3.1
Therefore there is a statistically significant difference between the percentages of anemia in the studied groups (because the absolute value of z is greater than 2).
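The anemia example can be reproduced with a short sketch of the z formula for two percentages. The function name is hypothetical; the inputs are the rounded percentages used on the slide (10% and 33%).

```python
import math


def z_two_percentages(p1, n1, p2, n2):
    """z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with p and q in percent."""
    q1, q2 = 100 - p1, 100 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


z = z_two_percentages(p1=10, n1=50, p2=33, n2=60)
print(f"z = {z:.1f}")
print("significant at the 0.05 level:", abs(z) > 2)
```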

Correlation & regression
c- Correlation and regression:
Correlation measures the closeness of the association between two continuous variables, while linear regression gives the equation of the straight line that best describes the relationship and enables the prediction of one variable from the other.

Correlation & regression
1- Correlation:
In correlation, the closeness of the association is measured by the correlation coefficient, r. The values of r range between +1 and -1.
A value of 1 (or -1) means perfect correlation, while 0 means no correlation. If the value of r is near zero it means weak correlation, while near one it means strong correlation. The signs - and + denote the direction of the correlation.

Correlation
1- Correlation:
Positive (+ve) correlation means that if one variable increases the other one increases as well, while negative (-ve) correlation means that when one variable increases the other one decreases.
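The slides do not give a formula for r. As an illustration, the sketch below assumes the usual Pearson correlation coefficient and applies it to two hypothetical data sets, one with a perfect positive and one with a perfect negative relationship.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r between two numeric variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises with x, then y falls with x.
x = [1, 2, 3, 4, 5]
r_positive = correlation_coefficient(x, [2, 4, 6, 8, 10])
r_negative = correlation_coefficient(x, [10, 8, 6, 4, 2])
print(r_positive, r_negative)
```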

Linear regression
2- Linear regression:
Similar to correlation, linear regression is used to determine the relation between two variables and to predict the change in one variable due to changes in the other. For linear regression, the independent variable has to be distinguished from the dependent variable.

Linear regression
2- Linear regression:
Linear regression not only allows assessment of the presence of an association between the independent and dependent variables, but also allows the prediction of the dependent variable for a particular value of the independent variable. However, regression should not be used for prediction outside the range of the original data. A t-test is also used for the assessment of the level of significance. The dependent variable in linear regression must be a continuous one.
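A minimal sketch of fitting and using a regression line follows. It assumes the usual least-squares estimates of the slope and intercept, which the slides do not spell out; the data and function names are hypothetical. The prediction function refuses values outside the range of the original data, as advised above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, xs, slope, intercept):
    """Predict y for x, only inside the range of the original x values."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the original data")
    return intercept + slope * x


# Hypothetical data: independent variable xs, dependent variable ys.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} x")
print(predict(2.5, xs, slope, intercept))
```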

[Figure: scatter diagram "Correlation between Doppler velocimetry (RI) and baby birth weight". X axis: baby weight in kg (1.5 to 4.5). Y axis: RI (0 to 1).]

Multiple regression
3- Multiple regression:
Situations frequently occur in which we are interested in the dependency of a dependent variable on several independent variables, not just one. The test of significance used is the analysis of variance (F test).

How do you select a representative sample of 100 students from a primary school? Use all possible methods of sample selection.
How do you select a primary school from a rural area and another school from an urban area in Egypt?

What type of sample is each of the following?
- A lottery to select a winner
- Hospitalized patients with SLE
- Every 6th patient coming to an outpatient clinic
- A random 20 females and 20 males out of a group of 100 persons
- All workers in a factory chosen from all factories in a certain governorate

Present the following data by a suitable table & graph
Infant mortality rates in 2006 in some countries were as follows: Egypt = 25/1000, USA = 10/1000, Sweden = 12/1000 and Pakistan = 30/1000.

Present the following data by a suitable table & graph
The body weights (kg) of a group of male children were as follows:
12, 22, 18, 17, 28, 20, 16, 21, 19, 16, 27, 21 kg
and for a group of female children were as follows:
16, 23, 19, 29, 18, 22, 17, 15, 21, 21, 24 kg

The weight (Kg ) of a pregnant
