Descriptive statistics: Mean, Mode, Median

abidasultana86 69 views 49 slides Oct 06, 2024

About This Presentation

A brief description of mean, mode, median, variability, central tendency, range, quartiles, pie charts, and bar diagrams


Slide Content

Descriptive Statistics

Approaches to describing numerical data
Central tendency: arithmetic mean, median, mode
Variation: range, interquartile range, variance, standard deviation, coefficient of variation

Central Tendency
A measure of central tendency is a numerical central value of a set of observations. It is a central or typical value for a probability distribution, and may also be called the center or location of the distribution.
Measures of central tendency: mean, median, mode

Measures of Central Tendency
Mean: arithmetic average
Median: midpoint of ranked values
Mode: most frequently observed value

Mean
The mean is a single, typical value used to represent a set of data. It is also referred to as the average.
Objectives: to get a single value that represents the entire data set; to facilitate comparison between groups of data of a similar nature.
Classification of means: arithmetic mean (AM), geometric mean (GM), harmonic mean (HM)

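Arithmetic Mean
The arithmetic mean (mean) is the most common measure of central tendency.
For a population of N values: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where N is the population size and the $x_i$ are the population values.
For a sample of size n: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n is the sample size and the $x_i$ are the observed values.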

Arithmetic Mean
The most common measure of central tendency. Mean = sum of values divided by the number of values. Affected by extreme values (outliers).
[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
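The effect of a single extreme value on the mean can be sketched in Python. The two data sets below are hypothetical, chosen only so that their means match the 3 and 4 shown on the slide; the slide's actual plotted values are not recoverable.

```python
from statistics import mean

# Hypothetical data (assumption): picked so the means match the slide's figures.
values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, largest value replaced by an outlier

# Mean = sum of values divided by the number of values
print(mean(values))
print(mean(with_outlier))
```

Changing one value out of five is enough to shift the mean by a full unit.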

Properties of mean
It takes all observations into account, so it reflects every value. It is used in other statistical tools. It is the most reliable measure for drawing inferences. It is the easiest to use in advanced statistical techniques.

Limitations of mean
Highly affected by extreme values, even a single one. Negative and zero values sometimes cannot be used (for example, in the geometric and harmonic means).

Median
In an ordered list, the median is the “middle” number (50% above, 50% below). It is not affected by extreme values.
[Figure: two dot plots on a 0 to 10 axis, both with Median = 3]

Median It is the middle value of a set of numbers which have been ordered by magnitude The  median  is also the number that is halfway into the set. The location of the median: If the number of values is odd, the median is the middle number If the number of values is even, the median is the average of the two middle numbers
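The odd/even rule can be sketched as a small Python function. The function name is illustrative; the two test lists are taken from data sets that appear elsewhere in this deck.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle values

odd_case = median([11, 12, 13, 16, 16, 17, 18, 21, 22])  # n = 9
even_case = median([8, 9, 11, 12])                        # n = 4
print(odd_case, even_case)
```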

Median
For a grouped frequency distribution: $\text{Median} = L + \frac{N/2 - F}{f_m} \times C$
where
L = lower limit of the median class
N = total number of observations
F = cumulative frequency of the class preceding the median class
f_m = frequency of the median class
C = class interval (width) of the median class

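SBP Range (mmHg)   Frequency   Cumulative Frequency
101-105            2           2
106-110            3           5
111-115            5           10
116-120            8           18
121-125            6           24
126-130            4           28
131-135            2           30
136-140            1           31
Taking 121-125 as the median class: L = 121, N = 31, F = 18, f_m = 6, C = 5, giving 118.92
Taking 116-120 as the median class (the class that contains the N/2 = 15.5th observation): L = 116, N = 31, F = 10, f_m = 8, C = 5, giving 119.43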
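A minimal Python sketch of the grouped-median calculation, assuming the usual formula Median = L + ((N/2 - F) / f_m) × C and taking the median class to be the first class whose cumulative frequency reaches N/2. The function name and the tuple layout are illustrative choices.

```python
# (lower limit, upper limit, frequency) for each SBP class in the table
sbp_classes = [
    (101, 105, 2), (106, 110, 3), (111, 115, 5), (116, 120, 8),
    (121, 125, 6), (126, 130, 4), (131, 135, 2), (136, 140, 1),
]

def grouped_median(classes, width):
    """Median = L + ((N/2 - F) / f_m) * C for a grouped frequency table."""
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, _, freq in classes:
        if cumulative + freq >= half:  # first class reaching N/2 is the median class
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("empty frequency table")

sbp_median = grouped_median(sbp_classes, width=5)
print(sbp_median)
```

With these frequencies the median class is 116-120, and the result is 119.4375, which the slide reports as 119.43.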

Properties of median
Not affected by extreme values. Well suited to skewed distributions. Can be calculated from a frequency distribution. It is not influenced by the position of items.
Limitations of median
It is not based on all observations. Compared to the mean it is less reliable. Not suitable for further analysis.

Mode
The mode is the value that occurs most frequently in a data set, that is, the most commonly observed value.

Mode
A measure of central tendency. Value that occurs most often. Not affected by extreme values. Used for either numerical or categorical data. There may be no mode. There may be several modes.
[Figure: dot plot on a 0 to 14 axis with Mode = 9; dot plot on a 0 to 6 axis with no mode]
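The three cases (one mode, several modes, no mode) can be sketched in Python. The helper name and all sample data are hypothetical, and "no mode" is taken here to mean that no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 3, 9, 9, 9, 12]))         # numerical data, single mode
print(modes([0, 1, 2, 3, 4, 5, 6]))       # no mode
print(modes(["A", "B", "B", "A", "O"]))   # categorical data, two modes
```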

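Mode
For a grouped frequency distribution: $\text{Mode} = l + \frac{f_1 - f}{2f_1 - f - f_2} \times h$
where l = lower limit of the modal class, f_1 = frequency of the modal class, f = frequency of the preceding class, f_2 = frequency of the following class, h = class interval (width)

SBP Range (mmHg)   Frequency
101-105            2
106-110            3
111-115            5
116-120            8
121-125            6
126-130            4
131-135            2
136-140            1
l = 116, f_1 = 8, f = 5, f_2 = 6, h = 5
Mode = 116 + (8 - 5) / (2×8 - 5 - 6) × 5 = 119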

SBP Range (mmHg)   Frequency
101-105            2
106-110            3
111-115            5
116-120            8
121-125            6
126-130            4
131-135            8
136-140            1
L = 131, f_1 = 8, f = 4, f_2 = 1, h = 5
Mode = 131 + (8 - 4) / (2×8 - 4 - 1) × 5 = 132.82
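Both grouped-mode results can be checked with a short Python sketch. It assumes the usual grouped-mode formula, Mode = l + (f1 - f0) / (2·f1 - f0 - f2) × h, where f0 corresponds to the slide's f (the frequency of the class before the modal class); the function and parameter names are illustrative.

```python
def grouped_mode(lower, f1, f0, f2, width):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for a grouped frequency table."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width

# First table: modal class 116-120
mode_first = grouped_mode(lower=116, f1=8, f0=5, f2=6, width=5)
# Second table: modal class 131-135
mode_second = grouped_mode(lower=131, f1=8, f0=4, f2=1, width=5)

print(mode_first, round(mode_second, 2))
```

In the second table the frequency 8 occurs in two classes (116-120 and 131-135), so the distribution has two modal classes; the slide's figures correspond to 131-135.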

Properties of mode
Not affected by extreme values. For a large number of data, the mode is meaningful as an average. Can be calculated from a frequency distribution. Not affected by very small or very large values.
Limitations of mode
It is not based on all observations. Compared to the mean it is less reliable. Not suitable for further advanced analysis.

Which one is the “best” measurement?
The mean is generally used, unless extreme values (outliers) exist. Then the median is often used, since the median is not sensitive to extreme values.

Dispersions/variability
Measures of dispersion describe the extent to which individual values deviate from the central value (average). They determine how representative the central value is. Dispersion is small if the values are closely bunched about their mean and large when the values are scattered widely about their mean.

Objectives of Dispersions Measurement
To determine the reliability of an average
To control the variability
To compare two or more series of data regarding their variability
To facilitate the use of other statistical measures

Characteristics of a good measure of Dispersions
It should be rigidly defined
It should be easy to calculate and easy to understand
It should be based on all observations
It should be suitable for further mathematical treatment
It should be affected as little as possible by sampling fluctuations

Shape of a Distribution
Describes how data are distributed. Measures of shape: symmetric or skewed.
Left-skewed: Mean < Median
Symmetric: Mean = Median
Right-skewed: Median < Mean

Measures of Variability
Measures of variation give information on the spread or variability of the data values: range, interquartile range, variance, standard deviation, coefficient of variation.
[Figure: two distributions with the same center but different variation]

Range
The simplest measure of variation: the difference between the largest and the smallest observations.
Range = X_largest - X_smallest
Example: Range = 14 - 1 = 13
[Figure: dot plot on a 0 to 14 axis]

Characteristics of the Range
Ignores the way in which data are distributed:
[Figure: two differently distributed data sets on a 7 to 12 axis, each with Range = 12 - 7 = 5]
Sensitive to outliers:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5: Range = 5 - 1 = 4
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120: Range = 120 - 1 = 119
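The outlier sensitivity of the range is easy to reproduce in Python with the slide's two data sets, which differ only in their last value. The helper name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

# Shared part of both data sets: eleven 1s, eight 2s, four 3s and one 4
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]

range_without_outlier = data_range(common + [5])
range_with_outlier = data_range(common + [120])

print(range_without_outlier, range_with_outlier)
```

Only one of the 25 values changed, yet the range moves from 4 to 119.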

Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each).
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are larger).
Only 25% of the observations are greater than the third quartile, Q3.

Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where n is the number of observed values:
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1) (the median)
Third quartile position: Q3 = 0.75(n+1)

Quartiles
Example: find the first, second and third quartiles.
Sample ranked data: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
Second and third quartiles: ??
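The position rule q(n+1) can be sketched in Python for all three quartiles of this sample. The slide only shows the halfway case; applying linear interpolation to any fractional position is an assumption of this sketch, and other textbooks and software use different quartile conventions.

```python
def quartile(ranked, q):
    """Value at 1-based position q * (n + 1) in ranked data, interpolating between neighbours."""
    n = len(ranked)
    position = q * (n + 1)
    whole = int(position)          # position of the lower neighbour
    fraction = position - whole    # how far to move toward the upper neighbour
    if whole < 1:
        return ranked[0]
    if whole >= n:
        return ranked[-1]
    below, above = ranked[whole - 1], ranked[whole]
    return below + fraction * (above - below)

ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(ranked, q) for q in (0.25, 0.50, 0.75))
print(q1, q2, q3)
```

Q1 lands at position 2.5, matching the slide's 12.5; Q2 is at position 5 and Q3 at position 7.5.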

Interquartile Range
Interquartile range = 3rd quartile - 1st quartile: IQR = Q3 - Q1
Example: X minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, X maximum = 70
Interquartile range = 57 - 30 = 27

Variance
Variance measures how far each number in the set is from the mean. It is calculated by taking the differences between each number in the set and the mean, squaring the differences, and dividing the sum of the squares by the number of values in the set.

Standard deviation
It is a measure of how spread out numbers are. Its symbol is σ. It is the square root of the variance, that is, of the mean of the squared deviations of individual items from their arithmetic mean.

8, 9, 11, 12: Avg. = 10
18, 19, 2, 1: Avg. = 10

Examples
Calculate the standard deviation of a sample of IQ scores given by 96, 104, 126, 134 and 140.
The mean of this data is (96+104+126+134+140)/5 = 120.
The deviations from the mean are 96-120 = -24, 104-120 = -16, 126-120 = 6, 134-120 = 14, 140-120 = 20.
σ = √[ ∑(x-120)^2 / 5 ]
σ = √[ ((-24)^2+(-16)^2+(6)^2+(14)^2+(20)^2) / 5 ]
σ = √[ 1464 / 5 ] = 17.11
The result is the positive square root only; a standard deviation cannot be negative.
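The same calculation can be followed step by step in Python. The sketch divides by the number of values, as the worked example does; the standard library's `statistics.stdev`, which divides by n - 1, is shown for comparison only.

```python
from math import sqrt
from statistics import stdev

scores = [96, 104, 126, 134, 140]

mean_score = sum(scores) / len(scores)
deviations = [x - mean_score for x in scores]     # -24, -16, 6, 14, 20
sum_of_squares = sum(d ** 2 for d in deviations)  # 1464
variance = sum_of_squares / len(scores)           # divide by n, as in the example
sigma = sqrt(variance)

# For comparison: the n - 1 (sample) version from the standard library
sample_sd = stdev(scores)

print(round(sigma, 2), round(sample_sd, 2))
```

Dividing by n gives about 17.11; dividing by n - 1 gives a slightly larger value.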

Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
[Figure: dot plots of the three data sets on an 11 to 21 axis]

Measuring variation Small standard deviation Large standard deviation

Standard Deviation Most commonly used measure of variation Each value in the data set is used in the calculation Shows variation from the mean Has the same units as the original data It cannot be negative. A standard deviation close to 0 indicates that the data points tend to be close to the mean. The further the data points are from the mean, the greater the standard deviation

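Important Note
Standard deviation of sample data of a population: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$
Variance of sample data of a population: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$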

Coefficient of Variation Measures relative variation Always in percentage (%) Shows variation relative to mean Can be used to compare two or more sets of data measured in different units

Comparing Coefficient of Variation Stock A: Average price last year = $50 Standard deviation = $5 Stock B: Average price last year = $100 Standard deviation = $5 Both stocks have the same standard deviation, but stock B is less variable relative to its price
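The stock comparison can be sketched in Python, assuming the usual definition CV = (standard deviation / mean) × 100%. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Relative variation as a percentage: (standard deviation / mean) * 100."""
    return 100 * sd / mean

cv_stock_a = coefficient_of_variation(sd=5, mean=50)
cv_stock_b = coefficient_of_variation(sd=5, mean=100)

print(f"Stock A: {cv_stock_a:.0f}%  Stock B: {cv_stock_b:.0f}%")
```

The same $5 standard deviation is 10% of stock A's average price but only 5% of stock B's.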

Measure of Locations of Data Percentiles Percentile is a measure of position in a set of observations. It is a number where a certain percentage of scores fall below that percentile. It is a measure used in  statistics  indicating the value below which a given percentage of observations in a group of observations falls. For example, the 29th  percentile  is the value of a variable such that 29% of the observations are less than the value and 71% of the observations are greater.

Suppose you scored at the 80th percentile on the GRE analytical section. That means 80% of GRE test takers have lower marks than you and 20% have higher marks.

Standard Error of the sample mean
Standard error (SE) is the standard deviation of the sampling distribution of a statistic. If the statistic is the sample mean, it is called the standard error of the mean (SEM). The standard error of the mean is a measure of the dispersion of sample means around the population mean.

Exercise
Find the standard error for the following data.
Drug Concentration (μg/ml)   Absorption             Mean ± SE
25                           0.286, 0.214, 0.255    ??
50                           0.482, 0.510, 0.524    ??
100                          1.119, 1.225, 1.316    ??
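One way to approach the exercise in Python, assuming the common estimate SEM = s / √n, where s is the sample standard deviation with n - 1 in the denominator. The deck does not state which formula the exercise expects, so treat this as a sketch under that assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Absorption readings per drug concentration (ug/ml), from the exercise table
absorption = {
    25: [0.286, 0.214, 0.255],
    50: [0.482, 0.510, 0.524],
    100: [1.119, 1.225, 1.316],
}

def mean_and_se(values):
    """Return (mean, standard error of the mean), with SEM = s / sqrt(n)."""
    return mean(values), stdev(values) / sqrt(len(values))

results = {conc: mean_and_se(vals) for conc, vals in absorption.items()}
for conc, (m, se) in results.items():
    print(f"{conc} ug/ml: {m:.3f} ± {se:.3f}")
```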

Percentile rank A percentile rank is the percentage of scores that fall at or below a given score.

Scores out of 10 marks:
Raw: 2, 5, 6, 3, 6, 8, 10, 1, 4, 6, 7, 2
Ranked: 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 10
Scores out of 10 marks:
Raw: 7, 10, 10, 10, 9, 8, 9, 9, 10, 9, 10, 7
Ranked: 7, 7, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10
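The percentile rank of a score of 8 in each set of marks can be computed with a short Python sketch that follows the "at or below" definition. The names `group_1` and `group_2` are labels added here for illustration; the deck does not name the two sets.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

group_1 = [2, 5, 6, 3, 6, 8, 10, 1, 4, 6, 7, 2]
group_2 = [7, 10, 10, 10, 9, 8, 9, 9, 10, 9, 10, 7]

rank_1 = percentile_rank(group_1, 8)
rank_2 = percentile_rank(group_2, 8)
print(round(rank_1, 1), round(rank_2, 1))
```

The same mark of 8 out of 10 ranks near the top of the first set and near the bottom of the second, which is why a percentile rank says more about relative standing than the raw score does.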

Practical problems
1. Find the mean, median and mode of the following set of blood pressure readings: 145, 146, 148, 146, 145, 147, 144, 144, 138, 142, 140, 152, 160, 158, 148, 148.
2. Calculate the appropriate average for the prolactin levels (ng/L) obtained during a clinical trial involving 10 subjects: 9.4, 7.0, 7.6, 6.7, 6.3, 8.6, 6.8, 10.6, 8.9, 9.4.