Stat Chapter 3.pptx, proved detail statistical issues

TesfishaAltaseb 32 views 41 slides Mar 08, 2025

About This Presentation

This chapter provides detailed statistical tools for students.


Slide Content

Chapter Three: Numerical Descriptive Measures
Objectives:
Describe data using measures of central tendency, such as the mean, median, mode, and midrange.
Summarize data using measures of variation, such as the range, variance, and standard deviation.
Determine the position of a data value in a data set using various measures of position, such as percentiles, deciles, and quartiles.

A. Measures of central tendency
A measure of central tendency is an important tool that describes the center of a histogram or a frequency distribution curve. The main such measures are the mean, the median, and the mode, each of which can be computed for both ungrouped and grouped data sets.

The mean
The mean (or average), also known as the arithmetic average, is the most commonly used measure of central tendency. It is calculated by adding all the values in the group and then dividing by the number of values. It helps to summarize the essential features of a data set and enables comparison between data sets.

Cont… The mean is the sum of the values divided by the number of values. The mean of a set of numbers $x_1, x_2, \ldots, x_n$ is typically denoted by $\bar{x}$. It is an arithmetic mean, the "standard" average that is often simply called the "mean". For ungrouped data, it is obtained by dividing the sum of all values by the number of values in the data set.

Cont… The mean for ungrouped data is calculated as follows.
Mean for population data: $\mu = \frac{\sum x_i}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x_i}{n}$

Example 1: Find the mean score of 10 students in a midterm exam in a class if their scores are 25, 27, 30, 23, 16, 27, 29, 14, 20, 28.
$\mu = \frac{25 + 27 + 30 + 23 + 16 + 27 + 29 + 14 + 20 + 28}{10} = \frac{239}{10} = 23.9$

Example 2: Continuing Example 1, suppose we take a sample of 4 students from the class and find their scores to be 23, 27, 16, and 29. Find the mean of these scores.
$\bar{x} = \frac{23 + 27 + 16 + 29}{4} = \frac{95}{4} = 23.75$
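The two examples above can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

# Midterm scores of all 10 students (treated as the population).
scores = [25, 27, 30, 23, 16, 27, 29, 14, 20, 28]
population_mean = mean(scores)

# Scores of the 4 students drawn as a sample.
sample_scores = [23, 27, 16, 29]
sample_mean = mean(sample_scores)

print(population_mean)  # 23.9
print(sample_mean)      # 23.75
```

The computation is the same in both cases; only the interpretation (population versus sample) and the symbol differ.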

ii. Weighted mean
If $x_1, x_2, \ldots, x_n$ are the values of the items and $w_1, w_2, \ldots, w_n$ are the corresponding weights, then the weighted mean $\bar{x}_w$ is given by
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

Example: A student's final marks in Mathematics, Physics, Chemistry and Biology are A, B, D and C respectively. If the respective credits for these courses are 4, 4, 3 and 2, determine the approximate average mark the student has got.
Solution: Taking the grade points A = 4, B = 3, C = 2, D = 1 as the values and the credits as the weights:

Course        Grade   x_i   w_i   x_i w_i
Mathematics   A       4     4     16
Physics       B       3     4     12
Chemistry     D       1     3     3
Biology       C       2     2     4
Total                       13    35

$\bar{x}_w = \frac{16 + 12 + 3 + 4}{4 + 4 + 3 + 2} = \frac{35}{13} \approx 2.69$
That is, the average mark of the student is 2.69.
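The weighted-mean calculation can be sketched in Python as below. The function name `weighted_mean` and the grade-point mapping (A = 4, B = 3, C = 2, D = 1) are assumptions made for illustration; the mapping is the one implied by the values in the worked example.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Grades A, B, D, C converted to grade points (assumed mapping A=4 ... D=1).
grade_points = [4, 3, 1, 2]
credit_hours = [4, 4, 3, 2]

average_mark = weighted_mean(grade_points, credit_hours)
print(round(average_mark, 2))  # 2.69
```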

iii. Combined mean
When a set of observations is divided into k groups, where $\bar{x}_1$ is the mean of the $n_1$ observations in group 1, $\bar{x}_2$ is the mean of the $n_2$ observations in group 2, …, and $\bar{x}_k$ is the mean of the $n_k$ observations in group k, the combined mean of all observations taken together, denoted by $\bar{x}_c$, is given by
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$

Example: There are two classes, Class A and Class B. Class A has 30 students with an average score of 70 on a test. Class B has 20 students with an average score of 80. What is the combined average score for both classes?
Solution: $\bar{x}_c = \frac{30 \times 70 + 20 \times 80}{30 + 20} = \frac{3700}{50} = 74$
The combined mean of all the students is 74.
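A short sketch of the combined mean follows. It is the group means weighted by the group sizes; the helper name `combined_mean` is hypothetical.

```python
def combined_mean(group_means, group_sizes):
    """Combine group means, weighting each by its group size."""
    if len(group_means) != len(group_sizes):
        raise ValueError("need one size per group mean")
    total = sum(n * m for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)


# Class A: 30 students, mean 70. Class B: 20 students, mean 80.
overall = combined_mean([70, 80], [30, 20])
print(overall)  # 74.0
```

Note that the result (74) is closer to 70 than the simple average of the two means (75), because Class A contributes more students.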

Note:
If a constant c is added to or subtracted from every value in the data set, the mean increases or decreases by that constant: New Mean = Old Mean + c when c is added, and New Mean = Old Mean - c when c is subtracted.
If each value in the data set is multiplied by a constant k, the mean is also multiplied by k: New Mean = k × Old Mean.
Question 1: If the mean of a data set is 50, what will the new mean be if a constant value of 5 is added to every value in the data set?
Solution: Given mean = 50 and constant = 5, New Mean = 50 + 5 = 55.

The midrange
The midrange (MR) is defined as the sum of the lowest and highest values in the data set divided by 2:
$MR = \frac{\text{lowest value} + \text{highest value}}{2}$
Example: Find the midrange (MR) for the following data: 11, 13, 20, 30, 9, 4, 15.
Solution: The lowest value is 4 and the highest value is 30, so $MR = \frac{4 + 30}{2} = \frac{34}{2} = 17$.
Note that the midrange is a weak measure of central tendency, since it depends on only two of the values in the data set.

Mean for grouped data
If data are given in the form of a continuous frequency distribution, the sample mean can be computed as
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k}$
where $x_i$ is the midpoint of class i, $f_i$ is its frequency, and $x_i f_i$ is the product of the midpoint and the frequency.

Example: Find the mean of the following frequency distribution.

Class boundary   Frequency (f_i)   Midpoint (x_i)   x_i f_i
60 - 62          5                 61               305
62 - 64          18                63               1134
64 - 66          42                65               2730
66 - 68          20                67               1340
68 - 70          8                 69               552
70 - 72          7                 71               497
Total            100                                6558

Solution: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$
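The grouped-mean computation can be sketched as follows. The last class is taken as 70 to 72, consistent with its midpoint of 71 in the table; the function name `grouped_mean` is an illustrative choice.

```python
def grouped_mean(class_bounds, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), x_i = class midpoint."""
    midpoints = [(low + high) / 2 for low, high in class_bounds]
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / sum(frequencies)


class_bounds = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
frequencies = [5, 18, 42, 20, 8, 7]

mean_value = grouped_mean(class_bounds, frequencies)
print(round(mean_value, 2))  # 65.58
```

Because every observation in a class is represented by the class midpoint, the result is an approximation of the mean of the raw data.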

Median
The median is the value of the middle item of a series when the items are arranged in ascending or descending order of value. It divides the series into two halves, so it is a positional average. When n is odd, the median is the $\left(\frac{n+1}{2}\right)$th item of the ordered data.
12/2/2024 By: Menberu T.

Cont… Example: Find the median for the data set: 312, 257, 421, 289, 526, 374, 497.
Solution: First, rank the data in increasing order:

x1    x2    x3    x4    x5    x6    x7
257   289   312   374   421   497   526

Since there are 7 values in this data set, the median is the $\frac{7+1}{2} = 4$th term in the ranked data. Therefore
median = $\left(\frac{n+1}{2}\right)$th item = 4th item = 374.

Cont… Median for an even number of observations
Step 1: Arrange the data in either ascending or descending order.
Step 2: If the number of observations n is even, identify the (n/2)th and [(n/2) + 1]th observations.
Step 3: The average of the two observations identified in Step 2 is the median of the given data.

Cont… Example: Find the median for the data set: 8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11.
Solution: First, rank the data in increasing order:

x1   x2   x3   x4   x5   x6   x7   x8   x9   x10   x11   x12
7    8    9    10   11   12   13   13   14   17    17    45

Since there are 12 values in this data set, the median is the average of the two middle values, whose ranks are 12/2 = 6 and (12/2) + 1 = 7. Therefore
median = $\frac{x_6 + x_7}{2} = \frac{12 + 13}{2} = 12.5$.
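The odd and even cases described above can be combined into one small function. This is a sketch for illustration; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle item for odd n; average of the two middle items for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th item, i.e. index mid (0-based).
        return ordered[mid]
    # Even n: average of the (n/2)th and (n/2 + 1)th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([312, 257, 421, 289, 526, 374, 497])
even_case = median([8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11])
print(odd_case)   # 374
print(even_case)  # 12.5
```

The extreme value 45 in the second data set has no effect on the median, which is one reason the median is preferred over the mean for data with outliers.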

Median for grouped data
For grouped data, the median is obtained by the following formula:
$\text{Median} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) h$
where
L = lower limit of the median class
n = number of observations
f = frequency of the median class
cf = cumulative frequency of the class preceding the median class
h = class width

Example: The water percentage in the body of a species of fish is given below. Calculate the median.
Solution: Construct the less-than cumulative frequency distribution:

Class interval   Frequency   Cumulative frequency
15 - 24          7           7
25 - 34          17          24
35 - 44          16          40
45 - 54          6           46
55 - 64          4           50
Total            50

Since n = 50, n/2 = 25, so the median class is 35 - 44, with L = 35, f = 16, h = 9 and cf = 24.
$\tilde{x} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) h = 35 + \left(\frac{25 - 24}{16}\right) 9 = 35.56$
Here h is taken as the difference between the class limits (44 - 35 = 9). If class boundaries 34.5 to 44.5 are used instead, then L = 34.5 and h = 10, which gives a median of 35.125.
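The grouped-median formula, Median = L + ((n/2 - cf) / f) × h, can be sketched in Python as below. The function name `grouped_median` is hypothetical, and the lower limits and width are passed in explicitly so that the caller decides whether to use class limits (as the slide does, with h = 9) or class boundaries.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data via L + ((n/2 - cf) / f) * h."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches n/2: the median class.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


frequencies = [7, 17, 16, 6, 4]

# Values used on the slide: class limits, h = 9.
slide_median = grouped_median([15, 25, 35, 45, 55], frequencies, 9)
print(round(slide_median, 2))  # 35.56

# Alternative with class boundaries 14.5, 24.5, ... and h = 10.
boundary_median = grouped_median([14.5, 24.5, 34.5, 44.5, 54.5], frequencies, 10)
print(boundary_median)  # 35.125
```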

The mode
The mode is another measure of central tendency. It is the most common value in a data set.
Data set with no mode: each value occurs only once.
Data set with one mode: only one value occurs with the highest frequency. The data set in this case is called unimodal.
Data set with two modes: two values occur with the same (highest) frequency. The distribution in this case is said to be bimodal.
Data set with more than two modes: more than two values occur with the same (highest) frequency. The data set is then said to be multimodal.

Cont… Example: Find the mode for the given data set: 22, 19, 21, 19, 27, 21, 29, 22, 19, 25, 21, 22, 25.
Solution: The values 19, 21, and 22 each occur three times, which is the highest frequency in the data set. Each of them is therefore a mode, and the data set is multimodal with modes 19, 21, and 22.
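The mode cases described above (no mode, one mode, several modes) can be handled with a frequency count. The sketch below uses `collections.Counter` from the standard library; the function name `modes` is an illustrative choice, and returning an empty list for "no mode" follows the convention on the slide.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs only once: the data set has no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


data = [22, 19, 21, 19, 27, 21, 29, 22, 19, 25, 21, 22, 25]
print(modes(data))       # [19, 21, 22]
print(modes([1, 2, 3]))  # []
```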

Mode for grouped data
The formula for calculating the mode of grouped data is:
$\text{Mode} = L + \left(\frac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right) h$
In this formula, the variables are:
L: the lower limit of the modal class
h: the size of the class interval
f1: the frequency of the modal class
f0: the frequency of the class preceding the modal class
f2: the frequency of the class succeeding the modal class

Example: The following table shows the distribution of scores obtained by students in an exam:

Score range   Number of students (frequency)
50 - 60       8
60 - 70       12
70 - 80       25
80 - 90       10
90 - 100      5

What is the mode of the exam scores?
Answer: The modal class is 70 - 80, so
L = lower boundary of the modal class = 70
f1 = frequency of the modal class = 25
f0 = frequency of the class before the modal class = 12
f2 = frequency of the class after the modal class = 10
h = class width = 10
Using the formula:
$\text{Mode} = 70 + \left(\frac{25 - 12}{2(25) - 12 - 10}\right) 10 = 70 + \frac{13}{28} \times 10 \approx 74.64 \approx 75$
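The exam-score example can be reproduced with a short function, assuming the usual grouped-mode formula Mode = L + ((f1 - f0) / (2·f1 - f0 - f2)) × h, which matches the variables listed for it. The name `grouped_mode` is hypothetical, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_bounds, frequencies, width):
    """Mode of grouped data via L + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f1 - f0 - f2 == 0")
    return lower_bounds[i] + (f1 - f0) / denominator * width


score_lower_bounds = [50, 60, 70, 80, 90]
student_counts = [8, 12, 25, 10, 5]

mode_score = grouped_mode(score_lower_bounds, student_counts, 10)
print(round(mode_score, 2))  # 74.64
```

Rounded to the nearest whole score, this is the value of 75 reported in the example.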

Relationship between mean, median and mode
For moderately skewed distributions, the mean, median and mode are approximately related by the empirical rule that the mode equals three times the median minus two times the mean. That is,
Mean - Mode = 3 (Mean - Median), or equivalently Mode = 3 Median - 2 Mean.
Example: If the difference between the mean and the mode of a population is 48 and the median is 12, find the mean.
Solution: Mean - Mode = 3 (Mean - Median), so 48 = 3 (Mean - 12), which gives 16 = Mean - 12 and therefore Mean = 28.

B. Measures of dispersion
An average can represent a series only as well as a single figure can; it cannot reveal the entire story of the phenomenon under study. Averages do not tell us anything about the scatter of observations within the distribution. To measure the degree of scatter, statistical devices called measures of dispersion are calculated. A measure of dispersion shows the degree to which numerical data tend to spread around an average value such as the mean.

Range
Range = highest value - lowest value
The range is the difference between the highest value and the lowest value. Because it uses only these two values, it is the weakest measure of dispersion.
Variance
First calculate the mean, then subtract the mean from each value in the group, square each result, add the squared results together, and divide the total by the number of values. The variance measures how far a set of numbers is spread out: it describes how far the numbers lie from the mean (expected value).

Standard deviation
The standard deviation is the most reliable measure of the degree to which the data are spread around the mean. It is the square root of the variance.

Example: Find the mean, median, mode, range, variance and standard deviation for the following raw data.

ID   Age of respondent
1    53
2    44
3    56
4    70
5    45
6    62
7    36
8    23
9    56
10   55

Solution:
A) Mean = $\frac{\sum x_i}{n}$ = (53 + 44 + 56 + 70 + 45 + 62 + 36 + 23 + 56 + 55)/10 = 500/10 = 50
B) Median: first arrange the raw data in ascending or descending order: 23, 36, 44, 45, 53, 55, 56, 56, 62, 70. Since n is even, the median is the average of the two middle values: Median = (53 + 55)/2 = 54
C) Mode: find the most frequently occurring value. 56 is the mode of the given data since it occurs most often (twice), and the data set is unimodal.
D) Range = largest value - lowest value = 70 - 23 = 47
E) Variance = $\frac{\sum (x_i - \bar{x})^2}{n}$

ID   x_i   x_i - x̄   (x_i - x̄)^2
1    53    3          9
2    44    -6         36
3    56    6          36
4    70    20         400
5    45    -5         25
6    62    12         144
7    36    -14        196
8    23    -27        729
9    56    6          36
10   55    5          25

$\sum (x_i - \bar{x})^2 = 1636$
Variance = $\frac{\sum (x_i - \bar{x})^2}{n} = \frac{1636}{10} = 163.6$
F) SD = $\sqrt{\text{variance}} = \sqrt{163.6} = 12.79$
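The whole age example can be checked with Python's standard `statistics` module. This sketch uses the population versions `pvariance` and `pstdev`, which divide by n as the worked solution does; the sample versions `variance` and `stdev` would divide by n - 1 instead.

```python
from statistics import mean, median, mode, pstdev, pvariance

ages = [53, 44, 56, 70, 45, 62, 36, 23, 56, 55]

mean_age = mean(ages)
median_age = median(ages)           # average of the two middle values (n is even)
modal_age = mode(ages)              # most frequent value
age_range = max(ages) - min(ages)
age_variance = pvariance(ages)      # sum of squared deviations divided by n
age_std_dev = pstdev(ages)          # square root of the population variance

print(mean_age, median_age, modal_age, age_range)
print(age_variance, round(age_std_dev, 2))
```

Running it reproduces the values in parts A to F: mean 50, median 54, mode 56, range 47, variance 163.6 and standard deviation 12.79.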

Measures of dispersion for grouped data
Sample variance formula for grouped data: $s^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1}$
Population variance formula for grouped data: $\sigma^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n}$
where
$f_i$ is the frequency of the i-th interval
$m_i$ is the midpoint of the i-th interval
$\bar{x}$ is the mean of the grouped data
$n = \sum f_i$ is the total number of observations

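Cont… Example: Find the variance and the standard deviation for the following frequency distribution of a sample.

Class     Frequency (f)   Midpoint (m)   f m   m - x̄   (m - x̄)^2   f (m - x̄)^2
5 - 9     2               7              14    -11.5    132.25       264.50
10 - 14   4               12             48    -6.5     42.25        169.00
15 - 19   7               17             119   -1.5     2.25         15.75
20 - 24   3               22             66    3.5      12.25        36.75
25 - 29   1               27             27    8.5      72.25        72.25
30 - 34   3               32             96    13.5     182.25       546.75
Total     20                             370                         1105.00

Mean: $\bar{x} = \frac{\sum f m}{n} = \frac{370}{20} = 18.5$
Variance: $s^2 = \frac{\sum f (m - \bar{x})^2}{n - 1} = \frac{1105}{19} = 58.158$
Standard deviation: $s = \sqrt{58.158} = 7.626$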
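The grouped-data variance can be sketched in Python as follows, using class midpoints and the sample divisor n - 1 as in the example. The function name `grouped_variance` and its `sample` flag are illustrative assumptions, not part of the slides.

```python
import math


def grouped_variance(class_limits, frequencies, sample=True):
    """Variance of grouped data from class midpoints.

    Divides by n - 1 when sample=True, otherwise by n.
    """
    n = sum(frequencies)
    midpoints = [(low + high) / 2 for low, high in class_limits]
    grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    squared_dev = sum(
        f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
    )
    return squared_dev / (n - 1 if sample else n)


class_limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
frequencies = [2, 4, 7, 3, 1, 3]

sample_variance = grouped_variance(class_limits, frequencies)
sample_std_dev = math.sqrt(sample_variance)
print(round(sample_variance, 3))  # 58.158
print(round(sample_std_dev, 3))   # 7.626
```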

C. Measures of relationship
1. Coefficient of variation
The coefficient of variation (CV) is a normalized, relative measure of dispersion. It is also known as unitized risk or the variation coefficient. It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:
$CV = \frac{s}{\bar{x}} \times 100\%$

Example: If the standard deviation of a given distribution is 0.20 and the mean is 0.50, what is the coefficient of variation (CV)?
CV = (0.20/0.50) × 100% = 40%

2. Covariance
The covariance between X and Y is a measure of how much the two variables change together, and so indicates how they are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related. The formula for calculating the covariance of sample data is:
$\text{Cov}(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Note: divide by N for a population and by n - 1 for a sample. The magnitude of the covariance depends on the units of the variables and is often hard to interpret, so we focus on its sign.

3. Correlation
Covariance only shows the direction of a relationship, and it has no upper or lower bound. Correlation tells the degree to which the variables tend to move together. The most familiar measure of dependence between two quantities is Pearson's correlation. It is obtained by dividing the covariance of the two variables by the product of their standard deviations. The Pearson correlation is defined only if both standard deviations are finite and nonzero. The correlation coefficient is symmetric: corr(X, Y) = corr(Y, X).

The Pearson correlation is +1 if there is a perfect positive linear relationship and -1 if there is a perfect negative linear relationship. If the variables are independent, Pearson's correlation coefficient is 0 (the converse does not hold in general). The sample correlation coefficient is written
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship
The absolute value of $\rho$ indicates the strength or exactness of the relationship.

Example: Find the covariance and the Pearson correlation for the following hypothetical raw data.

x_i   y_i   x_i - x̄   y_i - ȳ   (x_i - x̄)(y_i - ȳ)   (x_i - x̄)^2   (y_i - ȳ)^2
10    18    -4         6          -24                    16             36
30    6     16         -6         -96                    256            36
8     12    -6         0          0                      36             0
16    15    2          3          6                      4              9
6     9     -8         -3         24                     64             9
Sum                               -90                    376            90

Here $\bar{x} = 70/5 = 14$ and $\bar{y} = 60/5 = 12$.
Cov(X, Y) = $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{-90}{5} = -18$ (dividing by n, that is, treating the data as a population)
$r(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{-90}{\sqrt{376 \times 90}} = \frac{-90}{\sqrt{33840}} = \frac{-90}{183.96} = -0.49$
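The covariance and correlation example can be reproduced with the sketch below. The data are the five (x, y) pairs from the table, with the zero entries of the third row inferred from the column totals; the function names and the `sample` flag are illustrative.

```python
import math


def covariance(xs, ys, sample=False):
    """Covariance of paired data; divides by n - 1 if sample=True, else by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)


def pearson_r(xs, ys):
    """Pearson correlation: cross-deviation sum over sqrt of the squared sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


x = [10, 30, 8, 16, 6]
y = [18, 6, 12, 15, 9]

cov_xy = covariance(x, y)   # divisor n, as in the worked example
r_xy = pearson_r(x, y)
print(cov_xy)               # -18.0
print(round(r_xy, 2))       # -0.49
```

With the sample divisor n - 1 the covariance would be -22.5 instead of -18; the correlation coefficient is unaffected by that choice because the divisor cancels.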

D. Shape of Frequency Distribution
Skewness
Skewness refers to the symmetry or asymmetry of the distribution. A distribution is symmetric if its left half is a mirror image of its right half. The skewness value can be positive or negative. A symmetric distribution with a single peak and a bell shape is known as a normal distribution.

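Kurtosis
Kurtosis refers to the peakedness or flatness of the distribution. Higher kurtosis means that more of the variance is the result of infrequent extreme deviations. It is measured by the fourth standardized moment, which is defined as
$\beta_2 = \frac{\mu_4}{\sigma^4} = \frac{E[(X - \mu)^4]}{\sigma^4}$
where $\mu_4$ is the fourth moment about the mean and $\sigma$ is the standard deviation.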
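As a rough illustration, standardized moments can be computed directly from data. The sketch below assumes the moment-based definitions (third standardized moment for skewness, fourth for kurtosis) with population divisors; other textbooks and software use slightly different estimators. The function name `standardized_moment` is hypothetical, and applying it to the age data from the earlier example is a choice made here for illustration.

```python
def standardized_moment(values, k):
    """k-th central moment divided by the k-th power of the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("standardized moments are undefined for constant data")
    central_moment = sum((v - mean) ** k for v in values) / n
    return central_moment / variance ** (k / 2)


ages = [53, 44, 56, 70, 45, 62, 36, 23, 56, 55]

skewness = standardized_moment(ages, 3)  # negative value: longer left tail
kurtosis = standardized_moment(ages, 4)  # compare with 3, the normal-distribution value
print(skewness, kurtosis)
```

For perfectly symmetric data such as 1, 2, 3, 4, 5 the third standardized moment is exactly 0.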