UNIT IV probability and standard distribution

rehabonehealthcare 33 views 37 slides Oct 11, 2024


UNIT IV: Probability and Standard Distributions
Dr. Prasanna Mohan, Professor/Research Head
Krupanidhi College of Physiotherapy

Range
The spread, or distance, between the lowest and highest values of a variable. To get the range for a variable, subtract its lowest value from its highest value.
Class A, IQs of 13 students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
Class A range = 140 - 89 = 51
Class B, IQs of 13 students: 127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109
Class B range = 162 - 80 = 82
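The range calculation can be checked with a few lines of Python. This is a minimal sketch; the helper name `value_range` is introduced here for illustration and is not part of the slides.

```python
# IQ scores for the two classes, as listed on the slide.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]


def value_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)


range_a = value_range(class_a)
range_b = value_range(class_b)
print("Class A range:", range_a)
print("Class B range:", range_b)
```

Because the range uses only the two most extreme values, a single unusual score (such as 162 in Class B) can change it substantially.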

Interquartile Range
A quartile is a value that marks one of the divisions breaking a series of values into four equal parts.
The median is a quartile and divides the cases in half.
The 25th percentile is a quartile that divides the first ¼ of cases from the latter ¾.
The 75th percentile is a quartile that divides the first ¾ of cases from the latter ¼.
The interquartile range is the distance between the 25th percentile and the 75th percentile.
Below, what is the interquartile range?
[Figure: a scale from 0 to 1000 marked at 250, 500, and 750, with 25% of cases in each of the four segments]
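As an illustration, the quartiles and interquartile range of the Class A IQ scores from the Range slide can be sketched with Python's standard `statistics` module. Several conventions exist for locating quartiles in a small sample; the `method="inclusive"` setting used here is one assumption among them, and other conventions can give slightly different cut points.

```python
import statistics

# Class A IQ scores from the Range slide.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# quantiles(..., n=4) returns the three cut points that split the data
# into four equal parts: 25th percentile, median, 75th percentile.
q1, median, q3 = statistics.quantiles(class_a, n=4, method="inclusive")

# The interquartile range is the distance between the 25th and 75th percentiles.
iqr = q3 - q1
print("25th percentile:", q1)
print("Median:", median)
print("75th percentile:", q3)
print("Interquartile range:", iqr)
```

Unlike the range, the interquartile range ignores the lowest and highest quarters of the cases, so it is less affected by extreme values.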

Variance
A measure of the spread of the recorded values on a variable; a measure of dispersion.
The larger the variance, the further the individual cases are from the mean.
The smaller the variance, the closer the individual scores are to the mean.
[Figure: two distributions, each with its mean marked]

Variance
Variance is a statistical measure that indicates how spread out or dispersed the data points are in a dataset. It gives insight into the degree to which each data point deviates from the mean (average).
A high variance means that the data points are spread out over a large range of values, while a low variance means they are clustered closely around the mean.
Calculating variance starts with a “deviation.” A deviation is the distance of a case’s score from the mean.

Variance
The deviation of 102 from 110.54 is? Deviation of 115?
Class A, IQs of 13 students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
\( \bar{Y}_A = 110.54 \)

Variance
The deviation of 102 from 110.54 is? Deviation of 115?
\( 102 - 110.54 = -8.54 \)
\( 115 - 110.54 = 4.46 \)
Class A, IQs of 13 students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
\( \bar{Y}_A = 110.54 \)

Variance
We want to add these to get total deviations, but if we did that, we would get zero every time. Why? Because the positive and negative deviations around the mean cancel exactly.
We need a way to eliminate negative signs. Squaring the deviations will eliminate negative signs.
A deviation squared: \( (Y_i - \bar{Y})^2 \)
Back to the IQ example, a deviation squared
for 102: \( (102 - 110.54)^2 = (-8.54)^2 = 72.93 \)
for 115: \( (115 - 110.54)^2 = (4.46)^2 = 19.89 \)
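The question of why the raw deviations always add to zero can be answered in one short derivation. It uses only the definition of the mean, with \( n \) denoting the number of cases:

```latex
\begin{aligned}
\sum_{i=1}^{n} (Y_i - \bar{Y})
  &= \sum_{i=1}^{n} Y_i - n\bar{Y} \\
  &= n\bar{Y} - n\bar{Y}
     \qquad \text{since } \bar{Y} = \tfrac{1}{n}\sum_{i=1}^{n} Y_i \\
  &= 0
\end{aligned}
```

This holds for any dataset, which is why the deviations are squared before they are summed.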

Variance
If you were to add all the squared deviations together, you’d get what we call the “Sum of Squares.”
\( SS = \sum (Y_i - \bar{Y})^2 = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_n - \bar{Y})^2 \)

Variance
Class A, sum of squares:
\( SS = (102 - 110.54)^2 + (115 - 110.54)^2 + (128 - 110.54)^2 + (109 - 110.54)^2 + (131 - 110.54)^2 + (89 - 110.54)^2 + (98 - 110.54)^2 + (106 - 110.54)^2 + (140 - 110.54)^2 + (119 - 110.54)^2 + (93 - 110.54)^2 + (97 - 110.54)^2 + (110 - 110.54)^2 = 2891.23 \)
Class A, IQs of 13 students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
\( \bar{Y} = 110.54 \)

Variance
The last step: the approximate average of the squared deviations is the variance.
\( SS / N \) = variance for a population.
\( SS / (n - 1) \) = variance for a sample.
\( \text{Variance} = \sum (Y_i - \bar{Y})^2 / (n - 1) \)

Variance
For Class A, variance \( = 2891.23 / (n - 1) = 2891.23 / 12 = 240.94 \)
How helpful is that?

Standard Deviation
To convert variance into something meaningful, let’s create the standard deviation. The square root of the variance gives the typical deviation of the observations from the mean, in the original units.
\( \text{s.d.} = \sqrt{\sum (Y_i - \bar{Y})^2 / (n - 1)} \)

Standard Deviation
For Class A, the standard deviation is \( \sqrt{240.94} = 15.52 \).
The typical deviation of a person’s IQ from the mean IQ of 110.54 is about 15.52 IQ points.
Review:
1. Deviation
2. Deviation squared
3. Sum of squares
4. Variance
5. Standard deviation
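The five review steps can be followed in code. The sketch below recomputes each step directly from the thirteen Class A scores as listed, using the sample formula with \( n - 1 \) in the denominator; the variable names are chosen here for illustration. The last lines cross-check the hand-built result against Python's standard `statistics` module.

```python
import math
import statistics

# Class A IQ scores as listed on the slides.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
n = len(class_a)
mean = sum(class_a) / n

# 1. Deviation: distance of each score from the mean.
deviations = [y - mean for y in class_a]

# 2. Deviation squared: removes the negative signs.
squared_deviations = [d ** 2 for d in deviations]

# 3. Sum of squares.
ss = sum(squared_deviations)

# 4. Variance for a sample: divide by n - 1.
sample_variance = ss / (n - 1)

# 5. Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library's sample formulas.
assert math.isclose(sample_variance, statistics.variance(class_a))
assert math.isclose(sample_sd, statistics.stdev(class_a))

print(f"mean = {mean:.2f}")
print(f"sum of squares = {ss:.2f}")
print(f"sample variance = {sample_variance:.2f}")
print(f"sample s.d. = {sample_sd:.2f}")
```

For a population rather than a sample, the same steps apply with `ss / n` in step 4 (`statistics.pvariance` and `statistics.pstdev` in the standard library).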

Standard Deviation
Larger s.d. = greater amounts of variation around the mean.
For example: [Figure: two distributions with the same mean, \( \bar{Y} = 25 \); the left has s.d. = 3 (axis marked 19, 25, 31) and the right has s.d. = 6 (axis marked 13, 25, 37)]
s.d. = 0 only when all values are the same (only when you have a constant and not a “variable”).
If you were to “rescale” a variable, the s.d. would change by the same factor: if we changed units above so the mean equaled 250, the s.d. on the left would be 30, and on the right, 60.
Like the mean, the s.d. will be inflated by an outlier case value.
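The rescaling and constant-value properties are easy to confirm numerically. In this sketch the three scores are hypothetical, chosen only so that the mean is 25 and the sample s.d. is 3, matching the left-hand example.

```python
import statistics

# Hypothetical scores with mean 25 and sample s.d. 3 (not taken from the slides).
scores = [22, 25, 28]

# Rescale by a factor of 10, so the mean becomes 250.
rescaled = [10 * y for y in scores]

sd_before = statistics.stdev(scores)
sd_after = statistics.stdev(rescaled)

# A "variable" whose values are all the same has no spread at all.
sd_constant = statistics.stdev([25, 25, 25])

print("mean before:", statistics.mean(scores), "s.d. before:", sd_before)
print("mean after:", statistics.mean(rescaled), "s.d. after:", sd_after)
print("s.d. of a constant:", sd_constant)
```

Multiplying every value by 10 multiplies both the mean and the s.d. by 10, while adding a constant to every value would shift the mean and leave the s.d. unchanged.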

Skewness and Kurtosis

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. 

Skewness
Skewness refers to a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data. If the curve is shifted to the left or to the right, it is said to be skewed. Skewness can be quantified as a representation of the extent to which a given distribution varies from a normal distribution. A normal distribution has a skew of zero.

Types of skewness 

Positively skewed or right-skewed
Most values are clustered toward the left side of the distribution, while the right tail of the distribution is longer.
In a positively skewed distribution, the mean of the data is greater than the median, because a small number of large values in the right tail pull the mean upward.
The skewness coefficient is positive rather than negative or zero, and typically mode < median < mean.

Negatively skewed or left-skewed
More values are concentrated on the right side of the distribution graph, while the left tail of the distribution graph is longer.
The mean of the data is less than the median, because a small number of low values in the left tail pull the mean downward.
The skewness coefficient is negative rather than positive or zero, and typically mean < median < mode.

Calculate the skewness coefficient of the sample
The coefficient is not confined to a fixed range; a common rule of thumb for reading it is:
If the skewness is between -0.5 and 0.5, the data are nearly symmetrical.
If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are slightly skewed.
If the skewness is lower than -1 (negatively skewed) or greater than 1 (positively skewed), the data are extremely skewed.
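The slide does not state which skewness formula it uses. The sketch below assumes one common definition, the moment coefficient (third central moment divided by the second central moment raised to the power 3/2), and applies the rule-of-thumb bands above. The function names and the example data are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition).

    Not defined when all values are equal, because m2 is then zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n  # second central moment
    m3 = sum((y - mean) ** 3 for y in values) / n  # third central moment
    return m3 / m2 ** 1.5


def describe_skew(g):
    """Apply the rule-of-thumb bands to a skewness coefficient."""
    if -0.5 <= g <= 0.5:
        return "nearly symmetrical"
    direction = "positive" if g > 0 else "negative"
    if -1 <= g <= 1:
        return f"slightly skewed ({direction})"
    return f"extremely skewed ({direction})"


symmetric = [1, 2, 3, 4, 5]     # hypothetical symmetric data
right_tail = [1, 1, 1, 2, 10]   # hypothetical data with one large value

for data in (symmetric, right_tail):
    g = skewness(data)
    print(data, round(g, 2), describe_skew(g))
```

Statistical packages often apply a small-sample correction to this coefficient, so their output can differ somewhat from this uncorrected version.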

Kurtosis
Types of excess kurtosis:
Leptokurtic, or heavy-tailed, distribution (kurtosis greater than that of the normal distribution).
Mesokurtic (kurtosis the same as that of the normal distribution).
Platykurtic, or short-tailed, distribution (kurtosis less than that of the normal distribution).

Leptokurtic (kurtosis > 3)
A leptokurtic distribution has heavier tails than the normal distribution, so extreme values are more likely.
Platykurtic (kurtosis < 3)
A platykurtic distribution has lighter tails than the normal distribution, so fewer data points lie far from the mean. A platykurtic distribution is flatter (less peaked) when compared with the normal distribution.

Mesokurtic (kurtosis = 3)
A mesokurtic distribution has the same kurtosis as the normal distribution, which means its excess kurtosis (kurtosis - 3) is near 0. Mesokurtic distributions are moderate in breadth, and their curves have a medium peak height.
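The relationship between kurtosis, excess kurtosis, and the three labels can be sketched as follows. The formula used is the moment coefficient (fourth central moment divided by the squared second central moment), which is an assumption since the slides do not give one. The function names, the `tolerance` parameter, and the example data are hypothetical.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2**2 (equals 3 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n  # second central moment
    m4 = sum((y - mean) ** 4 for y in values) / n  # fourth central moment
    return m4 / m2 ** 2


def kurtosis_type(k, tolerance=0.0):
    """Label a kurtosis value by its excess over the normal value of 3."""
    excess = k - 3
    if excess > tolerance:
        return "leptokurtic"
    if excess < -tolerance:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                          # evenly spread values
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # most values central, two extremes

for data in (flat, heavy_tails):
    k = kurtosis(data)
    print(data, "kurtosis =", round(k, 2), "excess =", round(k - 3, 2), kurtosis_type(k))
```

Real samples almost never give a kurtosis of exactly 3, so in practice a nonzero `tolerance` would be needed before calling a sample mesokurtic.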

Variance
An important measure of variation; shows variation about the mean.
For the population: \( \sigma^2 = \sum (Y_i - \mu)^2 / N \) (use \( N \) in the denominator).
For the sample: \( s^2 = \sum (Y_i - \bar{Y})^2 / (n - 1) \) (use \( n - 1 \) in the denominator).

Standard Deviation
An important measure of variation; shows variation about the mean.
For the population: \( \sigma = \sqrt{\sum (Y_i - \mu)^2 / N} \)
For the sample: \( s = \sqrt{\sum (Y_i - \bar{Y})^2 / (n - 1)} \)

Coefficient of Variation
A measure of relative variation, always expressed as a percentage.
Shows variation relative to the mean.
Used to compare two or more groups.
Formula (for a sample): \( CV = (s / \bar{Y}) \times 100\% \)

Comparing Coefficients of Variation
Stock A: average price last year = $50, standard deviation = $5
Stock B: average price last year = $100, standard deviation = $5
Coefficient of variation:
Stock A: CV = 10%
Stock B: CV = 5%
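The stock comparison can be reproduced with a one-line function. This is a minimal sketch; the function name is introduced here for illustration.

```python
def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * sd / mean


# Figures from the stock example: same s.d., different average price.
cv_a = coefficient_of_variation(mean=50, sd=5)
cv_b = coefficient_of_variation(mean=100, sd=5)

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

Both stocks have the same standard deviation in dollars, but relative to its average price Stock A varies twice as much as Stock B.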

Correlation: How much does the value of one variable depend on the value of the other?
[Figure: three scatter plots of Y against X, labeled "high positive correlation", "poor negative correlation", and "no correlation"]

How to describe correlation (1): Covariance
The covariance is a statistic representing the degree to which two variables vary together.

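cov(x, y) = mean of the products of each point's deviations from the mean values: \( \operatorname{cov}(x, y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y}) \)
Geometrical interpretation: mean of the 'signed' areas of the rectangles defined by the points and the mean value lines.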

Sign of covariance = sign of correlation
[Figure: three scatter plots of Y against X]
Positive correlation: cov > 0
Negative correlation: cov < 0
No correlation: cov ≈ 0
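The link between the sign of the covariance and the direction of the relationship can be sketched in Python. The function follows the "mean of products of deviations" definition, dividing by n; many statistical packages divide by n - 1 instead, which changes the size but not the sign. All data below are hypothetical.

```python
def covariance(xs, ys):
    """Mean of the products of each point's deviations from the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each product is the 'signed' area of one rectangle between a point
    # and the two mean value lines.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # y increases with x
y_falling = [10, 8, 6, 4, 2]  # y decreases as x increases
y_unrelated = [2, 1, 3, 1, 2]  # no linear pattern

print("rising:   ", covariance(x, y_rising))
print("falling:  ", covariance(x, y_falling))
print("unrelated:", covariance(x, y_unrelated))
```

The size of the covariance depends on the units of both variables, so only its sign is directly comparable across datasets.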

Find the Variance, SD & CV
1. 5 test scores for Calculus I are 95, 83, 92, 81, 75.
2. Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
3. Here are a bunch of 10-point quizzes from MAT117: 9, 6, 7, 10, 9, 4, 9, 2, 9, 10, 7, 7, 5, 6, 7
4. 11, 140, 98, 23, 45, 14, 56, 78, 93, 200, 123, 165

Class Interval   Frequency
2 -< 4           3
4 -< 6           18
6 -< 8           9
8 -< 10          7
Find the Variance, SD & CV
Example A: 3, 10, 8, 8, 7, 8, 10, 3, 3, 3
Example B: 2, 5, 1, 5, 1, 2
Example C: 5, 7, 9, 1, 7, 5, 0, 4
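For the class-interval table above, the individual values are not known, so the variance can only be estimated. One common approach, assumed here and not stated on the slide, is to treat every case in a class as sitting at the class midpoint and weight each squared deviation by the class frequency. The sketch uses the sample formula with n - 1 in the denominator.

```python
import math

# (lower bound, upper bound, frequency) for each class in the table.
classes = [(2, 4, 3), (4, 6, 18), (6, 8, 9), (8, 10, 7)]

midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [freq for _, _, freq in classes]
n = sum(frequencies)

# Estimated mean: frequency-weighted average of the class midpoints.
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Estimated sum of squares, sample variance, s.d., and CV.
ss = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
variance = ss / (n - 1)
sd = math.sqrt(variance)
cv = 100 * sd / mean

print(f"n = {n}, mean = {mean:.3f}")
print(f"variance = {variance:.3f}, s.d. = {sd:.3f}, CV = {cv:.1f}%")
```

The same pattern applies to ungrouped lists such as Examples A to C by giving each distinct value a frequency, or more simply by using the raw-data steps directly.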

Exam marks for 60 students (marked out of 65)
mean = 30.3, sd = 14.46
Find the Mean, Median, Mode, Variance, SD & CV

Group Frequency Table
Marks                  Frequency   Percent
0 but less than 10     4           6.7
10 but less than 20    9           15.0
20 but less than 30    17          28.3
30 but less than 40    15          25.0
40 but less than 50    9           15.0
50 but less than 60    5           8.3
60 or over             1           1.7
Total                  60          100.0