descriptive measures of data(mean, median, mode and etc.).pptx

NecroManXer 46 views 37 slides Aug 19, 2024
Slide 1
Slide 1 of 37
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37

About This Presentation

measurements of central tendency


Slide Content

Descriptive Measures of Data “Battalions of figures are like battalions of men, not always as strong as is supposed” - M.Sage

Learning Objectives Given the learning materials and activities of this chapter, you will be able to 1. find the mean, median and the mode to describe the center of a set of data; 2. calculate the range, the mean deviation, the variance and the standard deviation to describe the variability of a set of data;
3. determine ranks as measures of position for a given set of data;
4. describe the shape of a distribution using measures of skewness and kurtosis;
5. interpret values of specific descriptive measures of data and evaluate their relative merits and limitations.

• The Summation Symbol (Σ) The summation notation uses the Greek capital word Σ (sigma) to denote the sum of values .

Example:

i . Arithmetic Mean (Mean) The arithmetic mean, or simply, the mean of a set of data is the sum of the values divided by the number of values. Where X is the values of a data set.

Example: The ratings given by five judges to a painting exhibit using a 10-point rating scale, with 10 as excellent and 1 as poor, were as follows: 8, 5, 9, 8 and 7. Treating the data as a population, what is the mean rating of the five judges? X n = 8, 5, 9, 8, 7

a. Population and Sample Mean The sample mean is the average of a subset (sample) of observations taken from a larger population. The population mean is the average of all observations in a specific population.

Sample Mean e.g. The scores of a sample of 11 undergraduate students in a 100-item final examination in Statistics were recorded as follows: 70, 83, 74, 75, 81, 75, 92, 75, 90, 74 and 95. Find the mean for the given data.

ii. The Median The median of a set of data, arranged in an increasing or decreasing order magnitude, is the value that splits the data into two halves such that half of the values are greater than or equal to the median and the other half of the values are less than or equal to the median. if n is odd: if n is even: (n+2)/2

Example (Sample Mean e.g.) The recorded sample: 70, 83, 74, 75, 81, 75, 92, 75, 90, 74, 95 To find the median, we first arrange the scores in an array, that is, in an increasing order of magnitude, as follows: 70, 74, 74, 75, 75, 75, 81, 83, 90, 92,95. Since number of observations (n = 11) is odd, the median is determined as follow: Median =X (n + 1)/2 =X (11+1)/2 =X 6 =75 Since X 6 is 75 in the given data set.

Example (Even) : The ages of 10 Grade I teachers at the City Central School are as follows: 31,36, 42, 42, 55, 57, 57, 59, 60 and 62. Find the median age.
Solution: Since n = 10 and the given ages are arranged in an increasing order, the median is obtained as follows:

iii. Mode The mode is the value that occurs most frequently in a set of data. For ungrouped data, it is easy to determine the mode and does not require any calculation. For example, the given data set from the 10 students from Biliran International University who took the numeracy test; the data shows their scores out of 100 items test; 70, 81, 95, 70, 65, 90, 70, 75, 91, 70 For the given example, the mode is 70 since it appeared four times. Another example, given data set; 10, 12, 11, 20, 10, 14, 11, 27 There are two modes which is 10 and 11 who appeared twice on the data set and this is called a bimodal.

I. The Appropriateness of an Average The measures of central tendency the mean, median and the mode are measure of the average. To decide which statistical average is most appropriate for a given se of data, the nature of the data and the use to which the average is to be applied an important considerations. Also, we shall consider the relative merits and limitations each of these statistical measures.

ii. Aside from the fact that the arithmetic mean is a simple and familiar measure, it has following desirable properties:
1. It can be calculated from any set of numerical data, so it always exists.
2. A set of data has one and only one mean, so it is always unique.
3. It takes into consideration every item in the set of data.
4. It lends itself to further statistical treatment. For instance, the means of a number of separate groups of data can readily be combined into an overall mean, called the grand mean, without going back to the original data
5. It is relatively reliable and stable in sampling. The means of many samples drawn from the same population usually do not fluctuate, or vary, as widely as other statistics used to estimate the population mean.

iii. An outlier is either an extremely large or extremely s value in a set of data which can affect the mean to such an extent that it may give distorted impression of the data. The median always exists, and unique and is also easy to find once the data have been arranged in the order of magnitude. One advantages of the median is that it is not affected by extreme values and not sensitive to change the values of the items that make up the set of data.

iv. The two main advantages of the mode are: (1) it requires no calculation, and (2) it can be used for qualitative as well as quantitative data. However, a set of data may have either no mode or more than one mode. It is an unstable measure and its value may not be representative of the center of a given set of data. Thus, the mode is not widely used.

Measures of Variability: Range The range is the difference between the largest and smallest values in a set of data. Formula: Range Range = (Largest Value) – (Smallest Value)

Example of Range: In a 100-item test, the set of data is gained as follows; 92, 90, 86, 54, 48, 31 Solution: Range = 92 – 31 = 61, thus, the range of the given data set is 61.

The Mean Deviation Also called the average deviation, the mean deviation of a set of value x 1 , x 2 , …, x n is defined as the arithmetic mean of the absolute values of their deviations from their arithmetic mean. Formula: Mean Deviation (Ungrouped) MD =  

Example: The scores of a random sample of 5 Grade 11 pupils on a 10-item spelling quiz were 3, 5, 2, 8 and 7. Find the mean deviation of the sample. Solution: Mean x = = =5 MD = = = =2  

The Variance

Measures of Position Ranks, Fractiles and Quantiles

i . Measures of Position Aside from the measures of central tendency and variability, measures of position are used to describe the position of a score or any observation relative to the entire set of data. A measure of position describes the ordinal position of any score in a population of ordinal of interval ratio data.

ii. Ranks The rank of a score or any observation in a population is a measure of position that reflects the absolute number of the score in that population whose values fall above that score. It is an index of the absolute number of scores above a given score in a population. (Trooper, 1998)

a. Formula 1: Assigning Ranks to Scores The rank assigned to any given score is the number of scores in the population greater than that score plus one(1) . Rank = A + 1 Where A is the number of scores greater than the score being ranked.

Formula 2: Assigning Ranks to Tied Scores When two or more individuals in a population have the same score, their score are said to be tied. A group of individuals whose scores are equal to one another is called a knot. Rank = A+ Where A is the number of scores above the knot and B is the number of scores in the knot.  

Example: Student Score Ariel 35 Ben 39 Dick 32 Ella 42 Fred 45 Jay 40 Mia 35 Olga 35 Rod 40 Ted 38 This table shows the scores of ten graduate in a 50-item Proficiency Test. Rank These students on the basis of their scores. To facilitate ranking, the students may be assigned based on their scores from the highest to lowest. If we used the first formula which is the Rank = A + 1, Rank of Fred = A + 1 = 0 + 1 = 1 We note that A = 0 since there is no greater than Fred’s score of 45

i.1 Since two students, Jay and Rod, got the same score of 40, applying Formula 3.14 for tied scores, we get Rank of Jay & Rod =A+ = Where A= 2 since there are two scores greater than 40 and B= 2 since there are two students (jay and Rod ) in the knot.  

i.2 Next is Ben who obtained a score of 39. By formula 1, Rank of Ben = A + 1= 4+1 =5, where A= 4 since there are four scores greater than his score. It follows than rank of Ted is 6. For Ariel, Mia and Olga, since score of 35, their rank is determined as follows: Rank of Ariel, Mia & Olga = = Finally, Dick is ranked 10 th , that is, Rank of Dick = A+1=9+1=10, where A=9 since there are 9 score greater than his score. Summing up, the ranks of the 10 students on the basis of there are as follows:  

Final Ranking of the Students Students Scores Ranking Fred 45 1 Ella 42 2 Jay 40 3.5 Rod 40 3.5 Ben 39 5 Ted 38 6 Ariel 35 8 Mia 35 8 Olga 35 8 Dick 32 10 This table shows the final ranking of the ten graduate students who took the 50-item proficiency test. Summing up the ranks of the 10 students on the basis of their scores are as shown in the table.

2. Fractiles or Quantiles These measures of position describe or locate the position of a score or any observation in given data set relative to the entire set of data. They are value below which a relative number or percentage of observation in the population must fall. These measure include the following: 2.1 Quartiles . These are Value (denoted by Q 1 , Q 2 , Q 3 ) that divide a set of data into 4 equal parts such that 25% of the data falls below Q 1 , 50% falls below Q 2 and 75% falls below Q 3 . 2.2 Deciles. These are Value (denoted by D 1 , D 2 , D 3 ) that divide a set of data into 10 equal parts such that 10% of the data falls below the D 1 , 20%, falls below the D 2 ,…., 905 falls below the D 3 . 2.3 Percentiles. These are value (Denoted by P 1 , P 2 ,……, P 99 ) that divide a set of data into 100 equal parts such that 1%of the data falls below P-1, 2% falls below the P2 ,…, and 99% falls below P 99 . Based on their definition, it can be seen that the following equality relations between the various fractile values hold Median = Q 2 = D5 = P50 Q 1 = P25 Q 3 = P75 D 1 = P10 D 2 = P100 : D 9 = P90

3.5 Skewness and Kurtosis Large sets of data may also be described in term of its shape and configuration through measures of skewness and kurtosis. A distribution is said to be symmetric if it can be folded along a vertical axis so that is two sides coincide (Walpole, 2000). A distribution that lacks symmetry is said to be asymmetric or skewed thus, symmetry implies balance in the shape or distribution of a set of measurements. On the other hand, a distribution on that is skewed to the right (that, it has a long right tail compared to a much shorter tail) is positively skewed while a distribution that is skewed to the left is negatively skewed. Pearsonian Coefficient of Skewness : SK =  

i . As shown in formula 3.15, the numerator of SK is three times the difference of the mean and the median. For a symmetrical distribution, the mean = median, hence, SK=0. When the distribution is skewed to the left, SK is negative and when the distribution is skewed to the right SK is positive. In general, the values of SK will fall between -3 and 3. The distribution in normal , kurtosis = 3. If kurtosis is less than three, the distribution is platykurtic or less peaked than the normal and a kurtosis greater than three implies that the distribution is leptokurtic or more peaked than the normal curve.

Platykurtic and Leptokurtic

Shapes of frequency Distribution The measures of skewness and kurtosis are useful in describing the shape of large sets of data since they indicate the extent of departure of a distribution from normality and permit comparison of two or more distributions. The graphs below present various shapes of distribution described in terms of central tendency, symmetry and skewness   Median mode Figure 3.1 A unimodal, Symmetrical Distribution

The Positive and Negative Distribution A Negatively Skewed Distribution A Positively Skewed Distribution

Descriptive Measures for Grouped Data