Descriptive Statistics: Mean, Median, Mode and Standard Deviation.

2,260 views 16 slides May 14, 2024

Slide Content

Data Science Descriptive Statistics (Mean, Median, Mode, Standard Deviation)

Mean
In statistics, the mean is the most commonly used measure of the center of a data set, and it is a fundamental part of statistical analysis. The mean (average) of a data set is found by adding all the values in the set and then dividing by the number of values.

Mean = Sum of observations / Total number of observations

Example
Find the mean of the following data set: 10, 20, 36, 12, 35, 40, 36, 30, 36, 40.

Mean = ∑x_i / n
     = (10 + 20 + 36 + 12 + 35 + 40 + 36 + 30 + 36 + 40) / 10
     = 295 / 10
     = 29.5

Therefore, the mean of the given data set is 29.5.
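The calculation above can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and `statistics.mean` from the standard library is used only as a cross-check.

```python
from statistics import mean

# Data set from the example above
data = [10, 20, 36, 12, 35, 40, 36, 30, 36, 40]

# Mean = sum of observations / total number of observations
manual_mean = sum(data) / len(data)

print(sum(data))     # 295
print(manual_mean)   # 29.5
print(mean(data))    # 29.5, same result from the standard library
```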

Example: Grouped Data

Marks               25  43  38  42  33  28  29  20
Number of students  20   1   4   2  15  24  28   6

Mean = (∑ f_i x_i) / ∑ f_i

Example: Grouped Data

Marks (x_i)   Number of students (f_i)   f_i x_i
25            20                         500
43             1                          43
38             4                         152
42             2                          84
33            15                         495
28            24                         672
29            28                         812
20             6                         120
Sum          100                        2878

Continued
Mean = (∑ f_i x_i) / ∑ f_i
     = 2878 / 100
     = 28.78

Thus, the mean of the given distribution is 28.78.
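The grouped-data mean can be sketched the same way: multiply each mark by its frequency, sum the products, and divide by the total frequency. The list names `marks` and `students` are illustrative.

```python
# Marks (x_i) and number of students (f_i) from the grouped-data example
marks = [25, 43, 38, 42, 33, 28, 29, 20]
students = [20, 1, 4, 2, 15, 24, 28, 6]

# Sum of f_i * x_i over all rows of the table
total_fx = sum(f * x for x, f in zip(marks, students))

# Sum of f_i, i.e. the total number of students
total_f = sum(students)

grouped_mean = total_fx / total_f

print(total_fx)      # 2878
print(total_f)       # 100
print(grouped_mean)  # 28.78
```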

Median
In statistics, the median is a measure of central tendency: the middle value of a data set when it is arranged in ascending or descending order. The median is less sensitive to extreme values (outliers) than the mean, which makes it a useful measure of central tendency when the data set contains outliers or is skewed.

Steps:
1. Arrange the data in ascending order (from smallest to largest) or descending order (from largest to smallest).
2. If the number of data points is odd, the median is the middle value in the ordered list.
3. If the number of data points is even, the median is the average of the two middle values.

Example
Consider the dataset: 3, 6, 9, 12, 15. Since there are 5 data points (an odd number), the median is the middle value, which is 9.

Consider the dataset: 2, 4, 6, 8. Since there are 4 data points (an even number), the median is the average of the two middle values, which is (4 + 6) / 2 = 5.
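The sort-then-pick-the-middle procedure can be written as a short function. This is a sketch: `median_by_steps` is a hypothetical helper name, and the data set with the value 1500 is an invented illustration of how an outlier affects the mean but not the median.

```python
from statistics import mean, median

def median_by_steps(values):
    """Median via sorting, then taking the middle value(s)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)           # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_steps([3, 6, 9, 12, 15]))  # 9
print(median_by_steps([2, 4, 6, 8]))       # 5.0

# Hypothetical data: replacing 15 with an outlier moves the mean, not the median
with_outlier = [3, 6, 9, 12, 1500]
print(median(with_outlier))  # 9
print(mean(with_outlier))    # 306
```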

Mode
In statistics, the mode is the value that appears most frequently in a dataset. Like the mean and median, it is a measure of central tendency, but it describes the data's "typical" value in terms of frequency rather than magnitude or position.

Steps:
1. Identify the frequency of each unique value in the dataset.
2. Determine which value has the highest frequency. This value is the mode.

A dataset can have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal). It is also possible for a dataset to have no mode if all values occur with the same frequency.

Example
Consider the dataset: 2, 3, 4, 4, 6, 6, 6, 9. The value 6 appears most frequently (three times), so 6 is the mode.

Consider the dataset: 1, 2, 3, 3, 4, 4, 5. Both 3 and 4 appear most frequently (twice each), so this dataset is bimodal, with modes of 3 and 4.
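Counting frequencies and picking the most frequent value(s) can be sketched with `collections.Counter`. The helper name `modes` is hypothetical, and returning an empty list when several distinct values all share the same frequency is an assumption that follows the "no mode" convention stated in the slides. Python's built-in `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if there is no mode."""
    if not values:
        return []
    counts = Counter(values)                 # frequency of each unique value
    highest = max(counts.values())           # the highest frequency
    # Assumed convention: several distinct values, all equally frequent -> no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 4, 4, 6, 6, 6, 9]))  # [6]      (unimodal)
print(modes([1, 2, 3, 3, 4, 4, 5]))     # [3, 4]   (bimodal)
print(modes([1, 2, 3]))                 # []       (no mode)
```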

Standard Deviation
The standard deviation measures how far the values in a data set typically deviate from the mean. A low standard deviation indicates that the values lie close to the mean, whereas a high standard deviation indicates that the values are spread far from the mean.

Standard deviation is of two types:
Population Standard Deviation: measures the dispersion or spread of the entire population.
Sample Standard Deviation: estimates the population standard deviation based on a sample.

Formula for S.D.

Population standard deviation: σ = √( ∑(x_i − μ)² / n )
Sample standard deviation:     s = √( ∑(x_i − x̄)² / (n − 1) )

σ = population standard deviation
s = sample standard deviation
x_i = terms given in the data
μ = population mean
x̄ = sample mean
n = total number of terms

The formula for the sample standard deviation includes a correction for the fact that it is based on a sample rather than the entire population: the sum of squared deviations is divided by n − 1 instead of n, where n is the number of data points in the sample. This is known as Bessel's correction.

Example
During a survey, 6 students were asked how many hours per day they study on average. Their answers were as follows: 2, 6, 5, 3, 2, 3. Evaluate the standard deviation.

Find the mean of the data: x̄ = (2 + 6 + 5 + 3 + 2 + 3) / 6 = 21 / 6 = 3.5

Construct the table (mean x̄ = 3.5)

x_i    x_i − x̄    (x_i − x̄)²
2      −1.5        2.25
6       2.5        6.25
5       1.5        2.25
3      −0.5        0.25
2      −1.5        2.25
3      −0.5        0.25
Sum                13.5

Use the standard deviation formula

Sample standard deviation: s = √( ∑(x_i − x̄)² / (n − 1) )
                             = √( 13.5 / (6 − 1) )
                             = √2.7
                             ≈ 1.643
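The worked example can be reproduced step by step in Python. This is a sketch with illustrative variable names. It also computes the population version (dividing by n) for comparison, which the slides do not work out, and cross-checks both values against `statistics.stdev` and `statistics.pstdev`.

```python
import math
from statistics import pstdev, stdev

# Hours studied per day, from the survey example
hours = [2, 6, 5, 3, 2, 3]

n = len(hours)
mean_hours = sum(hours) / n                           # 3.5

# Squared deviation of each value from the mean (third column of the table)
squared_devs = [(x - mean_hours) ** 2 for x in hours]
sum_sq = sum(squared_devs)                            # 13.5

# Sample standard deviation: divide by n - 1 (Bessel's correction)
sample_sd = math.sqrt(sum_sq / (n - 1))

# Population standard deviation: divide by n (shown only for comparison)
population_sd = math.sqrt(sum_sq / n)

print(round(sample_sd, 3))      # 1.643
print(round(population_sd, 3))  # 1.5

# Cross-check against the standard library
print(math.isclose(sample_sd, stdev(hours)))       # True
print(math.isclose(population_sd, pstdev(hours)))  # True
```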

Thanks for Watching! Please check the description box for the link to Machine Learning videos.