Descriptive statistics

26 slides, Dec 04, 2017

About This Presentation

Sarfraz Ahmad
Lecturer
KUST, Kohat


Slide Content

Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Descriptive statistics are typically distinguished from inferential statistics. With descriptive statistics, you are simply describing what is, or what the data show. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone.

We use descriptive statistics simply to describe what is going on in our data. Descriptive statistics present quantitative descriptions in a manageable form and help us to simplify large amounts of data in a sensible way. Descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample is thought to represent.

Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, papers reporting on human subjects typically include a table giving the overall sample size, the sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex and the proportion of subjects with related comorbidities.

Measures commonly used to describe a data set fall into two groups: measures of central tendency and measures of variability. Measures of central tendency include the mean, median and mode. Measures of variability include the standard deviation (or variance) and the minimum and maximum values of the variables; kurtosis and skewness, which describe the shape of the distribution, are often reported alongside them.

Measures of Central Tendency: Mean, Median, Mode
Measures of Variability: Range, Variance, Quartiles, Standard Deviation

Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics.

The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode. The mean, median and mode are all valid measures of central tendency, but under different conditions some of them become more appropriate to use than others.

Mean (Arithmetic)

The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

If we have n values in a data set and they have values $x_1, x_2, \ldots, x_n$, the sample mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written more compactly using the Greek capital letter $\Sigma$, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$

Why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings, and these differences are very important, even though, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower-case letter "mu", denoted $\mu$:

$$\mu = \frac{\sum x}{n}$$

where n is now the number of values in the whole population.
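The sum-then-divide definition translates directly into code. The sketch below is a minimal illustration assuming Python 3; `arithmetic_mean` is a hypothetical helper name (not from the slides), and the eleven marks are the ones used in the median example of this deck. The arithmetic is identical whether the values are treated as a sample or as a whole population.

```python
import statistics


def arithmetic_mean(values):
    """Return the sum of the values divided by their count: sum(x) / n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The eleven marks from the median example in this deck.
marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

x_bar = arithmetic_mean(marks)
print(x_bar)  # 59.0, i.e. 649 / 11

# Cross-check against the standard library implementation.
assert x_bar == statistics.mean(marks)
```

Note that the mean of these marks (59.0) is a little higher than their median (56).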

Median

The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data than the mean. To see how to calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange the data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case 56. It is the middle mark because there are 5 scores before it and 5 scores after it.
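The order-then-take-the-middle procedure can be sketched in Python as follows. This is a minimal illustration: `median` is a hypothetical helper name, and the even-count branch (averaging the two middle values) is the usual convention, which the slides do not cover; Python's `statistics.median` follows the same convention.

```python
import statistics


def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # arrange into order of magnitude, smallest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: exactly one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(sorted(marks))  # [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]
print(median(marks))  # 56

assert median(marks) == statistics.median(marks)
```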

Mode

The mode is the most frequent score in our data set. On a bar chart or histogram, it is represented by the highest bar. You can, therefore, sometimes consider the mode as being the most popular option.

An example of a mode is presented below:

[Figure: an example of a mode (image not reproduced)]

Normally, the mode is used for categorical data where we wish to know which is the most common category, as illustrated below:

[Figure: the most common category in a set of categorical data (image not reproduced)]
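For categorical data, finding the mode amounts to counting how often each category occurs. A minimal sketch using Python's `collections.Counter`; the survey answers are invented for illustration. Note that `most_common(1)` returns a single category even when several are tied, so on its own it hides the problem of a non-unique mode.

```python
from collections import Counter

# Hypothetical survey answers: each person's usual mode of transport (categorical data).
transport = ["bus", "car", "bus", "bicycle", "walk", "bus", "car"]

counts = Counter(transport)  # frequency of each category
category, frequency = counts.most_common(1)[0]  # the most frequent category
print(category, frequency)  # bus 3
```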

However, the mode is not always unique: when two or more values share the highest frequency, we are stuck as to which mode best describes the central tendency of the data. This is particularly problematic with continuous data, because we are more likely not to have any one value that is more frequent than the others. For example, consider measuring 30 people's weights (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is: probably very unlikely. Many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.
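Python's standard library (version 3.8 and later) provides `statistics.multimode`, which returns every value tied for the highest frequency instead of silently picking one. The sketch below applies it to the eleven marks from the median example, which turn out to have two modes, and to a handful of hypothetical weights recorded to the nearest 0.1 kg, where no value repeats.

```python
import statistics

# The eleven marks from the median example: 55 and 56 each occur twice.
marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(statistics.multimode(marks))  # [55, 56]

# Hypothetical weights in kg (nearest 0.1 kg): every value occurs once,
# so every value ties as "most frequent" and the mode tells us nothing useful.
weights = [67.4, 71.2, 58.9, 80.3, 64.0]
print(statistics.multimode(weights))  # [67.4, 71.2, 58.9, 80.3, 64.0]
```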

Summary of when to use the mean, median and mode

The following table summarizes the best measure of central tendency for each type of variable.

Type of variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median

Measures of Variability (Spread or Dispersion)

These are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of a class of 100 students may be 65 out of 100. However, not all students will have scored 65 marks; rather, their scores will be spread out, with some lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available, including the range, quartiles, absolute deviation, variance and standard deviation.

Variability is the extent to which data points in a statistical distribution or data set diverge from the average (mean) value, as well as the extent to which these data points differ from each other.

The simplest measure of dispersion is the range, which tells us how spread out our data are. To calculate the range, subtract the smallest value from the largest value. Just like the mean, the range is very sensitive to outliers.

The variance is a measure of the average squared distance of the data values from their mean. The variance is not usually reported as a stand-alone statistic; it is typically used to calculate other statistics, such as the standard deviation. The higher the variance, the more spread out the data are.
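A minimal sketch of the range, assuming Python 3; `data_range` is a hypothetical helper name, the marks are those from the median example, and the extra score of 0 is an invented outlier used only to show how strongly a single extreme value moves the range.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(data_range(marks))  # 78, i.e. 92 - 14

# A single (hypothetical) extreme score changes the range noticeably.
print(data_range(marks + [0]))  # 92, i.e. 92 - 0
```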

There are four steps to calculate the variance:

1. Calculate the mean.
2. Subtract the mean from each data value. This tells you how far each value lies from the mean.
3. Square each of these differences so that all of them are positive, then find the sum of the squares.
4. Divide the sum of the squares by the total number of values in the data set.

These steps give the population variance. When the data are a sample used to estimate the variance of a population, the sum of squares is usually divided by n - 1 instead of n.
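The four steps map one-to-one onto the function below, a minimal sketch assuming Python 3. `population_variance` is a hypothetical helper name, and the eight data values are invented so that the arithmetic comes out in whole numbers (mean 5, sum of squares 32). Dividing by n matches the steps as listed; the standard library's `statistics.pvariance` does the same, while `statistics.variance` divides by n - 1 (the sample variance).

```python
import statistics


def population_variance(values):
    """Variance via the four steps: mean, deviations, sum of squares, divide by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mean = sum(values) / n  # step 1: calculate the mean
    deviations = [x - mean for x in values]  # step 2: how far each value lies from the mean
    sum_of_squares = sum(d * d for d in deviations)  # step 3: square, then sum
    return sum_of_squares / n  # step 4: divide by the number of values


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, sum of squares 32
print(population_variance(data))  # 4.0, i.e. 32 / 8

# Cross-checks against the standard library.
assert population_variance(data) == statistics.pvariance(data)
print(statistics.variance(data))  # sample variance: 32 / 7
```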

The standard deviation is the most popular measure of dispersion. It provides a kind of average distance of the data values from the mean. Like the variance, the higher the standard deviation, the more spread out the data are. Unlike the variance, the standard deviation is measured in the same units as the original data, which makes it easier to interpret. It is calculated as the square root of the variance.
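Taking the square root of the variance gives the standard deviation. A minimal sketch assuming Python 3: `population_std` is a hypothetical helper name, the data are invented lengths in cm chosen so that the variance is exactly 4 (in cm²), and `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Standard deviation: the square root of the population variance."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty data set is undefined")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # in squared units (cm^2)
    return math.sqrt(variance)  # back in the original unit (cm)


# Hypothetical lengths in cm.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))  # 2.0

assert math.isclose(population_std(data), statistics.pstdev(data))
```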

Thank You