Probability & Statistics By: Yuni Yamasari Department of Informatics Engineering Unesa
https://www.youtube.com/watch?v=em8nBc-zRaM&t=61s
Probability & Statistics “There are three kinds of lies: lies, damned lies, and statistics,” a quote attributed to Mark Twain, among others. In academia, we hopefully pursue the truth when analyzing data; companies, unfortunately, do not always have this goal. If the raw data can be obtained, we can use statistics to verify the validity of published results and find the truth. Used properly, statistics is a tool for analyzing data and for discovering and proving its true meaning. Walpole chapter 01
Probability & Statistics Can anybody tell me what probability is? - When we know the underlying model that governs an experiment, we use probability to figure out the chance that different outcomes will occur. For example, if we flip a fair coin 3 times, what is the probability of obtaining the same particular face (say, heads) on all 3 flips? - By definition, probability values lie between 0 and 1, inclusive.
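A minimal sketch of the coin question, assuming a fair coin with faces labelled "H" and "T" and taking heads as the particular face (the labels and variable names are illustrative only). Because the model is known, we can list the whole sample space of three flips and count the favourable outcomes; the result agrees with $(1/2)^3$.

```python
from fractions import Fraction
from itertools import product

# Sample space of 3 flips of a fair coin: 2**3 = 8 equally likely sequences.
outcomes = list(product("HT", repeat=3))

# Event (assumed here to be "heads every time"): only one sequence qualifies.
favourable = [o for o in outcomes if o == ("H", "H", "H")]

# Equally likely outcomes, so P(event) = favourable / total.
prob = Fraction(len(favourable), len(outcomes))
print(len(outcomes), prob)  # 8 1/8
```

The same count works for "tails every time", since both faces are equally likely.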
Probability & Statistics How does statistics compare to probability? - In statistics, we do not know the underlying model governing an experiment. - All we get to see is a sample of some outcomes of the experiment. - We use that sample to try to make inferences about the underlying model governing the experiment.
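To contrast with the probability direction, here is a small simulation sketch of the statistical direction. The "true" probability of heads, `TRUE_P_HEADS = 0.7`, is a hypothetical value used only to generate data; the inference step looks at the sample alone and estimates that probability by the sample proportion of heads.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical underlying model: a biased coin. The statistician does NOT see this value.
TRUE_P_HEADS = 0.7

# What the statistician does see: a sample of outcomes (1 = heads, 0 = tails).
sample = [1 if rng.random() < TRUE_P_HEADS else 0 for _ in range(10_000)]

# Inference about the model: estimate P(heads) by the sample proportion.
p_hat = sum(sample) / len(sample)
print(f"estimated P(heads) = {p_hat:.3f}")
```

With a large sample the estimate lands close to the hidden value; with a small one it can be noticeably off, which is why statistics also quantifies uncertainty.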
Samples and Populations In statistics, a population is the set of all possible outcomes of an experiment (it may be infinite). In probability, we call the set of all possible outcomes the sample space. A sample is a set of observations taken from a population. A random sample is selected so that every element in the population has an equal chance of being selected. Often in statistics, we compare samples from two different populations and try to determine statistically whether the populations are significantly different.
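A minimal sketch of drawing a random sample, assuming a small finite population (the integers 1 to 100, chosen only for illustration). Python's `random.Random.sample` draws distinct elements without replacement, so every element has the same chance of being selected.

```python
import random

population = list(range(1, 101))  # hypothetical finite population: 1..100
rng = random.Random(0)            # seeded so the sample is reproducible

# Simple random sample of size 10, drawn without replacement.
sample = rng.sample(population, k=10)
print(sorted(sample))
```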
Measures of Sample Location The sample mean is the most important single statistic measuring the location of a sample. What is the common term for the sample mean? The numerical average of the sample observations. How is it calculated? The sum of the observations divided by the sample size n. The sample mean is an estimate of the population mean. For a set of n observations $x_1, x_2, \ldots, x_n$, the sample mean is calculated as follows: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
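The formula translates directly into code. A minimal sketch, using a hypothetical sample of eight observations and cross-checking against Python's standard `statistics.mean`:

```python
from statistics import mean

def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    if not xs:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / len(xs)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(sample_mean(sample))         # 5.0
print(mean(sample))                # library cross-check
```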
Sample Median & Trimmed Mean The sample median is another measure of location. What is the median of a sample? The observation separating the upper and lower halves of the sorted sample: the middle observation if n is odd. What if n is even? The average of the two middle observations. The trimmed mean is calculated by eliminating the highest and lowest values in the sample and taking the mean of the remaining values. For a 10% trimmed mean, the largest 10% and the smallest 10% of the values are eliminated before averaging.
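A minimal sketch of both measures on a hypothetical sample containing one extreme value. The median follows the odd/even rule above; for the trimmed mean we assume that floor(proportion × n) observations are dropped from each end (texts differ on how to handle a non-integer count). The example shows why these measures are useful: the single large value drags the mean upward, while the median and the trimmed mean stay near the bulk of the data.

```python
from statistics import mean

def sample_median(xs):
    """Middle observation of the sorted sample; average of the two middle ones if n is even."""
    data = sorted(xs)
    n = len(data)
    if n == 0:
        raise ValueError("empty sample")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def trimmed_mean(xs, proportion):
    """Mean after dropping floor(proportion * n) values from each end of the sorted sample."""
    data = sorted(xs)
    k = int(proportion * len(data))  # assumption: floor of the trimmed count
    if 2 * k >= len(data):
        raise ValueError("trimming would remove every observation")
    return mean(data[k:len(data) - k])

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one extreme value
print(mean(sample))                # 14.5, pulled up by the 100
print(sample_median(sample))       # 5.5
print(trimmed_mean(sample, 0.10))  # 5.5, after dropping the 1 and the 100
```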
Variability Measures: Variance Sample variability is critical to statistical calculations. Sample variance and standard deviation are the most important measures of variability. Does anyone know how to calculate variance? For a set of n observations $x_1, x_2, \ldots, x_n$, the sample variance $s^2$ is calculated as follows: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ The quantity $n - 1$ is called the degrees of freedom associated with the variance. It is the number of independent squared deviations, or pieces of information, that make up $s^2$: because the deviations $x_i - \bar{x}$ always sum to zero, only $n - 1$ of them are free to vary.
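A minimal sketch of the sample variance with the $n - 1$ divisor, on a hypothetical sample. It also checks that the deviations from the mean sum to zero (the reason one degree of freedom is lost) and contrasts the result with `statistics.pvariance`, which divides by $n$ instead.

```python
import statistics

def sample_variance(xs):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
xbar = sum(sample) / len(sample)

print(sum(x - xbar for x in sample))  # 0.0: deviations always sum to zero
s2 = sample_variance(sample)
print(s2)                             # 32/7, about 4.571
print(statistics.variance(sample))    # library cross-check (also divides by n - 1)
print(statistics.pvariance(sample))   # divides by n instead, so it is smaller
```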
Measures of Sample Variability The standard deviation, s, is the square root of the variance. What are the units of the standard deviation? The same units as the original observations. What does it mean if the variance (and thus the standard deviation) is large? The observations are widely spread about the mean. The range is another measure of sample variability. How is the range of a sample calculated? The range is equal to $x_{\max} - x_{\min}$.
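A minimal sketch of the standard deviation and the range on the same kind of hypothetical sample, with `statistics.stdev` as a cross-check. Note that both results are in the units of the data.

```python
import math
import statistics

def sample_std(xs):
    """Square root of the sample variance (n - 1 divisor)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def sample_range(xs):
    """x_max - x_min."""
    return max(xs) - min(xs)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
s = sample_std(sample)
print(s)                           # about 2.138
print(statistics.stdev(sample))    # library cross-check
print(sample_range(sample))        # 7
```

The range uses only the two extreme values, so a single outlier changes it completely; the standard deviation uses every observation.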
Frequency Histogram Frequency histogram: given a sample of data points, we divide the range of the data into equal-width intervals (bins) and count the number of data points that fall into each interval. A histogram is a bar chart in which the length of each bar is proportional to the number of observations in that interval. A histogram of a sample approximates the probability distribution of the population. Probability distributions show much more about a population than just the mean and standard deviation. A distribution may be symmetric, or it may be skewed to the right or to the left. The tail of a distribution shows how far the outlying points lie from the mean (for example, the 95th percentile point).
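A minimal sketch of building a frequency histogram by hand, assuming equal-width bins spanning the sample minimum to maximum; each bin is half-open except the last, which also includes the maximum value. The data are hypothetical and chosen to be skewed to the right, which shows up as a long tail of short bars on the right.

```python
def frequency_histogram(data, num_bins):
    """Return (edges, counts) for num_bins equal-width bins covering [min, max]."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are equal; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Bin index from the distance above the minimum; clamp so x == max falls in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical right-skewed data
edges, counts = frequency_histogram(sample, num_bins=4)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:g}-{right:g}: {'#' * c}")  # text bar chart, one row per bin
```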