Chapter 4
STATISTICS AND SAMPLING DISTRIBUTIONS
STATISTICAL INFERENCE AND SAMPLING DISTRIBUTION
Statistical inference aims to draw conclusions or make decisions about a population based on a sample selected from that population, using quantities computed from the observations in the sample. A statistic is defined as any function of the sample data that does not contain unknown parameters. The probability distribution of a statistic is called a sampling distribution.
WHAT IS SAMPLING
✅ Sampling means selecting a subset of data from a larger population.
✅ The goal is to analyze the sample and make inferences about the entire population.
✅ Different probability distributions govern how samples behave.
SAMPLING DISTRIBUTION
A sampling distribution describes the probability distribution of a sample statistic (e.g., the sample mean or variance). It is central to statistical inference, since we use sample data to estimate population parameters. Four commonly used distributions:
Z-distribution → used for means of normally distributed variables when σ is known.
Chi-square distribution → used for variance testing.
t-distribution → used for small-sample mean comparisons.
F-distribution → used for comparing the variances of two samples.
CENTRAL LIMIT THEOREM
States that the distribution of the sample mean approaches normality as the sample size increases, regardless of the population's original distribution. If variables are sampled from any distribution with mean μ and variance σ², then the sampling distribution of the sample mean is approximately
X̄ ~ N(μ, σ²/n)
This allows us to use normal approximations for inference even when the population distribution is unknown.
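The Central Limit Theorem is easy to verify empirically. The sketch below is illustrative only (the exponential population, seed, and sample sizes are my choices, not from the slides): it draws repeated samples from a clearly skewed distribution and checks that the sample means behave like N(μ, σ²/n).

```python
import random
import statistics

def sample_mean(n, draw):
    """Mean of n i.i.d. draws from the sampler `draw`."""
    return statistics.fmean(draw() for _ in range(n))

random.seed(42)
# Population: exponential with mean 2 (skewed, non-normal; mu = 2, sigma = 2).
draw = lambda: random.expovariate(0.5)

n = 50
means = [sample_mean(n, draw) for _ in range(5000)]

# CLT prediction: mean of sample means -> mu, stdev -> sigma / sqrt(n).
print(statistics.fmean(means))   # close to 2.0
print(statistics.stdev(means))   # close to 2 / sqrt(50) ~ 0.283
```

Even though the population is far from normal, a histogram of `means` would already look bell-shaped at n = 50.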
Chi-Square Distribution
Used for testing variances and for goodness-of-fit tests. If Z₁, …, Zₙ are independent standard normal variables, the sum of their squares follows a chi-square distribution:
χ² = Z₁² + Z₂² + … + Zₙ²
Key properties:
Skewed to the right.
Shape depends on the degrees of freedom (df = n − 1 when computed from a sample of size n).
Mean = df, Variance = 2 × df.
Applications:
Testing the variance of a process (e.g., quality control in manufacturing).
Checking whether a dataset follows a given distribution (e.g., a normality test).
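The variance-testing application can be sketched with the standard statistic (n − 1)s²/σ₀², which follows a chi-square distribution with n − 1 df under H0. To stay within the standard library (no chi-square tables or SciPy), this illustrative sketch approximates the null distribution by Monte Carlo; the sample data, σ₀², and seed are all invented for the example.

```python
import random
import statistics

def chi2_stat(sample, sigma0_sq):
    """Test statistic (n - 1) * s^2 / sigma0^2 for H0: variance = sigma0^2."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq

random.seed(1)
n, sigma0_sq = 15, 4.0                              # H0: process variance is 4
data = [random.gauss(10, 3.0) for _ in range(n)]    # true variance is 9
observed = chi2_stat(data, sigma0_sq)

# Monte Carlo approximation of the chi-square(n-1) null distribution.
null = sorted(chi2_stat([random.gauss(0, sigma0_sq ** 0.5) for _ in range(n)],
                        sigma0_sq)
              for _ in range(20000))
p_value = sum(s >= observed for s in null) / len(null)   # upper-tail test
print(observed, p_value)
```

In practice one would read the critical value χ²(α, n−1) from a table instead of simulating, but the logic of the test is identical.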
The t-Distribution
Used for mean comparisons when the sample size is small (n < 30) and the population standard deviation σ is unknown. Defined as:
t = (X̄ − μ) / (s / √n), with n − 1 degrees of freedom
Key properties:
Bell-shaped and symmetric, like the normal distribution.
Heavier tails than the normal distribution (higher probability of extreme values).
As the sample size increases, the t-distribution approaches the normal distribution.
Applications:
Hypothesis testing for small samples (e.g., comparing sample means).
Confidence intervals for small-sample means.
The F-Distribution
Used to compare two variances (the ratio of two chi-square variables, each divided by its degrees of freedom). Defined as:
F = (χ²ᵤ / u) / (χ²ᵥ / v), with u and v degrees of freedom
Key properties:
Always positive (variances cannot be negative).
Right-skewed; shape depends on the two degrees of freedom.
Used in ANOVA (Analysis of Variance).
Applications:
Comparing process variability between two machines.
ANOVA tests for comparing multiple means.
COMPARISON
Bernoulli Distribution
What is the Bernoulli distribution? It models a binary outcome (success or failure) and takes on two possible values:
1 (success) with probability p.
0 (failure) with probability 1 − p.
Common applications:
Quality control: does a product pass or fail an inspection?
Manufacturing defects: is an item defective (1) or non-defective (0)?
Survey responses: did a customer like a service (1) or not (0)?
Poisson Distribution
✔ The Poisson distribution models the number of events occurring in a fixed interval of time or space.
✔ It applies when events happen randomly, independently, and at a constant average rate (λ).
✔ Common applications:
Defect detection: how many defects are found in a batch of 1000 units?
Customer service: how many calls arrive at a help desk per hour?
Healthcare: how many patients visit an emergency room daily?
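The defect-detection application above comes down to evaluating the Poisson probability mass function P(X = k) = e^(−λ) λᵏ / k!. A minimal sketch (the rate λ = 2 defects per batch is a made-up value for illustration):

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) count: exp(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical scenario: defects arrive at lam = 2 per batch on average.
lam = 2.0
p_zero = poisson_pmf(0, lam)                          # chance of a defect-free batch
p_at_most_3 = sum(poisson_pmf(k, lam) for k in range(4))
print(round(p_zero, 4), round(p_at_most_3, 4))        # 0.1353 0.8571
```

Summing the pmf over k, as done for `p_at_most_3`, is how cumulative "at most k events" probabilities are obtained.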
POINT ESTIMATORS
What is a Point Estimator?
✅ In statistics, we estimate unknown parameters of a population using sample data.
✅ A point estimator is a single value calculated from a sample that serves as an estimate of a population parameter.
✅ Examples of parameters:
Mean (μ): average value of a population.
Variance (σ²): measure of how much values vary.
Poisson parameter (λ): expected number of events per time unit.
Point Estimators: Poisson Parameter (λ)
✔ Used when estimating the rate of occurrences (e.g., defects per hour).
✔ Formula: λ̂ = (x₁ + x₂ + … + xₙ) / n, the sample mean of the observed counts.
✔ Example: a factory counts the number of defective items in 10 production batches. The estimated Poisson rate λ̂ gives the defect rate per batch.
Point Estimators: Binomial Parameter (p)
✔ Used when estimating the probability of success in a binary (yes/no) process.
✔ Formula: p̂ = x / n, where x is the number of successes in n trials.
✔ Example: a manufacturer inspects 100 chips and finds 5 defective. The estimated defect probability is p̂ = 5/100 = 0.05.
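Both point estimators reduce to one-line computations. In this sketch, the batch defect counts are invented for illustration, while the 5-defectives-in-100-chips figures come from the slide's example:

```python
import statistics

# Hypothetical defect counts observed in 10 production batches.
defects = [3, 1, 4, 2, 0, 5, 2, 3, 1, 2]
lambda_hat = statistics.fmean(defects)   # Poisson rate estimate: sample mean
print(lambda_hat)                        # 2.3 defects per batch

# Binomial proportion from the slide: 5 defective chips out of 100 inspected.
x, n = 5, 100
p_hat = x / n
print(p_hat)                             # 0.05
```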
Estimating Standard Deviation
1. Using the sample standard deviation (s)
✔ Formula: s = √( Σ(xᵢ − x̄)² / (n − 1) )
✔ Because s is a slightly biased estimator of σ, the correction factor c₄ (using σ̂ = s / c₄) ensures unbiased estimation.
Estimating Standard Deviation
2. Using the range method
✔ When the sample size is small (n ≤ 6), we estimate σ using the range R = xₘₐₓ − xₘᵢₙ:
σ̂ = R / d₂, where d₂ is a tabulated control-chart constant that depends on n.
✔ Example: a technician measures the smallest and largest diameters in a batch and uses the range to estimate the standard deviation instead of calculating s.
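The range method needs only the extremes of the sample and a table lookup. A sketch of the technician's calculation (the diameter readings are invented; the d₂ values are the standard control-chart constants for n = 2 through 6):

```python
# d2 control-chart constants for small samples (standard SPC tables).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}

def sigma_from_range(sample):
    """Estimate sigma as R / d2; recommended only for small samples (n <= 6)."""
    n = len(sample)
    if n not in D2:
        raise ValueError("range method is only tabulated here for 2 <= n <= 6")
    r = max(sample) - min(sample)   # the range R
    return r / D2[n]

diameters = [10.2, 10.5, 9.9, 10.4, 10.1]   # hypothetical measurements, n = 5
print(round(sigma_from_range(diameters), 4))
```

For n = 5, R = 0.6 and d₂ = 2.326, so σ̂ ≈ 0.258, without ever computing a sum of squares.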
STATISTICAL INFERENCE
Statistical Inference
✔ Statistical inference helps us draw conclusions about a population based on sample data.
✔ Two main types:
Parameter estimation – estimating population values using sample statistics.
Hypothesis testing – checking whether a claim about a population is true using sample data.
What is Hypothesis Testing?
✔ A hypothesis is a claim or assumption about a population parameter (e.g., mean, variance).
✔ There are two hypotheses in testing:
Null hypothesis (H0): the default assumption (e.g., "The machine produces screws with an average diameter of 1.5 cm").
Alternative hypothesis (H1): a competing claim (e.g., "The screws are not 1.5 cm on average").
ERRORS IN HYPOTHESIS TESTING
Type I error: rejecting H0 when it is actually true; its probability is the significance level α.
Type II error: failing to reject H0 when it is actually false; its probability is denoted β.
STEPS IN HYPOTHESIS TESTING
1. State the hypotheses (H0 and H1).
2. Choose the significance level α (commonly 5% or 1%).
3. Select the test statistic (e.g., Z-test or t-test).
4. Compute the test statistic from the sample data.
5. Compare with the critical value or p-value to make a decision:
If the test statistic is beyond the critical value, reject H0.
If the p-value is smaller than α, reject H0.
Z-TEST
Used when σ is known (or the sample is large). Test statistic:
Z₀ = (X̄ − μ₀) / (σ / √n)
For a two-sided test, reject H0 at level α if |Z₀| exceeds the critical value z(α/2).
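The five testing steps can be sketched end-to-end for a Z-test using only the standard library (`statistics.NormalDist` supplies the normal CDF). The screw-diameter data and the known σ below are invented for illustration:

```python
from statistics import NormalDist, fmean

def z_test(sample, mu0, sigma):
    """Two-sided Z-test of H0: mu = mu0 when sigma is known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)     # step 4: test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))             # two-sided p-value
    return z, p

# Hypothetical screw diameters; H0: mean = 1.5 cm, known sigma = 0.05 cm.
sample = [1.52, 1.48, 1.55, 1.51, 1.53, 1.49, 1.54, 1.50, 1.56, 1.52]
alpha = 0.05                                           # step 2
z, p = z_test(sample, mu0=1.5, sigma=0.05)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {p <= alpha}")  # step 5
```

Here z ≈ 1.265 and p ≈ 0.206, so at α = 0.05 we fail to reject H0: this sample gives no evidence that the mean diameter differs from 1.5 cm.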
EXAMPLE
LECTURE 7
Significance Level
The significance level (α) is the probability of rejecting a true null hypothesis. It represents the risk of making a Type I error, which occurs when we conclude that there is an effect or difference when there isn't one. α = 0.05 (5%) means there is a 5% chance of rejecting H0 when it is actually true.
Confidence Level
If α = 0.05, the confidence level is 95%: we are 95% confident that the true population parameter lies within our interval. For a one-sided 95% bound, look up the probability 0.95 in the Z-table to get z = 1.645 (Φ(1.645) ≈ 0.9495); for a two-sided 95% interval, use z = 1.96.
Confidence Interval
An interval estimate of a parameter is the interval between two statistics that includes the true value of the parameter with some probability. For example, to construct an interval estimator of the mean μ, we must find two statistics L and U such that:
P(L ≤ μ ≤ U) = 1 − α
Confidence Interval with Known Variance
x̄ − z(α/2)·σ/√n ≤ μ ≤ x̄ + z(α/2)·σ/√n
Example: Confidence Interval
Given that the standard deviation of response time is 8 milliseconds and the sample mean response time is 79.25 milliseconds, find a 95% two-sided confidence interval for the population mean response time.
Standard deviation σ = 8
Sample size n = 25
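The response-time example can be computed directly: x̄ ± z(α/2)·σ/√n with σ = 8, x̄ = 79.25, n = 25. A minimal sketch using the standard library's inverse normal CDF:

```python
from statistics import NormalDist

sigma, xbar, n, conf = 8.0, 79.25, 25, 0.95
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # ~1.96 for a 95% interval
half_width = z * sigma / n ** 0.5                # z * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI: ({lower:.3f}, {upper:.3f})")     # (76.114, 82.386)
```

The half-width is 1.96 × 8/5 ≈ 3.136 ms, so the interval is roughly 79.25 ± 3.14 ms.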
P-Value Hypothesis Test
The p-value is the smallest level of significance that would lead to rejection of the null hypothesis H0. It shows the likelihood of your data occurring under the null hypothesis. P-values help determine statistical significance: the likelihood that the observed effect in a study is not due to random chance. If the p-value is less than or equal to the chosen significance level α (typically 0.05 or 0.01), the results are considered statistically significant.
Inference on Mean, Variance Unknown
Test statistic: t₀ = (x̄ − μ₀) / (s / √n), with n − 1 degrees of freedom.
For a confidence interval:
x̄ − t(α/2, n−1)·s/√n ≤ μ ≤ x̄ + t(α/2, n−1)·s/√n
INFERENCE ON VARIANCE
Hypothesis testing, chi-square statistic:
χ₀² = (n − 1)s² / σ₀², with n − 1 degrees of freedom.
Confidence Interval
(n − 1)s² / χ²(α/2, n−1) ≤ σ² ≤ (n − 1)s² / χ²(1−α/2, n−1)
INFERENCE ON A POPULATION PROPORTION
Test statistic:
Z₀ = (p̂ − p₀) / √( p₀(1 − p₀)/n )
CONFIDENCE INTERVAL
p̂ − z(α/2)·√( p̂(1 − p̂)/n ) ≤ p ≤ p̂ + z(α/2)·√( p̂(1 − p̂)/n )
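The proportion test statistic and its normal-approximation confidence interval fit in one small function. This is an illustrative sketch: the 12-defectives-in-200 scenario and the hypothesized rate p₀ = 0.04 are invented, and the function name is my own.

```python
from statistics import NormalDist

def proportion_test(x, n, p0, conf=0.95):
    """Z-test of H0: p = p0 plus a normal-approximation CI for p."""
    p_hat = x / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5      # null SE uses p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided
    zc = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = zc * (p_hat * (1 - p_hat) / n) ** 0.5        # CI SE uses p_hat
    return z, p_value, (p_hat - half, p_hat + half)

# Hypothetical: 12 defectives in 200 units; H0: defect rate p = 0.04.
z, p_value, ci = proportion_test(12, 200, 0.04)
print(f"z = {z:.3f}, p = {p_value:.4f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Note the design choice mirrored from the slides: the test uses p₀ in the standard error, while the confidence interval uses p̂.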
TYPE II ERROR
β = P(fail to reject H0 | H0 is false); the power of a test is 1 − β.
STATISTICAL INFERENCE FOR TWO SAMPLES
DIFFERENCE IN MEANS, VARIANCES KNOWN
Test statistic:
Z₀ = (x̄₁ − x̄₂ − Δ₀) / √( σ₁²/n₁ + σ₂²/n₂ )
Confidence interval:
(x̄₁ − x̄₂) − z(α/2)·√( σ₁²/n₁ + σ₂²/n₂ ) ≤ μ₁ − μ₂ ≤ (x̄₁ − x̄₂) + z(α/2)·√( σ₁²/n₁ + σ₂²/n₂ )
VARIANCES UNKNOWN
Case 1 (σ₁² = σ₂² assumed), pooled t statistic:
t₀ = (x̄₁ − x̄₂ − Δ₀) / ( sₚ·√(1/n₁ + 1/n₂) ), where sₚ² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2), with n₁ + n₂ − 2 degrees of freedom.
CONT.
Case 2 (σ₁² ≠ σ₂²): t₀ = (x̄₁ − x̄₂ − Δ₀) / √( s₁²/n₁ + s₂²/n₂ )
Degrees of freedom (df):
ν = ( s₁²/n₁ + s₂²/n₂ )² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
CONFIDENCE INTERVAL
Case 1, variances equal:
(x̄₁ − x̄₂) ± t(α/2, n₁+n₂−2)·sₚ·√(1/n₁ + 1/n₂)
Case 2, variances not equal:
(x̄₁ − x̄₂) ± t(α/2, ν)·√( s₁²/n₁ + s₂²/n₂ )
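For the equal-variance case, the pooled t statistic is straightforward to compute by hand. A sketch with invented yield data from two machines (the function name and numbers are mine, not from the slides):

```python
from statistics import fmean, variance

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic (equal variances assumed, Delta0 = 0)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) \
          / (n1 + n2 - 2)
    t = (fmean(sample1) - fmean(sample2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2                       # statistic, degrees of freedom

# Hypothetical yields (%) from two machines.
a = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
b = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]
t, df = pooled_t(a, b)
print(f"t = {t:.3f}, df = {df}")   # compare |t| with the table value t(alpha/2, df)
```

The decision step is unchanged from the one-sample case: reject H0 of equal means if |t| exceeds t(α/2, n₁+n₂−2).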
VARIANCES OF TWO NORMAL DISTRIBUTIONS
Test statistic for H0: σ₁² = σ₂²:
F₀ = s₁² / s₂², with n₁ − 1 and n₂ − 1 degrees of freedom.
CONFIDENCE INTERVAL
Lower and upper bounds respectively:
(s₁²/s₂²)·F(1−α/2, n₂−1, n₁−1) ≤ σ₁²/σ₂² ≤ (s₁²/s₂²)·F(α/2, n₂−1, n₁−1)
INFERENCE ON TWO POPULATION PROPORTIONS
Test statistic (with pooled estimate p̂):
Z₀ = (p̂₁ − p̂₂) / √( p̂(1 − p̂)(1/n₁ + 1/n₂) )
Confidence interval:
(p̂₁ − p̂₂) ± z(α/2)·√( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ )
HYPOTHESIS TESTING
A pooled estimator for the parameter p:
p̂ = (x₁ + x₂) / (n₁ + n₂)
ANALYSIS OF VARIANCE (ANOVA)
Analysis of Variance
Analysis of Variance (ANOVA) is a statistical technique used to determine whether there are statistically significant differences between the means of three or more independent groups. It tests whether the variation among group means reflects actual differences or just random chance.
Used to compare means of three or more groups.
Determines whether observed differences are statistically significant.
Helps in quality control and process optimization.
EXAMPLE
A team of engineers responsible for the study decides to investigate four levels of hardwood concentration (5%, 10%, 15%, and 20%) to improve tensile strength.
Factor: independent variable (e.g., hardwood concentration).
Levels: different values of the factor (e.g., 5%, 10%, 15%, 20%).
Response variable: measured outcome (e.g., tensile strength).
Replicates: multiple observations per level.
H0: no significant difference in mean tensile strength across concentrations.
ANOVA
The data can be described by a statistical linear model:
yᵢⱼ = μ + τᵢ + εᵢⱼ
μ = overall mean
τᵢ = effect of treatment i (difference from the overall mean)
εᵢⱼ = random error component (normally distributed)
ANOVA
To measure variability, partition the total sum of squares:
SST = SS_treatments + SS_E
To calculate SST:
SST = Σᵢ Σⱼ (yᵢⱼ − ȳ..)², where ȳ.. is the grand mean.
F-Test
To calculate the F statistic:
F₀ = MS_treatments / MS_E = [ SS_treatments/(a − 1) ] / [ SS_E/(N − a) ]
where a is the number of treatments and N is the total number of observations. Reject H0 if F₀ exceeds F(α, a−1, N−a).
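The sums of squares and the F statistic can be computed directly from the group data. This sketch uses illustrative tensile-strength values in the spirit of the hardwood-concentration example (the specific numbers are assumptions, not taken from the slides):

```python
from statistics import fmean

def one_way_anova(groups):
    """Return SS_treatments, SS_error, and F0 for a one-way ANOVA."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)                       # grand mean y-bar..
    a, N = len(groups), len(all_obs)
    # Between-group variability: how far each group mean is from the grand mean.
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    # Within-group variability: scatter of observations around their group mean.
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    f0 = (ss_treat / (a - 1)) / (ss_error / (N - a))
    return ss_treat, ss_error, f0

# Illustrative tensile strengths at 5%, 10%, 15%, 20% hardwood concentration.
groups = [
    [7, 8, 15, 11, 9, 10],
    [12, 17, 13, 18, 19, 15],
    [14, 18, 19, 17, 16, 18],
    [19, 25, 22, 23, 18, 20],
]
ss_t, ss_e, f0 = one_way_anova(groups)
print(f"SS_treat = {ss_t:.2f}, SS_E = {ss_e:.2f}, F0 = {f0:.2f}")
```

With a = 4 and N = 24, F₀ here is large (about 19.6), far beyond any reasonable F(α, 3, 20) critical value, so H0 of equal mean strengths would be rejected.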
LINEAR REGRESSION MODEL
Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X) using a straight-line equation:
ŷ = β̂₀ + β̂₁x
Linear regression finds the best-fitting line by minimizing the differences (errors) between the observed data points and the predicted values.
CONT.
To minimize the sum of squared errors, take partial derivatives with respect to β₀ and β₁, set them to zero, and solve. The least-squares estimates are:
β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
β̂₀ = ȳ − β̂₁x̄
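The closed-form least-squares solution translates directly into code. A minimal sketch (the five (x, y) points are made up to lie roughly on y = 2x):

```python
from statistics import fmean

def least_squares(xs, ys):
    """Closed-form least-squares estimates for y = b0 + b1 * x."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # S_xy
    sxx = sum((x - xbar) ** 2 for x in xs)                      # S_xx
    b1 = sxy / sxx              # slope estimate
    b0 = ybar - b1 * xbar       # intercept estimate
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x with small noise
b0, b1 = least_squares(xs, ys)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```

The fitted slope comes out near 2 and the intercept near 0, as expected from how the data were constructed.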
HYPOTHESIS TESTING
For the significance of a multiple regression model with k regressors:
F₀ = ( SS_R / k ) / ( SS_E / (n − k − 1) )
https://www.youtube.com/watch?v=EFdlFoHI_0I
CONFIDENCE INTERVAL FOR REGRESSION
β̂₁ ± t(α/2, n−2)·se(β̂₁), where se(β̂₁) = √( MS_E / Σ(xᵢ − x̄)² )
PREDICTION ACCURACY
Prediction error sum of squares:
PRESS = Σ ( yᵢ − ŷ₍ᵢ₎ )², where ŷ₍ᵢ₎ is the prediction for observation i from a model fit without that observation.
R-square:
R² = 1 − SS_E / SS_T