
AbdullahHassan885167 · 61 slides · Oct 10, 2025



Slide Content

Chapter 4

STATISTICS AND SAMPLING DISTRIBUTIONS

STATISTICAL INFERENCE AND SAMPLING DISTRIBUTION

Statistical inference aims to draw conclusions or make decisions about a population based on a sample selected from that population. Statistical inference uses quantities computed from the observations in the sample. A statistic is defined as any function of the sample data that does not contain unknown parameters. The probability distribution of a statistic is called a sampling distribution.

WHAT IS SAMPLING?

✅ Sampling means selecting a subset of data from a larger population.
✅ The goal is to analyze the sample and make inferences about the entire population.
✅ Different probability distributions govern how samples behave.

SAMPLING DISTRIBUTION

A sampling distribution describes the probability distribution of a sample statistic (e.g., sample mean, variance). It is important in statistical inference, as we use sample data to estimate population parameters.

Four commonly used distributions:
Z-Distribution → Used for normally distributed random variables.
Chi-Square Distribution → Used for variance testing.
t-Distribution → Used for small-sample mean comparisons.
F-Distribution → Used for comparing variances of two samples.

CENTRAL LIMIT THEOREM

States that the distribution of the sample mean approaches normality as the sample size increases, regardless of the population's original distribution. If variables are sampled from any distribution with mean μ and variance σ², then the sampling distribution of the sample mean X̄ is approximately Normal with mean μ and variance σ²/n. This allows us to use normal approximations for inference, even when the population distribution is unknown.
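The theorem is easy to check by simulation. A minimal Python sketch (the Uniform(0, 1) population and the sample size are made up for illustration) draws many samples, computes their means, and compares the spread of those means with σ/√n:

```python
import random
import statistics

random.seed(42)  # reproducible draws

def sample_means(n, trials):
    """Draw `trials` samples of size n from Uniform(0, 1) and return their means."""
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

means = sample_means(n=30, trials=2000)

# Uniform(0, 1) has mu = 0.5 and sigma^2 = 1/12, so the CLT predicts the
# sample mean is approximately Normal(0.5, (1/12)/30).
print(round(statistics.fmean(means), 3))  # close to 0.5
print(round(statistics.stdev(means), 3))  # close to sqrt(1/360), about 0.053
```

The same check works for any starting distribution: only the mean and variance of the population enter the prediction.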

Chi-Square Distribution

Used for testing variances and for goodness-of-fit tests. If Z₁, Z₂, …, Zₙ are independent standard normal variables, then the sum of their squares follows a Chi-Square distribution: χ² = Z₁² + Z₂² + … + Zₙ².

Key Properties:
Skewed to the right.
Shape depends on degrees of freedom (df = n − 1).
Mean = df, Variance = 2 × df.

Application:
Testing the variance of a process (e.g., quality control in manufacturing).
Checking whether a dataset follows a given distribution (e.g., a normality test).

The t-Distribution

Used for mean comparisons when the sample size is small (n < 30) and the population standard deviation (σ) is unknown. Defined as: t = (X̄ − μ) / (s/√n), with n − 1 degrees of freedom.

Key Properties:
Bell-shaped and symmetric, like the normal distribution.
Has heavier tails than the normal distribution (higher probability of extreme values).
As the sample size increases, the t-distribution approaches the normal distribution.

Application:
Hypothesis testing for small samples (e.g., comparing sample means).
Confidence intervals for small-sample means.
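As a sketch of how the statistic is used in practice, the Python snippet below computes t for a one-sample test. The diameter data are invented, and the critical value t_(0.025, 9) ≈ 2.262 is the standard two-sided 5% table value for 9 degrees of freedom:

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t statistic for H0: mu = mu0 when sigma is unknown."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, n - 1 divisor
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical screw diameters (cm), testing H0: mu = 1.5
diameters = [1.48, 1.52, 1.50, 1.55, 1.49, 1.51, 1.53, 1.47, 1.54, 1.50]
t = one_sample_t(diameters, mu0=1.5)

# With n = 10 there are 9 degrees of freedom; the two-sided 5% critical
# value is t_(0.025, 9) ~ 2.262, so H0 is rejected only if |t| > 2.262.
print(round(t, 3))
print(abs(t) > 2.262)  # False -> fail to reject H0
```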

The F-Distribution

Used to compare two variances (the ratio of two Chi-Square variables, each divided by its degrees of freedom). Defined as: F = (χ₁²/ν₁) / (χ₂²/ν₂).

Key Properties:
Always positive (variances cannot be negative).
Right-skewed, and depends on two degrees of freedom.
Used in ANOVA (Analysis of Variance).

Application:
Comparing process variability between two machines.
ANOVA tests for comparing multiple means.

COMPARISON

Bernoulli Distribution

What is the Bernoulli Distribution? The Bernoulli Distribution models a binary outcome (success or failure). It takes on two possible values:
1 (success) with probability p.
0 (failure) with probability 1 − p.

Common applications:
Quality control: Does a product pass or fail an inspection?
Manufacturing defects: Is an item defective (1) or non-defective (0)?
Survey responses: Did a customer like a service (1) or not (0)?

Poisson Distribution

✔ The Poisson Distribution models the number of events occurring in a fixed interval of time or space.
✔ It applies when events happen: randomly, independently, and at a constant average rate (λ).
✔ Common applications:
Defect detection: How many defects are found in a batch of 1000 units?
Customer service: How many calls arrive at a help desk per hour?
Healthcare: How many patients visit an emergency room daily?

POINT ESTIMATORS

What Is a Point Estimator?

✅ In statistics, we estimate unknown parameters of a population using sample data.
✅ A point estimator is a single value calculated from a sample that serves as an estimate of a population parameter.
✅ Examples of parameters:
Mean (μ): Average value of a population.
Variance (σ²): Measure of how much values vary.
Poisson parameter (λ): Expected number of events per time unit.

Point Estimators: Poisson Parameter (λ)

✔ Used when estimating the rate of occurrences (e.g., defects per hour).
✔ Formula: λ̂ = (1/n) Σ xᵢ, the sample mean of the observed counts.
Example: A factory counts the number of defective items in 10 production batches. The Poisson rate (λ) estimates the defect rate per batch.

Point Estimators: Binomial Parameter (p)

✔ Used when estimating the probability of success in a binary (yes/no) process.
✔ Formula: p̂ = x/n, the number of successes divided by the number of trials.
✔ Example: A manufacturer inspects 100 chips and finds 5 defective. The estimated defect probability is p̂ = 5/100 = 0.05.
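Both estimators reduce to simple arithmetic. A minimal Python sketch, using invented defect counts for the Poisson case and the 5-out-of-100 chip inspection for the binomial case:

```python
import statistics

# Hypothetical defect counts in 10 production batches (Poisson setting)
defects_per_batch = [3, 1, 4, 2, 0, 3, 5, 2, 1, 3]

# Point estimator for the Poisson rate: lambda_hat = (1/n) * sum(x_i)
lam_hat = statistics.fmean(defects_per_batch)

# Binomial setting: 5 defective chips found among 100 inspected
p_hat = 5 / 100

print(lam_hat)  # 2.4 defects per batch
print(p_hat)    # 0.05
```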

Estimating Standard Deviation

1. Using the Sample Standard Deviation (s)
✔ Formula: s = √( Σ(xᵢ − x̄)² / (n − 1) ).
✔ The correction factor c₄ ensures unbiased estimation: σ̂ = s/c₄.

Estimating Standard Deviation

2. Using the Range Method
✔ When the sample size is small (n ≤ 6), we estimate σ using the range R: σ̂ = R/d₂, where d₂ is a tabulated constant that depends on n.
✔ Example: A technician measures the smallest and largest diameters in a batch and uses the range to estimate the standard deviation instead of calculating s.
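A small sketch of the range method in Python. The d₂ values are the standard control-chart constants for sample sizes 2 through 6; the diameter measurements are invented:

```python
# d2 constants from standard quality-control tables, indexed by sample size n
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}

def sigma_from_range(sample):
    """Estimate sigma as R / d2 for a small sample (n <= 6)."""
    n = len(sample)
    r = max(sample) - min(sample)  # the range R
    return r / D2[n]

# Hypothetical diameters measured by the technician
diameters = [10.2, 10.5, 10.1, 10.4, 10.3]
print(round(sigma_from_range(diameters), 4))  # R = 0.4, so roughly 0.4 / 2.326
```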

STATISTICAL INFERENCES

Statistical Inference

✔ Statistical inference helps us make conclusions about a population based on sample data.
✔ Two main types:
Parameter Estimation – Estimating population values using sample statistics.
Hypothesis Testing – Checking whether a claim about a population is true using sample data.

What Is Hypothesis Testing?

✔ A hypothesis is a claim or assumption about a population parameter (e.g., mean, variance).
✔ There are two hypotheses in testing:
Null Hypothesis (H0): The default assumption (e.g., "The machine produces screws with an average diameter of 1.5 cm").
Alternative Hypothesis (H1): A competing claim (e.g., "The screws are not 1.5 cm on average").

ERRORS IN HYPOTHESIS TESTING

STEPS IN HYPOTHESIS TESTING

1. State the hypotheses (H0 and H1).
2. Choose the significance level (α) (commonly 5% or 1%).
3. Select the test statistic (e.g., Z-test or t-test).
4. Compute the test statistic from the sample data.
5. Compare with the critical value or p-value to make a decision:
If the test statistic is beyond the critical value, reject H0.
If the p-value is smaller than α, reject H0.
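The steps above can be sketched end to end for a Z-test. All the numbers here (μ₀ = 100, known σ = 15, n = 36, observed x̄ = 105) are hypothetical:

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """Z test statistic for H0: mu = mu0 when the population sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

# Steps 1-4: hypotheses H0: mu = 100 vs H1: mu != 100, alpha = 0.05,
# Z-test chosen because sigma is known, then compute the statistic.
z = z_statistic(xbar=105, mu0=100, sigma=15, n=36)

# Step 5: compare with the two-sided critical value z_(0.025) = 1.96.
print(round(z, 2))    # 2.0
print(abs(z) > 1.96)  # True -> reject H0
```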

Z-TEST

EXAMPLE

LECTURE 7

Significance Level

The significance level (α) is the probability of rejecting a true null hypothesis. It represents the risk of making a Type I error, which occurs when we conclude that there is an effect or difference when there isn't one. α = 0.05 (5%) → there is a 5% chance of rejecting H0 when it is actually true.

Confidence Level

If α = 0.05, the confidence level is 95%. A 95% confidence level means we are 95% confident that the true population parameter lies within our interval. Looking up a cumulative probability of 0.95 (about 0.9495 in the Z-table) gives the critical value z = 1.645.

Confidence Interval

An interval estimate of a parameter is the interval between two statistics that includes the true value of the parameter with some probability. For example, to construct an interval estimator of the mean μ, we must find two statistics L and U such that: P(L ≤ μ ≤ U) = 1 − α.

Confidence Interval with Known Variance

For a normal population with known σ, the 100(1 − α)% two-sided confidence interval for μ is: x̄ − z_(α/2)·σ/√n ≤ μ ≤ x̄ + z_(α/2)·σ/√n.

Confidence Interval Example

Given that the standard deviation of response time is 8 milliseconds, and the sample mean response time is 79.25 milliseconds, find a 95% two-sided confidence interval for the population mean response time. Standard deviation = 8. Sample size = 25.
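Working the example through in Python (z = 1.96 is the standard two-sided 95% critical value):

```python
import math

xbar, sigma, n = 79.25, 8, 25
z = 1.96  # z_(0.025) for a 95% two-sided interval

half_width = z * sigma / math.sqrt(n)  # 1.96 * 8 / 5 = 3.136
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 3), round(upper, 3))  # 76.114 82.386
```

So the 95% confidence interval for the mean response time is (76.114, 82.386) milliseconds.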

P-Value Hypothesis Test

The P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0. The p-value shows the likelihood of your data occurring under the null hypothesis, and so helps determine statistical significance. Statistical significance refers to the likelihood that the observed effect in a study is not due to random chance. If the p-value is less than or equal to the chosen significance level (α) (typically 0.05 or 0.01), the results are considered statistically significant.

P-Value

Inference on Mean, with Variance Unknown

For the confidence interval, replace σ with s and use the t-distribution: x̄ − t_(α/2, n−1)·s/√n ≤ μ ≤ x̄ + t_(α/2, n−1)·s/√n.
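A sketch of the t-based interval in Python, with invented measurements; t_(0.025, 9) ≈ 2.262 is the table value for 9 degrees of freedom:

```python
import math
import statistics

# Hypothetical measurements, n = 10, population sigma unknown
data = [14.8, 15.2, 15.0, 15.5, 14.9, 15.1, 15.3, 14.7, 15.4, 15.0]
n = len(data)
xbar = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation, n - 1 divisor

t_crit = 2.262  # t_(0.025, 9) from a t-table
half = t_crit * s / math.sqrt(n)
print(round(xbar - half, 3), round(xbar + half, 3))
```

The interval is wider than the corresponding z-interval would be, which reflects the extra uncertainty from estimating σ with s.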

INFERENCE ON VARIANCE

Hypothesis testing uses the Chi-Square statistic: χ₀² = (n − 1)s²/σ₀², which follows a Chi-Square distribution with n − 1 degrees of freedom when H0: σ² = σ₀² is true.

Confidence Interval

The 100(1 − α)% confidence interval for σ² is: (n − 1)s²/χ²_(α/2, n−1) ≤ σ² ≤ (n − 1)s²/χ²_(1−α/2, n−1).

INFERENCE ON POPULATION PROPORTION

For large samples, the test statistic for H0: p = p₀ is: Z₀ = (p̂ − p₀) / √( p₀(1 − p₀)/n ).

CONFIDENCE INTERVAL

The approximate 100(1 − α)% confidence interval for p is: p̂ − z_(α/2)·√( p̂(1 − p̂)/n ) ≤ p ≤ p̂ + z_(α/2)·√( p̂(1 − p̂)/n ).

TYPE II ERROR

STATISTICAL INFERENCE FOR TWO SAMPLES

DIFFERENCE IN MEANS, VARIANCES KNOWN

TEST STATISTIC: Z₀ = (x̄₁ − x̄₂ − Δ₀) / √( σ₁²/n₁ + σ₂²/n₂ ).
CONFIDENCE INTERVAL: (x̄₁ − x̄₂) − z_(α/2)·√( σ₁²/n₁ + σ₂²/n₂ ) ≤ μ₁ − μ₂ ≤ (x̄₁ − x̄₂) + z_(α/2)·√( σ₁²/n₁ + σ₂²/n₂ ).
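A minimal Python sketch of the two-sample Z statistic, with invented summary numbers for two machines whose variances are treated as known:

```python
import math

def two_sample_z(x1bar, x2bar, var1, var2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 with known variances."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (x1bar - x2bar - delta0) / se

# Hypothetical summary data for two machines
z = two_sample_z(x1bar=121.0, x2bar=112.0, var1=64.0, var2=100.0, n1=10, n2=10)

# Two-sided test at alpha = 0.05: compare |z| with 1.96
print(round(z, 2))
print(abs(z) > 1.96)  # True -> reject H0: mu1 = mu2
```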

CONT.

VARIANCES UNKNOWN

When the two variances are unknown but assumed equal, pool them: sp² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2), giving the test statistic t₀ = (x̄₁ − x̄₂ − Δ₀) / ( sp·√(1/n₁ + 1/n₂) ).

CONT.

When the variances are not assumed equal, the Degrees of Freedom (df) are approximated by: df = ( s₁²/n₁ + s₂²/n₂ )² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ].

CONFIDENCE INTERVAL

CASE 1: VARIANCES ARE EQUAL: (x̄₁ − x̄₂) ± t_(α/2, n₁+n₂−2)·sp·√(1/n₁ + 1/n₂).
CASE 2: VARIANCES ARE NOT EQUAL: (x̄₁ − x̄₂) ± t_(α/2, df)·√( s₁²/n₁ + s₂²/n₂ ).

VARIANCES OF TWO NORMAL DISTRIBUTIONS

To test H0: σ₁² = σ₂², use the ratio F₀ = s₁²/s₂², which follows an F distribution with n₁ − 1 and n₂ − 1 degrees of freedom when H0 is true.

CONFIDENCE INTERVAL

Upper and Lower bounds, respectively: (s₁²/s₂²)·F_(α/2, n₂−1, n₁−1) and (s₁²/s₂²)·F_(1−α/2, n₂−1, n₁−1).

INFERENCES ON TWO POPULATION PROPORTIONS

Test Statistic: Z₀ = (p̂₁ − p̂₂) / √( p̂(1 − p̂)(1/n₁ + 1/n₂) ), where p̂ is the pooled proportion.
Confidence Interval: (p̂₁ − p̂₂) ± z_(α/2)·√( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ ).

HYPOTHESIS TESTING

An estimator for the parameter p (the pooled proportion): p̂ = (x₁ + x₂) / (n₁ + n₂).

ANALYSIS OF VARIANCE(ANOVA)

Analysis of Variance

Analysis of Variance (ANOVA) is a statistical technique used to determine whether there are statistically significant differences between the means of three or more independent groups. It helps in testing whether the variation among group means is due to actual differences or just random chance.
Used to compare means of three or more groups.
Determines if observed differences are statistically significant.
Helps in quality control and process optimization.

EXAMPLE

A team of engineers responsible for the study decides to investigate four levels of hardwood concentration: 5%, 10%, 15%, and 20%, to improve tensile strength.
Factor: Independent variable (e.g., hardwood concentration).
Levels: Different values of the factor (e.g., 5%, 10%, 15%, 20%).
Response Variable: Measurement outcome (e.g., tensile strength).
Replicates: Multiple observations per level.
H0: No significant difference in mean tensile strength.

ANOVA

The data can be described by a statistical linear model: yᵢⱼ = μ + τᵢ + εᵢⱼ, where:
μ = overall mean.
τᵢ = effect of the i-th treatment (difference from the overall mean).
εᵢⱼ = random error component (normally distributed).

HYPOTHESIS

NULL HYPOTHESIS: H0: τ₁ = τ₂ = … = τₐ = 0.
ALTERNATE HYPOTHESIS: H1: τᵢ ≠ 0 for at least one i.

ANOVA

To measure variability, the total sum of squares is partitioned: SST = SS(Treatments) + SS(Error).
To calculate SST: SST = Σᵢ Σⱼ ( yᵢⱼ − ȳ.. )².

F-Test

To calculate the F statistic: F₀ = MS(Treatments) / MS(Error) = [ SS(Treatments)/(a − 1) ] / [ SS(Error)/(N − a) ], where a is the number of treatments and N is the total number of observations.
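The whole decomposition can be sketched in a few lines of Python. The tensile-strength numbers are invented, and the 5% critical value F_(0.05, 2, 9) ≈ 4.26 comes from an F-table:

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_obs = [x for g in groups for x in g]
    grand = statistics.fmean(all_obs)
    a, n_total = len(groups), len(all_obs)
    # Between-group (treatment) and within-group (error) sums of squares
    ss_treat = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_err = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    ms_treat = ss_treat / (a - 1)
    ms_err = ss_err / (n_total - a)
    return ms_treat / ms_err, a - 1, n_total - a

# Hypothetical tensile strengths at three hardwood concentrations
groups = [[7, 8, 15, 11], [12, 17, 13, 18], [19, 25, 22, 23]]
f, df1, df2 = one_way_anova(groups)
print(round(f, 2), df1, df2)
print(f > 4.26)  # True -> reject H0 at the 5% level (F_(0.05, 2, 9) ~ 4.26)
```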

LINEAR REGRESSION MODEL

Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X) using a straight-line equation. Linear regression finds the best-fitting line by minimizing the differences (errors) between the observed data points and the predicted values.

CONT.

To minimize the sum of squared errors, take the partial derivatives with respect to β₀ and β₁, set them to zero, and solve. This yields the least-squares estimates: β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², and β̂₀ = ȳ − β̂₁x̄.
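These two formulas are all that is needed to fit the line. A short Python sketch with made-up (x, y) data that roughly follow y = 2x:

```python
import statistics

def fit_line(xs, ys):
    """Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data, roughly y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
b0, b1 = fit_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # intercept near 0, slope near 2
```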

HYPOTHESIS TESTING

For the Multiple Regression Model: https://www.youtube.com/watch?v=EFdlFoHI_0I

CONFIDENCE INTERVAL FOR REGRESSION

PREDICTION ACCURACY

Prediction Error Sum of Squares: PRESS = Σ ( yᵢ − ŷ(i) )², where ŷ(i) is the prediction for observation i from a model fitted without observation i.
R-square: R² = 1 − SS(Error)/SS(Total).
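R² is straightforward to compute from the two sums of squares. A sketch with hypothetical observations and fitted values (the fitted values here come from an assumed line near y = 2x):

```python
import statistics

def r_squared(ys, preds):
    """R^2 = 1 - SS(Error) / SS(Total) for observations ys and model predictions."""
    ybar = statistics.fmean(ys)
    ss_total = sum((y - ybar) ** 2 for y in ys)
    ss_error = sum((y, p) and (y - p) ** 2 for y, p in zip(ys, preds))
    return 1 - ss_error / ss_total

# Hypothetical observations and fitted values from an assumed regression line
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
preds = [2.08, 4.07, 6.06, 8.05, 10.04]
r2 = r_squared(ys, preds)
print(round(r2, 4))  # close to 1, so the line explains almost all the variation
```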