AbdullahHassan885167
Oct 10, 2025
Chapter 3
DESCRIPTIVE STATISTICS
LEAF AND STEM DIAGRAM
A stem-and-leaf diagram (or stem plot) is a visual representation of numerical data that organizes and displays the data in a way that preserves the original values while showing the distribution. It is particularly useful for small-to-moderate-sized datasets and provides insights similar to a histogram, but with the actual data points retained.
How it works:
- Each number is split into two parts: the stem (the leading digits, e.g., the tens place) and the leaf (the last digit, e.g., the ones place).
- The stems are listed in order in a column, and the leaves are written beside their corresponding stems.
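The stem/leaf split described above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative integers with the last digit as the leaf; the function name `stem_and_leaf` and the `scores` data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem plot: stem = leading digits, leaf = last digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 34 -> stem 3, leaf 4
        groups[stem].append(leaf)
    rows = []
    # List every stem in order, including empty ones, so gaps stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

scores = [23, 25, 31, 34, 34, 38, 42, 47, 61]   # hypothetical data
for row in stem_and_leaf(scores):
    print(row)
```

Every original value can be read back from a row (stem 3 with leaves 1 4 4 8 is 31, 34, 34, 38), which is the property that distinguishes a stem plot from a histogram.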
Percentiles
If your SAT score is at the 85th percentile, you scored higher than 85% of test-takers.
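As a small sketch of that definition, the percentile rank of a value can be taken as the share of observations strictly below it. The function name `percentile_rank` and the data are hypothetical, and other conventions (for example counting ties as half) exist.

```python
def percentile_rank(scores, x):
    """Percentage of scores that are strictly lower than x."""
    below = sum(1 for s in scores if s < x)
    return 100.0 * below / len(scores)

scores = list(range(1, 21))        # hypothetical scores 1, 2, ..., 20
rank = percentile_rank(scores, 18) # 17 of the 20 scores are lower than 18
print(rank)
```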
HISTOGRAM
A histogram is a graphical representation of a data distribution that uses bars to show how frequently different values occur in a dataset. Unlike a bar chart, a histogram represents continuous data grouped into intervals (bins).
METHOD:
- Group the values of the variable into bins, then count the number of observations that fall into each bin.
- Plot the frequency (or relative frequency) against the values of the variable.
HISTOGRAM
The proportion of defective items can be read from a histogram by adding up the relative frequencies of the bins that fall outside the specification limits.
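The binning method, and the idea of reading a defect proportion off the bins, can be sketched as follows. The weights, the bin layout, and the upper specification limit of 54 g are all assumptions made for illustration.

```python
def histogram(values, low, width, n_bins):
    """Count observations in n_bins equal-width bins [left, left + width)."""
    counts = [0] * n_bins
    for v in values:
        i = (v - low) // width       # index of the bin that contains v
        if 0 <= i < n_bins:          # values outside the range are ignored
            counts[i] += 1
    return counts

weights = [48, 49, 50, 50, 51, 51, 52, 52, 53, 57]   # hypothetical weights in g
counts = histogram(weights, low=48, width=2, n_bins=5)
rel_freq = [c / len(weights) for c in counts]

# Bins 3 and 4 cover 54 g and above, beyond the assumed upper limit of 54 g.
defect_share = sum(rel_freq[3:])
print(counts, rel_freq, defect_share)
```

Integer data is used here so that bin boundaries are exact; with floating-point measurements, values that sit on a bin edge need more care.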
Summary of data
- Mean
- Median
- Mode
- Percentiles
- Outliers
Standard deviation and variance
BOX AND WHISKER PLOT
A Box and Whisker Plot (Box Plot) is a statistical graph that summarizes a dataset using five key values:
- Minimum (smallest value)
- First quartile (Q1 – 25th percentile)
- Median (Q2 – 50th percentile)
- Third quartile (Q3 – 75th percentile)
- Maximum (largest value)
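The five values behind a box plot can be computed with Python's standard `statistics` module. This is a sketch with hypothetical data; quartile conventions differ between textbooks and tools, and the `"inclusive"` method used here is only one of them.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]   # hypothetical observations
summary = five_number_summary(data)
print(summary)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.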
COMPARATIVE BOX PLOTS
Probability distributions
A probability distribution describes how the values of a random variable are distributed. It gives the probability of each possible outcome of that variable. Probability distributions are of two types: discrete and continuous.
Probability distribution
Discrete probability:
Continuous probability:
DISCRETE PROBABILITY DISTRIBUTION
Bernoulli Distribution
Definition: Models a single trial with two possible outcomes: success (1) or failure (0).
Key characteristics:
- One trial only
- Two possible outcomes (binary)
- p = probability of success
Example use cases:
- Coin flip (heads/tails)
- Defective vs. non-defective product
- A customer purchasing (yes/no)
Binomial Distribution
Definition: Models the number of successes in n independent Bernoulli trials.
Key characteristics:
- Fixed number of trials n
- Each trial is independent
- Probability of success p is constant
Example use cases:
- Number of defective items in a batch of 50
- Number of students passing an exam
- Number of customers making a purchase
Poisson Distribution
Definition: Models the probability of k events occurring in a fixed time/space interval, given a known average rate λ.
Key characteristics:
- Counts occurrences over time or space
- The mean λ is also the variance
- Events occur randomly and independently
Example use cases:
- Number of customer arrivals per hour
- Defects in a 1-meter fabric roll
- Number of calls received at a call center per minute
Discrete probability distribution
Example: A semiconductor manufacturing process produces thousands of chips per day, and 1% of the chips do not conform to specifications (p = 0.01). Every hour, an inspector selects a random sample of 25 chips. Each chip is either conforming (good) or nonconforming (defective). The goal is to find the probability of observing 0 or 1 defective chips in the sample.
STEP 1
STEP 2
STEP 3
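The chip-inspection example can be checked numerically. The sketch below assumes the number of defective chips in the sample is binomial with n = 25 and p = 0.01, which is reasonable because the sample is small relative to the thousands of chips produced; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 25, 0.01
p0 = binomial_pmf(0, n, p)      # no defective chips: 0.99 ** 25
p1 = binomial_pmf(1, n, p)      # exactly one: 25 * 0.01 * 0.99 ** 24
p_at_most_one = p0 + p1
print(round(p0, 4), round(p1, 4), round(p_at_most_one, 4))
```

Under these assumptions the probability of seeing 0 or 1 defective chips comes out at roughly 0.974, so a sample with two or more defectives would be unusual.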
UNIFORM DISTRIBUTION
CALCULATION
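Mean of probability distribution
Mean for a discrete probability distribution: $\mu = \sum_{x} x \, p(x)$
Mean for a continuous probability distribution: $\mu = \int_{-\infty}^{\infty} x \, f(x) \, dx$
Variance of probability distributions
Variance of a discrete probability distribution: $\sigma^2 = \sum_{x} (x - \mu)^2 \, p(x)$
Variance of a continuous probability distribution: $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, f(x) \, dx$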
Summary
✅ Mean (μ) represents the center of mass of the distribution.
✅ Variance (σ²) and Standard Deviation (σ) measure how spread out the values are.
✅ Higher variance means more dispersion, affecting decision-making in quality control, business, and finance.
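For a discrete distribution, the mean and variance are probability-weighted sums and can be computed directly. The sketch below uses a fair six-sided die as an illustrative distribution; the function name `mean_and_variance` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """pmf maps each value x to its probability p(x)."""
    mu = sum(x * p for x, p in pmf.items())               # sum of x * p(x)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # sum of (x - mu)^2 * p(x)
    return mu, var

die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair six-sided die
mu, var = mean_and_variance(die)
sigma = float(var) ** 0.5
print(mu, var, round(sigma, 3))
```

The mean 7/2 is the balance point of the six equally likely faces, and the standard deviation (about 1.71) describes the typical distance of a roll from that point.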
HYPERGEOMETRIC DISTRIBUTION
The Hypergeometric Distribution models situations where we randomly select a sample without replacement from a finite population, and we are interested in the number of successes (items of interest) in that sample.
HYPERGEOMETRIC PROBABILITY
Key points
✔ Sampling without replacement: Each draw affects the next one.
✔ Finite population: Unlike the binomial distribution, which treats the population as effectively infinite (or samples with replacement), here we sample from a population of fixed size N.
✔ No independence: The probability of selecting an item changes as selections are made.
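A minimal sketch of the hypergeometric probability follows, using the standard counting formula C(K, k) · C(N − K, n − k) / C(N, n). Only N comes from the text above; the symbols K (successes in the population), n (sample size), and k (successes in the sample), as well as the lot of 20 items with 5 defectives, are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k successes in a sample of n drawn without replacement
    from a population of N items that contains K successes)."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Hypothetical lot: N = 20 items, K = 5 defective, sample of n = 4.
p_one = hypergeom_pmf(1, N=20, K=5, n=4)
total = sum(hypergeom_pmf(k, 20, 5, 4) for k in range(0, 5))
print(p_one, float(p_one), total)
```

Because items are not replaced, the chance of drawing a defective changes after every draw, which is why combinations are counted instead of multiplying a constant p as in the binomial case.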
Example
STEP 2
POISSON DISTRIBUTION
The Poisson distribution is a discrete probability distribution used to model the probability of a certain number of events occurring in a fixed interval of time, space, volume, or area. It is particularly useful when events occur randomly, independently, and at a constant average rate.
Example: A hospital receives an average of 4 emergency patients per hour. Assuming that patient arrivals follow a Poisson distribution, what is the probability that exactly 3 patients arrive in an hour?
https://www.youtube.com/watch?v=3z-M6sbGIZ0
POISSON DISTRIBUTIONS
EXAMPLE
A call center receives an average of 3 calls per minute. Assuming the number of calls follows a Poisson distribution, find the probability that exactly 5 calls are received in a given minute.
CONT
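Both Poisson examples (the hospital with an average of 4 patients per hour, and the call center with an average of 3 calls per minute) can be evaluated with the standard Poisson formula P(X = k) = e^(−λ) λ^k / k!. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval whose average count is lam)."""
    return exp(-lam) * lam**k / factorial(k)

p_hospital = poisson_pmf(3, 4)      # exactly 3 patients when the average is 4 per hour
p_call_center = poisson_pmf(5, 3)   # exactly 5 calls when the average is 3 per minute
print(round(p_hospital, 4), round(p_call_center, 4))
```

The results are roughly 0.195 for the hospital and 0.101 for the call center.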
NEGATIVE BINOMIAL DISTRIBUTION
The Negative Binomial Distribution models the number of Bernoulli trials needed to achieve a fixed number of successes. It differs from the Binomial Distribution, where the number of trials is fixed and we count the number of successes.
NEGATIVE BINOMIAL DISTRIBUTION
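As a sketch, the probability that the r-th success arrives on trial n is C(n − 1, r − 1) · p^r · (1 − p)^(n − r): the last trial must be a success, and the other r − 1 successes can fall anywhere among the first n − 1 trials. The symbols r, n, and p and the coin-flip numbers below are assumptions, not values from the slides.

```python
from math import comb

def neg_binomial_pmf(n, r, p):
    """P(the r-th success occurs exactly on trial n)."""
    if n < r:
        return 0.0                  # r successes need at least r trials
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# Hypothetical case: fair coin, probability the 2nd head appears on the 3rd flip.
p_third = neg_binomial_pmf(3, r=2, p=0.5)

# The expected number of trials is r / p; checked here by a truncated sum.
expected_trials = sum(n * neg_binomial_pmf(n, 2, 0.5) for n in range(2, 200))
print(p_third, expected_trials)
```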
Continuous probabilities
NORMAL DISTRIBUTION
KEY PROPERTIES
- Symmetry: The curve is perfectly balanced around the mean (μ), so the left and right sides mirror each other.
- Unimodal: There is only one peak, and it occurs at the mean. This is where most values cluster.
- Total area = 1: The entire area under the curve represents all possible outcomes, so it adds up to 1 (or 100% probability).
- Empirical rule (68-95-99.7 rule): About 68% of the data falls within 1 standard deviation (σ) of the mean, about 95% falls within 2σ, and about 99.7% falls within 3σ.
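The 68-95-99.7 figures can be reproduced from the error function, since for any normal distribution the probability of falling within k standard deviations of the mean is erf(k / √2). The function name `prob_within` is illustrative.

```python
from math import erf, sqrt

def prob_within(k):
    """P(|X - mu| < k * sigma) for a normally distributed X."""
    return erf(k / sqrt(2))

within = {k: prob_within(k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within {k} sigma: {prob:.4f}")
```

The values are approximately 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.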
STANDARD NORMAL DISTRIBUTION
To convert any value from a normal distribution into the standard normal form, use:
$Z = \frac{X - \mu}{\sigma}$
Where:
- X is the value you're analyzing
- μ is the mean of the distribution
- σ is the standard deviation
- Z tells you how many standard deviations X is from the mean
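A short sketch of the standardization Z = (X − μ) / σ, followed by a lookup of the standard normal cumulative probability with Python's `statistics.NormalDist`. The values μ = 100, σ = 15, and X = 130 are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 100, 15                  # hypothetical mean and standard deviation
z = z_score(130, mu, sigma)          # (130 - 100) / 15
p_below = NormalDist().cdf(z)        # P(Z <= z) under the standard normal
print(z, round(p_below, 4))
```

Here X sits 2 standard deviations above the mean, and about 97.7% of values fall below it.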
CENTRAL LIMIT THEOREM
The Central Limit Theorem says that if you repeatedly draw random samples of a sufficiently large size n from a population with finite variance, the means of those samples will approximately follow a normal distribution. This is true even if the original population is not normally distributed. As the sample size increases, the distribution of the sample means becomes more bell-shaped and stays centered on the population mean. The spread of this distribution depends on the standard deviation of the population (σ) and the size of the sample: the standard deviation of the sample mean is σ/√n.
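A small simulation can illustrate the theorem. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1, a sample size of 30, and 2000 repeated samples; all of these choices are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample (assumed)
n_samples = 2000         # number of samples drawn (assumed)

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

center = mean(sample_means)     # should be close to the population mean, 1
spread = stdev(sample_means)    # should be close to sigma / sqrt(n)
print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))
```

Even though individual exponential values are strongly skewed, the sample means cluster symmetrically around 1 with a spread near 1/√30 ≈ 0.18.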
LOGNORMAL DISTRIBUTION
A lognormal distribution is a type of probability distribution where the logarithm of the variable follows a normal distribution. In other words, if you take the natural log of a lognormally distributed variable, the result will be normally distributed. This means:
- The original variable is always positive.
- It is skewed to the right, meaning it has a long tail on the higher end.
- It is useful for modeling things like income, stock prices, and medical imaging intensities, where values can't be negative and tend to cluster low but occasionally spike high.
LOGNORMAL DISTRIBUTION
If a variable X follows a lognormal distribution, then:
$X = e^{W}$
Where: W is a normally distributed variable with mean θ and variance ω².
Taking the natural logarithm of X gives:
$\ln X = W$
PROBABILITY DENSITY FUNCTION
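A sketch of the lognormal density and cumulative probability follows, written in terms of the mean θ and standard deviation ω of the underlying normal variable W = ln X. The density used is the standard form f(x) = exp(−(ln x − θ)² / (2ω²)) / (x ω √(2π)) for x > 0; the function names and the parameter values θ = 0, ω = 0.5 are hypothetical.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

def lognormal_pdf(x, theta, omega):
    """Density of X = exp(W), where W is normal with mean theta and sd omega."""
    if x <= 0:
        return 0.0                   # a lognormal variable is always positive
    return exp(-((log(x) - theta) ** 2) / (2 * omega**2)) / (x * omega * sqrt(2 * pi))

def lognormal_cdf(x, theta, omega):
    """P(X <= x), computed as P(W <= ln x)."""
    if x <= 0:
        return 0.0
    return NormalDist(theta, omega).cdf(log(x))

theta, omega = 0.0, 0.5              # hypothetical parameters
median_x = exp(theta)                # half of the probability lies below exp(theta)
mean_x = exp(theta + omega**2 / 2)   # the long right tail pulls the mean above the median
print(lognormal_pdf(1.0, theta, omega), lognormal_cdf(median_x, theta, omega), mean_x)
```

The mean exceeding the median is the numerical signature of the right skew.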
Exponential distribution
The Exponential Distribution is a continuous probability distribution used to model the waiting time between independent events that occur at a constant average rate. It is commonly used in reliability analysis, queuing theory, and failure modeling. It is the continuous counterpart of the Poisson distribution: the Poisson distribution counts the events in a fixed interval, while the exponential distribution measures the time between them. It is memoryless, meaning that the time already waited does not change the probability of how much longer the wait will be.
Examples:
✔ Time until a machine fails (reliability analysis).
✔ Time between customer arrivals at a service center.
✔ Time between earthquakes occurring in a region.
EXPONENTIAL DISTRIBUTION
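The memoryless property can be checked with the exponential survival function P(T > t) = e^(−λt). The sketch below assumes a machine that fails on average once every 10 hours, so λ = 0.1 per hour; the function name `survival` is illustrative.

```python
from math import exp, isclose

def survival(t, rate):
    """P(T > t) for an exponential waiting time with the given rate."""
    return exp(-rate * t)

rate = 1 / 10                        # one failure per 10 hours on average
p_fresh = survival(5, rate)          # a new machine runs more than 5 hours

# A machine that has already run 20 hours: chance of 5 more hours without failure.
p_after_20 = survival(20 + 5, rate) / survival(20, rate)

print(round(p_fresh, 4), round(p_after_20, 4), isclose(p_fresh, p_after_20))
```

Both probabilities are about 0.607: having survived 20 hours tells us nothing about the next 5.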
Gamma distribution
The Gamma Distribution is a continuous probability distribution that models the time required for multiple events to occur. It is an extension of the Exponential Distribution, which models the time until a single event. The Gamma Distribution applies when:
✔ Modeling waiting times until multiple events occur (e.g., time until 5 machine failures).
✔ Modeling sums of exponential variables (e.g., total repair time for multiple parts).
✔ Predicting waiting times in queueing systems.
GAMMA DISTRIBUTION
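For a whole number of events k, the gamma distribution (then also called the Erlang distribution) has a closed-form cumulative probability: the k-th event has occurred by time t exactly when a Poisson count with mean λt is at least k. The sketch below uses that identity; the function name `erlang_cdf` and the arrival rate of 2 customers per minute are assumptions.

```python
from math import exp, factorial, isclose

def erlang_cdf(t, k, rate):
    """P(the k-th event has occurred by time t), for integer k."""
    lam = rate * t                   # expected number of events by time t
    # 1 minus the Poisson probability of seeing fewer than k events
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

rate = 2.0                                # hypothetical: 2 arrivals per minute
p_three_by = erlang_cdf(1.5, 3, rate)     # third customer has arrived within 1.5 minutes
p_first_by = erlang_cdf(1.5, 1, rate)     # with k = 1 this is the exponential case
print(round(p_three_by, 4), isclose(p_first_by, 1 - exp(-rate * 1.5)))
```

Setting k = 1 recovers the exponential distribution, which matches the description of the gamma distribution as its extension to multiple events.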
WEIBULL DISTRIBUTION
The Weibull Distribution is a continuous probability distribution commonly used in reliability engineering, failure analysis, and survival studies. It models the time to failure of a product, system, or component and is flexible enough to represent different types of failure rates.
Key concept: It describes how failure rates change over time. By adjusting its shape parameter β, the distribution can model increasing, constant, or decreasing failure rates.
CONT
PARAMETERS
CONT
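The effect of the shape parameter β can be sketched with the standard Weibull hazard (failure rate) function h(t) = (β/η) (t/η)^(β − 1) and reliability function R(t) = exp(−(t/η)^β). The scale symbol η and the values below are assumptions; only β is named in the text above.

```python
from math import exp

def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate at time t (shape beta, scale eta)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def weibull_reliability(t, beta, eta):
    """P(the item survives beyond time t)."""
    return exp(-((t / eta) ** beta))

eta = 1.0   # hypothetical scale parameter
for beta, label in [(0.5, "decreasing"), (1.0, "constant"), (2.0, "increasing")]:
    early = weibull_hazard(1.0, beta, eta)
    late = weibull_hazard(4.0, beta, eta)
    print(f"beta={beta}: h(1)={early:.2f}, h(4)={late:.2f} ({label})")
```

With β < 1 the hazard falls over time (early failures), β = 1 gives a constant hazard (the exponential case), and β > 1 gives a rising hazard (wear-out).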
COMPARISON
Distribution: What it models
- Exponential Distribution: Time until the first event occurs
- Gamma Distribution: Time until the k-th event occurs
- Poisson Distribution: Number of events in a fixed time period
- Weibull Distribution: Time until failure, but with flexible failure rates
Normal Distribution
Definition: A symmetric, bell-shaped distribution where values cluster around the mean. Used when data has natural variations.
Example use case: Product weights in manufacturing. If a machine produces bolts with an average weight of 5g, most bolts will weigh close to 5g, with a few slightly heavier or lighter.
Lognormal Distribution
Definition: A distribution where the logarithm of the variable follows a normal distribution. Models multiplicative processes.
Example use case: Stock prices. If a stock price starts at $100, its future price may multiply rather than add, making it better modeled by a lognormal distribution.
Exponential Distribution
Definition: Used to model the time between independent events happening at a constant rate.
Example use case: Machine failures. If a machine has an average failure rate of once every 10 hours, the time between failures follows an exponential distribution.
Gamma Distribution
Definition: Extends the exponential distribution by modeling the time required for multiple events to occur.
Example use case: Call center wait times. The time until 3 customers arrive at a service desk follows a gamma distribution.
Weibull Distribution
Definition: A flexible distribution used in reliability analysis to model time to failure. Can represent different failure patterns (early failures, constant failure rate, wear-out failures).
Example use case: Lifespan of car tires. Some fail early due to manufacturing defects, while others last longer, making the Weibull distribution useful for predicting failures over time.