Probability distributions
A probability distribution is a summary of the probabilities of occurrence of the different levels of a random variable.
A variable is a characteristic that varies from subject to subject.
A random variable is a variable in a study where subjects have been selected randomly.
Probability distributions (2)
Suppose we had the heights (X) of 10 randomly selected women: 152, 160, 165, 158, 155, 153, 168, 165, 163, 156 cm.
The height (X) is a random variable.
We can determine the probability that it takes any given value or range of values.
E.g. the probability that X is less than 160 cm = 5/10, and the probability that the height is exactly 155 cm = 1/10.
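The two probabilities in the height example can be checked by counting. The following is a minimal sketch; the helper name `empirical_probability` is an illustrative assumption, not part of the slides.

```python
# Heights (cm) of the 10 randomly selected women from the example.
heights = [152, 160, 165, 158, 155, 153, 168, 165, 163, 156]

def empirical_probability(values, condition):
    """Fraction of observations for which `condition` is true (hypothetical helper)."""
    return sum(1 for v in values if condition(v)) / len(values)

p_below_160 = empirical_probability(heights, lambda x: x < 160)
p_equal_155 = empirical_probability(heights, lambda x: x == 155)

print(p_below_160)  # 0.5  (5 of 10 women are shorter than 160 cm)
print(p_equal_155)  # 0.1  (1 of 10 women is exactly 155 cm)
```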
Probability distributions (3)
To create a probability distribution, the determined probabilities can be plotted against the characteristic (X).
A theoretical distribution can be used to fit the distribution of the variable of interest.
Examples of theoretical distributions are:
- Binomial and Poisson distributions: deal with discrete random variables (take on only integer values)
- Normal (Gaussian) distribution: deals with continuous random variables
Binomial distribution
A theoretical distribution that applies to events with binary outcomes.
The two possible outcomes are denoted A and B.
P(A) = π and P(B) = 1 − π
The probability stays the same each time the event occurs.
The outcome is independent from one trial to another.
Binomial distribution (2)
Gives the probability that a specified outcome occurs in a given number of independent trials.
If an experiment involving this event is repeated n times and the outcome is independent from one trial to another, what is the probability that outcome A occurs exactly X times?
Or equivalently, what proportion of the n outcomes will be A?
Binomial distribution (3)
Assume a population of men with a localized prostate tumor and pretreatment PSA < 10 has been studied, and the probability of 5-year survival = 0.8.
Let S represent the event of 5-year survival; π = P(S) = 0.8.
Let D represent death before 5 years; 1 − π = P(D) = 0.2.
Consider a group of n = 2 men with a localized prostate tumor and pretreatment PSA < 10.
Binomial distribution (4)
What is the probability that exactly two men live 5 years?
P(survival of patient 1 and survival of patient 2)
Apply the multiplicative rule, since the events are independent: the survival of one patient does not affect the survival of another.
P(S1 and S2) = P(S) × P(S) = 0.8 × 0.8 = 0.64
Binomial distribution (5)
What is the probability that exactly one man lives 5 years?
P(S1 and D2) or P(D1 and S2)
Both the multiplicative rule and the addition rule apply:
P(S) × P(D) + P(D) × P(S) = (0.8 × 0.2) + (0.2 × 0.8) = 0.32
What is the probability that neither lives 5 years?
P(D1 and D2) = P(D) × P(D) = 0.2 × 0.2 = 0.04
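The three probabilities for n = 2 patients can also be obtained by listing every possible outcome and applying the multiplicative and addition rules. This is only an illustrative sketch; the variable names are assumptions.

```python
from itertools import product

# S = survives 5 years, D = dies before 5 years (values from the example).
p = {"S": 0.8, "D": 0.2}

# Probability of 0, 1 or 2 survivors among 2 independent patients.
survivors = {0: 0.0, 1: 0.0, 2: 0.0}
for outcome in product("SD", repeat=2):      # SS, SD, DS, DD
    prob = p[outcome[0]] * p[outcome[1]]     # multiplicative rule (independence)
    survivors[outcome.count("S")] += prob    # addition rule (mutually exclusive outcomes)

for k in (2, 1, 0):
    print(k, round(survivors[k], 2))
# 2 0.64
# 1 0.32
# 0 0.04
```

The three probabilities add up to 1 because the four listed outcomes cover every possibility.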
Binomial distribution (6)
The process could be repeated for any number of trials or any number of patients, but to simplify the process we can use the binomial formula:
$P(X) = \frac{n!}{X!\,(n-X)!}\,\pi^{X}(1-\pi)^{n-X}$
! refers to factorial. Example: 8! = 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1
The term $\frac{n!}{X!\,(n-X)!}$ is "n combination X" [nCx]. It gives the number of ways that a particular combination of events can occur out of n independent trials.
Binomial distribution (6)
Suppose 10 men with prostate cancer are chosen (n = 10, π = 0.8).
Probability that all 10 men survive, i.e. that exactly 10 men survive (n = 10, X = 10)?
P(10) = 0.8^10 ≈ 0.107
Probability that at least 2 men survive?
P(X ≥ 2) = P(2) + P(3) + … + P(10) = 1 − P(X < 2) = 1 − [P(1) + P(0)] ≈ 0.9999958
Probability that at most 3 men survive?
P(X ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.00086436
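The calculations for n = 10 can be reproduced with a few lines of Python. This is a minimal sketch assuming Python 3.8 or later (for `math.comb`); the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(exactly x occurrences in n independent trials), each with probability pi."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 10, 0.8  # 10 men, 5-year survival probability 0.8

p_exactly_10 = binomial_pmf(10, n, pi)
p_at_least_2 = 1 - (binomial_pmf(0, n, pi) + binomial_pmf(1, n, pi))
p_at_most_3 = sum(binomial_pmf(x, n, pi) for x in range(4))  # x = 0, 1, 2, 3

print(f"{p_exactly_10:.4f}")  # 0.1074
print(f"{p_at_least_2:.7f}")  # 0.9999958
print(f"{p_at_most_3:.8f}")   # 0.00086436
```

Using the complement for "at least 2" avoids summing nine separate terms.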
Binomial distribution (7)
The mean of a binomial distribution is nπ.
Thus, for the above distribution of prostate cancer patients with n = 10, mean = 10 × 0.8 = 8.
The standard deviation is the square root of nπ(1 − π): (10 × 0.8 × 0.2)^0.5 = 1.265
The parameters of the binomial distribution are therefore n and π, because they are the only two values needed to describe it completely.
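As a check, the mean and standard deviation can be computed directly from the probabilities of X = 0, 1, …, 10 and compared with the shortcut formulas. This is a sketch under the same assumptions as the example (n = 10, π = 0.8); variable names are illustrative.

```python
from math import comb, sqrt

n, pi = 10, 0.8

# Probability of each possible number of survivors, 0..n.
pmf = [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]

# Mean and standard deviation from the definition (weighted sums over the pmf).
mean_from_pmf = sum(x * p for x, p in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * p for x, p in enumerate(pmf))
sd_from_pmf = sqrt(var_from_pmf)

# Shortcut formulas from the slide.
mean_formula = n * pi
sd_formula = sqrt(n * pi * (1 - pi))

print(round(mean_from_pmf, 3), round(mean_formula, 3))  # 8.0 8.0
print(round(sd_from_pmf, 3), round(sd_formula, 3))      # 1.265 1.265
```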
Binomial distribution (8)
Studies of binary variables often report proportions rather than the number of subjects with a certain characteristic.
Even in such scenarios, n and π are the parameters required.
Since a proportion is obtained by dividing X by n, the mean of a proportion becomes π.
The standard deviation is $\sqrt{\pi(1-\pi)/n}$
Poisson distribution
A discrete distribution used when the outcome is the number of times an event occurs.
Used to determine the probability of rare events.
Gives the probability that an outcome occurs a specified number of times when the number of trials is large and the probability of any one occurrence is small.
Examples of use:
- planning the number of beds needed in the ICU of a hospital
- planning the number of ambulances needed on call
- modelling the number of cells in a given volume of fluid
- modelling the number of bacterial colonies growing in a certain amount of medium
- modelling the emission of radioactive particles from a specified amount of radioactive material
Poisson distribution (2)
Consider a random variable representing the number of times an event occurs in a given time or space interval.
The probability of exactly X occurrences is given by the formula
$P(X) = \frac{\lambda^{X} e^{-\lambda}}{X!}$
λ is the value of both the mean and the variance, and it is the only parameter of the Poisson distribution.
Poisson distribution (4)
Suppose hospitalizations for patients with medical and surgical treatment follow a Poisson distribution.
The Poisson distribution is applicable because the chance that a patient goes into hospital is small, and admissions can be assumed to be independent from patient to patient.
After 11 years, the 390 patients randomized to the medical group had been admitted a total of 1256 times (mean = ?).
Patients randomized to the surgical group had been admitted 1487 times (mean = ?).
Poisson distribution (5)
Probability of exactly 0 hospitalizations in the medical group?
λ = 1256 / 390 ≈ 3.22
P(0) = e^(−3.22) = 0.03996
Probability of exactly 2 hospitalizations in the medical group?
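The medical-group questions can be worked through numerically. This is a minimal sketch assuming the standard Poisson formula P(X) = λ^X e^(−λ) / X!, with λ rounded to 3.22 as on the slide; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(exactly x occurrences) for a Poisson distribution with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

# Mean admissions per patient in the medical group, rounded as on the slide.
lam_medical = round(1256 / 390, 2)

p0 = poisson_pmf(0, lam_medical)
p2 = poisson_pmf(2, lam_medical)

print(lam_medical)    # 3.22
print(f"{p0:.5f}")    # 0.03996
print(f"{p2:.3f}")    # 0.207
```

The surgical-group mean is not computed here because the slide does not state how many patients were in that group.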
Normal distribution
A distribution for continuous random variables.
It is a smooth, bell-shaped curve, symmetrical about the mean of the distribution (μ).
The standard deviation of the distribution is symbolized by σ.
Since it is a probability distribution, the area under the curve is 1: half the area is to the left of the mean and half to the right.
It ranges from −∞ to +∞.
Normal distribution …continued
Approximately 68% of the data lies within 1 SD of the mean.
Approximately 95% of the data lies within 2 SD of the mean.
Approximately 99.7% of the data lies within 3 SD of the mean.
This is known as the empirical rule.
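The empirical rule can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which is an assumption about tooling rather than something the slides prescribe.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean.
within = {}
for k in (1, 2, 3):
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(within[k], 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```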
Normal distribution …continued
Any normal distribution can be transformed to the standard normal distribution (z), which has mean = 0 and standard deviation = 1.
The transformation is made by:
$z = \frac{X - \mu}{\sigma}$
X is the observation that you wish to transform.
μ is the mean of the characteristic in the population.
σ is the standard deviation of the characteristic in the population.
Normal distribution …continued
Example: Suppose SBP in normal healthy individuals is normally distributed with μ = 120 mmHg and σ = 10 mmHg.
To get the proportion of subjects with BP above 130 mmHg:
$z = \frac{130 - 120}{10} = 1$
Draw the distribution to see which area is of interest.
Read off the values from the z-table attached at the end of these slides.
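Instead of a printed z-table, the upper-tail area for this example can be computed directly. This is a sketch using `statistics.NormalDist` (an assumed tool, Python 3.8 or later) with the μ and σ given in the example.

```python
from statistics import NormalDist

mu, sigma = 120, 10   # SBP mean and standard deviation (mmHg) from the example
x = 130               # BP threshold of interest

z = (x - mu) / sigma                 # standardize the observation
p_above = 1 - NormalDist().cdf(z)    # area to the right of z under the standard normal

print(z)                   # 1.0
print(round(p_above, 4))   # 0.1587
```

So roughly 16% of subjects would be expected to have a BP above 130 mmHg under these assumptions.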
Normal distribution …continued
What proportion of subjects have BP between 110 and 130 mmHg?
What proportion of subjects have BP less than 100 mmHg or above 140 mmHg?
What proportion have BP less than 90 mmHg or above 140 mmHg?
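These interval questions can be answered by adding or subtracting areas under the curve. The following is a minimal sketch, assuming the same SBP distribution as the earlier example (μ = 120 mmHg, σ = 10 mmHg) and using `statistics.NormalDist` in place of the z-table.

```python
from statistics import NormalDist

sbp = NormalDist(mu=120, sigma=10)  # assumed SBP distribution from the example

# Area between two values: difference of the two cumulative areas.
p_between_110_130 = sbp.cdf(130) - sbp.cdf(110)

# Area in two tails: lower tail plus upper tail.
p_below_100_or_above_140 = sbp.cdf(100) + (1 - sbp.cdf(140))
p_below_90_or_above_140 = sbp.cdf(90) + (1 - sbp.cdf(140))

print(round(p_between_110_130, 4))          # 0.6827
print(round(p_below_100_or_above_140, 4))   # 0.0455
print(round(p_below_90_or_above_140, 4))    # 0.0241
```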
Normal distribution …continued
What proportion of subjects have BP above 133 mmHg?
What value divides the distribution into the lower 2.5% and upper 97.5%?
What BP would divide the distribution into:
- the lower 5% and upper 95%?
- the upper 10% and lower 90%?
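Finding the BP that cuts off a given percentage is the reverse of the z-table lookup: start from the area and work back to the value. This is a sketch using `NormalDist.inv_cdf` from the Python standard library, again assuming μ = 120 mmHg and σ = 10 mmHg from the earlier example.

```python
from statistics import NormalDist

sbp = NormalDist(mu=120, sigma=10)  # assumed SBP distribution from the example

p_above_133 = 1 - sbp.cdf(133)      # upper-tail area beyond 133 mmHg

# inv_cdf(p) returns the value with a proportion p of the distribution below it.
bp_lower_2_5 = sbp.inv_cdf(0.025)   # lower 2.5% / upper 97.5%
bp_lower_5 = sbp.inv_cdf(0.05)      # lower 5% / upper 95%
bp_upper_10 = sbp.inv_cdf(0.90)     # lower 90% / upper 10%

print(round(p_above_133, 4))    # 0.0968
print(round(bp_lower_2_5, 1))   # 100.4
print(round(bp_lower_5, 1))     # 103.6
print(round(bp_upper_10, 1))    # 132.8
```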
Normal distribution …continued
What BP would divide the distribution into the lower 20% and upper 80%?
The specific z-value does not appear in the table, so it must be calculated from the closest values.
Looking at the area in one tail, there is an area of 0.212 and an area of 0.198, between which lies the 0.2 that corresponds to the question of interest.
Finding z-value when missing from table
z       Area in one tail
0.80    0.212
?       0.200
0.85    0.198
A difference in area of 0.212 − 0.198 = 0.014 corresponds to a difference of 0.85 − 0.80 = 0.05 in z.
Working up from z = 0.80: a difference in area of 0.212 − 0.200 = 0.012 corresponds to (0.012/0.014) × 0.05 = 0.043, so the z corresponding to 0.2 is 0.80 + 0.043 = 0.843.
Working down from z = 0.85: a difference in area of 0.200 − 0.198 = 0.002 corresponds to (0.002/0.014) × 0.05 = 0.007, so z = 0.85 − 0.007 = 0.843.
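The interpolation can be written as a small function and compared with a direct calculation. This is a sketch only: `interpolate_z` is a hypothetical helper, and applying the result to BP assumes the SBP example (μ = 120 mmHg, σ = 10 mmHg) with the cut-off lying below the mean.

```python
from statistics import NormalDist

def interpolate_z(target_area, z_low, area_low, z_high, area_high):
    """Linear interpolation between two z-table rows (tail area falls as z rises)."""
    fraction = (area_low - target_area) / (area_low - area_high)
    return z_low + fraction * (z_high - z_low)

# Table rows from the slide: z = 0.80 -> 0.212, z = 0.85 -> 0.198.
z_table = interpolate_z(0.200, 0.80, 0.212, 0.85, 0.198)

# Direct calculation for comparison: z with 20% of the area in the upper tail.
z_exact = NormalDist().inv_cdf(0.80)

# The lower 20% lies below the mean, so the cut-off uses -z (assumed mu=120, sigma=10).
bp_lower_20 = 120 - z_table * 10

print(round(z_table, 3))      # 0.843
print(round(z_exact, 3))      # 0.842
print(round(bp_lower_20, 1))  # 111.6
```

The small gap between the interpolated and direct values comes from the three-decimal rounding of the table areas.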