Probability and biostatistics explore the mathematical foundations of uncertainty and chance. They apply probability theory to the biological and health sciences, enabling researchers to analyze data, predict outcomes, and make informed decisions. These fields are essential for designing experiments, interpreting results, and understanding patterns in medical and epidemiological research.
Added: Nov 01, 2025
Slides: 28 pages
Probability Concepts and Probability Distributions
Lecturer: Waweru Nyamu
Kenya 2022 Presidential Election: Polling Data vs Official Results

| Pollster | Sample Size | Ruto (Poll) | Raila (Poll) | Fieldwork Dates | Official Results |
|---|---|---|---|---|---|
| Infotrak | 2,261 | 47.4% | 44.4% | Aug 5-6, 2022 | Ruto: 50.49%, Raila: 48.85% |
| TIFA Research | 2,411 | 46.7% | 44.4% | Aug 5, 2022 | Ruto: 50.49%, Raila: 48.85% |
| Ipsos | 6,105 | 47% | 41% | Late July 2022 | Ruto: 50.49%, Raila: 48.85% |
| Official Election | 22,120,458 (registered voters) | 50.49% (7,176,141 votes) | 48.85% (6,942,930 votes) | Aug 9, 2022 | Ruto: 50.49%, Raila: 48.85% |
Key Statistics

| Metric | Figure |
|---|---|
| Total Registered Voters | 22,120,458 |
| Total Valid Votes Cast | 14,232,685 |
| Voter Turnout | ≈ 65% |
| Average Poll Sample Size | ≈ 3,600 respondents |
| Margin of Error (Polls) | ± 2-3 percentage points |

Summary: This table shows how relatively small polling samples (2,261-6,105 respondents) were used to predict the outcome for an electorate of over 22 million registered voters. As the margin-of-error analysis that follows shows, the final results nonetheless fell just outside the confidence intervals of these pre-election polls.
Margins of Error for Final Pre-Election Polls
How the Margin of Error Is Calculated
For a simple random sample at the 95% confidence level, the margin of error is approximately:
MoE ≈ 1 / √n
where n is the sample size. (This shortcut comes from the normal-approximation formula 1.96 × √(p(1 − p)/n), which is largest when p = 0.5, giving 1.96 × 0.5 / √n ≈ 0.98 / √n ≈ 1 / √n.)
- Infotrak: 1 / √2261 ≈ 1 / 47.55 ≈ ±2.1%
- TIFA: 1 / √2411 ≈ 1 / 49.10 ≈ ±2.0%
- Ipsos: 1 / √6105 ≈ 1 / 78.13 ≈ ±1.3%

| Pollster | Sample Size | Margin of Error (Approx.) | Confidence Level |
|---|---|---|---|
| Infotrak | 2,261 | ±2.1% | 95% |
| TIFA Research | 2,411 | ±2.0% | 95% |
| Ipsos | 6,105 | ±1.3% | 95% |
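A minimal Python sketch of the calculation above. It reproduces the 1/√n shortcut and, for comparison, the normal-approximation formula z·√(p(1 − p)/n) with z = 1.96 and the worst case p = 0.5. The function names and the choice of p and z are assumptions of this sketch, not figures reported by the pollsters. For these three sample sizes both columns round to the same values as the table.

```python
import math

# Sample sizes from the polling table.
SAMPLES = {"Infotrak": 2261, "TIFA Research": 2411, "Ipsos": 6105}


def moe_shortcut(n: int) -> float:
    """Rule-of-thumb 95% margin of error, 1/sqrt(n), as a proportion."""
    return 1 / math.sqrt(n)


def moe_exact(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin of error z*sqrt(p(1-p)/n) for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


for name, n in SAMPLES.items():
    print(f"{name:14s} n={n:5d}  shortcut ±{100 * moe_shortcut(n):.1f}%  "
          f"z-based ±{100 * moe_exact(n):.1f}%")
```

The shortcut is slightly conservative: it overstates the z-based value by about 2%, which is why it is safe to quote.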
What This Means in Context
The margin of error creates a range (a confidence interval) around each poll's reported figure:
- Infotrak: Ruto 47.4% ± 2.1% = 45.3%-49.5%; Raila 44.4% ± 2.1% = 42.3%-46.5%
- TIFA: Ruto 46.7% ± 2.0% = 44.7%-48.7%; Raila 44.4% ± 2.0% = 42.4%-46.4%
- Ipsos: Ruto 47% ± 1.3% = 45.7%-48.3%; Raila 41% ± 1.3% = 39.7%-42.3%
Key Takeaway:
- The official result for William Ruto (50.49%) was just outside the upper bound of the confidence intervals for Infotrak and TIFA, and clearly outside for Ipsos.
- Raila's official result (48.85%) was well above the upper bounds of all the polls' ranges.
- This indicates that while the polls correctly showed a tight race, there was a systematic shift, or a "late swing" towards Raila, that the final polls did not fully capture, pushing the actual results beyond the predicted margins of error.
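The interval check above can be automated. This sketch uses only the poll shares, margins of error, and official results given in these slides, builds each interval as share ± MoE, and reports whether the official figure falls inside and how many percentage points it sits above the upper bound. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Poll shares, margins of error (percentage points) and official results from the slides.
POLLS = {
    "Infotrak": {"Ruto": 47.4, "Raila": 44.4, "moe": 2.1},
    "TIFA": {"Ruto": 46.7, "Raila": 44.4, "moe": 2.0},
    "Ipsos": {"Ruto": 47.0, "Raila": 41.0, "moe": 1.3},
}
OFFICIAL = {"Ruto": 50.49, "Raila": 48.85}


def interval(share: float, moe: float) -> tuple[float, float]:
    """Confidence interval share ± moe, rounded to 2 decimals."""
    return round(share - moe, 2), round(share + moe, 2)


def check_polls(polls: dict, official: dict) -> dict:
    """Map (pollster, candidate) -> (low, high, official inside?, official minus upper bound)."""
    results = {}
    for pollster, row in polls.items():
        for candidate, actual in official.items():
            low, high = interval(row[candidate], row["moe"])
            results[(pollster, candidate)] = (
                low, high, low <= actual <= high, round(actual - high, 2)
            )
    return results


for (pollster, candidate), (low, high, inside, gap) in check_polls(POLLS, OFFICIAL).items():
    status = "inside" if inside else f"outside ({gap:+.2f} pts vs upper bound)"
    print(f"{pollster:8s} {candidate:5s} CI {low}-{high}: official {OFFICIAL[candidate]} is {status}")
```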
What Might Explain Why the Polls Did Not Predict the Result Correctly?
Methodological and Technical Challenges
- The "Undecided Voter" Problem: A significant portion of respondents (often 8-12% in the final polls) were undecided or refused to answer. How these voters break in the final days is the greatest source of uncertainty. In 2022, it appears that a large majority of them broke decisively for Raila Odinga in the final 48-72 hours, which the polls could not capture.
- Sampling Bias: Pollsters try to build a perfect miniature of the electorate, but this is very difficult. Urban vs. rural: polls might over-represent easily accessible urban populations and under-sample voters in remote rural areas, who may have different voting patterns.
- "The Bradley Effect" / Social Desirability Bias: Some respondents may not have been truthful about their preferred candidate, perhaps due to perceived social pressure or fear of going against the perceived public mood. They might have said "undecided" or even named the candidate they thought was more popular.
- Turnout Modeling: A poll measures intent among a sample of people; predicting who will actually vote is a separate and critical step. Pollsters must weight their results based on their model of the electorate. If that turnout model was incorrect (for example, if it overestimated youth turnout or underestimated turnout in a particular candidate's stronghold), the final prediction will be off.
What Might Explain Why the Polls Did Not Predict the Result Correctly?
Political and Behavioral Factors
- Last-Minute Swing: The Kenyan election campaign was highly dynamic. The final week saw a massive mobilization of political machinery, emotive appeals, and potentially game-changing endorsements (such as President Kenyatta campaigning for Raila). Voters, particularly the undecided, can make up their minds at the very last minute, a shift that the final polls (conducted 2-4 days before the election) could not measure.
- Strategic Voting ("Anti-Agency" Voting): Some voters may have been protesting against the outgoing administration ("the system") by supporting Ruto, who positioned himself as the "hustler" and outsider. This sentiment is difficult to gauge in a poll but can powerfully influence the final vote.
- The "Shy Voter" Phenomenon: Related to social desirability bias, this occurs when supporters of a particular candidate are less willing to disclose their preference to pollsters. In this context, some Ruto supporters, believing their candidate was fighting the political "system," might have been less forthcoming.
What Might Explain Why the Polls Did Not Predict the Result Correctly?
The Unique Kenyan Context
- The "Incumbency Factor" Without an Incumbent: For the first time since 2002, Kenya had no sitting president on the ballot, and the outgoing president backed the opposition candidate rather than his own deputy. The "state machinery" was perceived to be behind Raila, while the deputy president was the "insider-outsider." This dynamic was unprecedented and hard for polling models to capture.
- Poll Fatigue and Mistrust: In an environment where polls are sometimes contested, some respondents may be suspicious and refuse to participate or give dishonest answers, potentially skewing the sample.
- Logistical and Security Concerns: In some regions, conducting polls can be challenging due to security or accessibility issues, which can lead to under-representation of certain communities.
Conclusion
It wasn't that the polls were "wrong" in a vacuum; they provided a snapshot of a race that was statistically tied within the margin of error a few days out. The ultimate result was driven by factors that are exceptionally difficult for any pollster to measure perfectly:
- The late break of undecided voters overwhelmingly towards Raila, which closed the gap
- A turnout model that slightly favored Ruto's strongholds
- The inability to fully capture the intensity of strategic voting and the full impact of the final campaign blitz
This combination of factors pushed the final results, particularly Raila's, beyond the upper bounds of the polls' confidence intervals. It underscores that polls are indicators of a moment in time, not infallible predictors of the future.
Introduction to Probability
Probability is a measure of the likelihood that a particular event will occur. It is expressed as a number between 0 and 1 (or 0% to 100%), where:
- 0 = impossible event (the event will not occur)
- 1 = certain event (the event will certainly occur)
Example in Health Context:
- The probability of a newborn being male is approximately 0.5 (50%).
- The probability of developing malaria in an endemic area might be 0.3 (30%).
Importance of Probability in Health Research
- Helps quantify uncertainty in medical diagnoses and treatment outcomes
- Used in epidemiology to measure disease risk and association
- Forms the foundation for statistical inference and hypothesis testing
- Essential for understanding diagnostic tests (sensitivity, specificity, predictive values)
In health research and biostatistics, probability quantifies uncertainty, that is, how likely an event (such as disease occurrence or a particular test result) is to happen. Understanding probability and its distributions allows health professionals to make evidence-based predictions and decisions.
Probability
The probability of an event equals the number of times it happens divided by the number of opportunities. These numbers can be determined by experiment or by knowledge of the system.
For instance, when rolling a die (singular of dice), the chance of rolling a 2 is 1/6, because there is a 2 on one face and a total of 6 faces. So, assuming the die is balanced, a 2 will come up on average 1 time in 6.
- Probability is a measure of how likely it is for an event to happen, named by a number from 0 to 1.
- If an event is certain to happen, its probability is 1.
- If an event is certain not to happen, its probability is 0.
Probability answers the question: what is the chance that a given event will occur? In genetics, for example: what is the chance that a child, or a family of children, will have a given phenotype?
The probabilities of all possible outcomes always sum to 1.
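The slide notes that probabilities can come from knowledge of the system or from experiment. This Python sketch shows both for a fair die: an exact count of favourable faces over total faces, and a simulated relative frequency that should approach 1/6 as the number of rolls grows. The function names and the fixed random seed are illustrative assumptions.

```python
import random
from fractions import Fraction

FACES = range(1, 7)  # faces of a fair six-sided die


def is_two(face: int) -> bool:
    return face == 2


def classical_probability(event) -> Fraction:
    """Favourable outcomes / total outcomes, assuming every face is equally likely."""
    favourable = sum(1 for face in FACES if event(face))
    return Fraction(favourable, len(FACES))


def empirical_probability(event, trials: int, seed: int = 0) -> float:
    """Relative frequency of the event over `trials` simulated rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials


print(classical_probability(is_two))  # 1/6
for n in (60, 600, 60_000):
    print(n, round(empirical_probability(is_two, n), 4))
```

Small runs can drift noticeably from 1/6; the larger runs settle close to 0.1667, which is the sense in which "a 2 will come up 1 time in 6".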
Probability
If it is uncertain whether or not an event will happen, its probability is some fraction between 0 and 1 (or the equivalent decimal number).
Chance
Chance is how likely it is that something will happen. When a meteorologist states that the chance of rain is 50%, the meteorologist is saying that it is equally likely to rain or not to rain. If the chance of rain rises to 80%, rain is more likely than not. If the chance drops to 20%, it may rain, but it probably will not.
Rules of Probability
The AND Rule of Probability
The probability of 2 independent events both happening is the product of their individual probabilities. It is called the AND rule because "this event happens AND that event happens."
For example, what is the probability of rolling a 2 on one die and a 2 on a second die? For each event the probability is 1/6, so the probability of both happening is 1/6 × 1/6 = 1/36.
Note that the events have to be independent: they can't affect each other's probability of occurring.
An example of non-independence: you have a hat with a red ball and a green ball in it. The probability of drawing the red ball is 1/2, the same as the chance of drawing the green ball. However, once you draw the red ball out (without putting it back), the chance of getting another red ball is 0 and the chance of a green ball is 1.
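To check the AND rule by brute force, this Python sketch lists all 36 equally likely outcomes of two dice and counts the double-2 cases, then does the same for the hat example, modelled here as two draws in order without replacement. The helper name `prob` and the outcome encodings are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

# All 36 equally likely outcomes of rolling two fair dice: (die A, die B).
TWO_DICE = list(product(range(1, 7), repeat=2))


def prob(event, outcomes) -> Fraction:
    """Fraction of equally likely outcomes for which `event` is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


both_twos = prob(lambda o: o[0] == 2 and o[1] == 2, TWO_DICE)
print(both_twos, both_twos == Fraction(1, 6) * Fraction(1, 6))  # 1/36 True

# Hat example: draw both balls in order WITHOUT replacement (dependent events).
HAT_DRAWS = list(permutations(["red", "green"], 2))
p_first_red = prob(lambda d: d[0] == "red", HAT_DRAWS)
p_both_red = prob(lambda d: d[0] == "red" and d[1] == "red", HAT_DRAWS)
print(p_first_red, p_both_red)  # 1/2 0, not 1/2 x 1/2
```

Multiplying 1/2 × 1/2 would wrongly give 1/4 for two reds; the count shows 0 because the second draw depends on the first.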
Rules of Probability
The OR Rule of Probability
The probability that either one of 2 mutually exclusive events will occur is the sum of their separate probabilities. For example, the chance of rolling either a 2 or a 3 on a die is 1/6 + 1/6 = 1/3.
The NOT Rule
The chance of an event not happening is 1 minus the chance of it happening. For example, the chance of not getting a 2 on a die is 1 - 1/6 = 5/6.
This rule can be very useful: complicated problems are sometimes greatly simplified by examining them backwards, through the event's complement.
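A short Python sketch of the OR and NOT rules on a single fair die, plus one illustration of "examining a problem backwards": the chance of at least one 6 in four rolls is easiest to get as 1 minus the chance of no 6 at all. The four-roll question is an added example, not from the slides, and it is cross-checked here by counting all 6^4 outcomes.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)


def prob(event) -> Fraction:
    """Probability of `event` on one fair die, by counting faces."""
    return Fraction(sum(1 for f in FACES if event(f)), len(FACES))


# OR rule for mutually exclusive events: P(2 or 3) = P(2) + P(3)
p_2_or_3 = prob(lambda f: f in (2, 3))
assert p_2_or_3 == prob(lambda f: f == 2) + prob(lambda f: f == 3) == Fraction(1, 3)

# NOT rule: P(not 2) = 1 - P(2)
p_not_2 = prob(lambda f: f != 2)
assert p_not_2 == 1 - prob(lambda f: f == 2) == Fraction(5, 6)

# Working backwards: P(at least one 6 in 4 rolls) = 1 - P(no 6 in 4 rolls)
via_complement = 1 - Fraction(5, 6) ** 4
by_counting = Fraction(
    sum(1 for rolls in product(FACES, repeat=4) if 6 in rolls), 6 ** 4
)
print(p_2_or_3, p_not_2, via_complement, via_complement == by_counting)
```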
Rules of Probability
Combining the Rules
What is the chance of rolling 2 dice and getting a 2 and a 5? The trick is that there are 2 ways to accomplish this: a 2 on die A and a 5 on die B, or a 5 on die A and a 2 on die B. Each possibility has a 1/36 chance of occurring (AND rule), and you want either one or the other of these mutually exclusive outcomes (OR rule), so the final probability is 1/36 + 1/36 = 2/36 = 1/18.
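The combined AND/OR reasoning above can be confirmed by enumeration. This Python sketch counts the outcomes that show one 2 and one 5 in either order among the 36 two-dice outcomes, and compares that count with the rule-based calculation. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

TWO_DICE = list(product(range(1, 7), repeat=2))  # (die A, die B), 36 outcomes

# Count outcomes showing one 2 and one 5, in either order.
favourable = [o for o in TWO_DICE if sorted(o) == [2, 5]]
p_counted = Fraction(len(favourable), len(TWO_DICE))

# Same answer from the rules: AND within each ordering, OR across the two orderings.
p_rules = Fraction(1, 6) * Fraction(1, 6) + Fraction(1, 6) * Fraction(1, 6)

print(favourable, p_counted, p_rules)  # [(2, 5), (5, 2)] 1/18 1/18
```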
Probability Questions
Donald is rolling a number cube labeled 1 to 6. Which of the following is LEAST LIKELY?
A. an even number
B. an odd number
C. a number greater than 5
Rules of Probability

| Rule | Description | Example (Health Context) |
|---|---|---|
| 1. Addition Rule (Either/Or Events) | If two events A and B are mutually exclusive: \( P(A \text{ or } B) = P(A) + P(B) \) | Probability that a patient has malaria or typhoid (treating the two as mutually exclusive; if co-infection is possible, use \( P(A) + P(B) - P(A \text{ and } B) \)) |
| 2. Multiplication Rule (Joint Events) | If two events A and B are independent: \( P(A \text{ and } B) = P(A) \times P(B) \) | Probability that a patient is male and HIV positive (assuming sex and HIV status are independent) |
| 3. Complement Rule | \( P(\text{not } A) = 1 - P(A) \) | Probability that a person does not have hypertension = 1 - P(has hypertension) |
| 4. Conditional Probability | Probability that A occurs given that B has occurred: \( P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \), provided \( P(B) > 0 \) | |
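The conditional-probability row has no worked example, so here is a small Python sketch on a fair die: the chance of rolling above 3 given that the roll is even, computed as P(A and B) / P(B). It also shows why the multiplication rule needs independence: for these two events P(A and B) is not P(A) × P(B). The die events and function names are illustrative choices, not from the slides.

```python
from fractions import Fraction

FACES = range(1, 7)


def p(event) -> Fraction:
    """Probability of `event` on one fair die, by counting faces."""
    return Fraction(sum(1 for f in FACES if event(f)), len(FACES))


def conditional(a, b) -> Fraction:
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    p_b = p(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p(lambda f: a(f) and b(f)) / p_b


def is_even(f: int) -> bool:
    return f % 2 == 0


def above_3(f: int) -> bool:
    return f > 3


print(p(above_3), conditional(above_3, is_even))  # 1/2 2/3
```

Because knowing the roll is even changes the chance of "above 3" from 1/2 to 2/3, the two events are not independent.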
Probability Distributions
A probability distribution describes how probabilities are distributed over the possible values of a random variable. There are two types:

| Type | Description | Examples |
|---|---|---|
| Discrete | Takes on specific, countable values | Binomial, Poisson |
| Continuous | Takes on any value within a range | Normal distribution |
Binomial Distribution
Models the number of successes in a fixed number of independent trials, each with the same probability of success.
Assumptions:
- There are two possible outcomes (success or failure)
- Each trial is independent (the outcome of one trial does not affect the others)
- The probability of success (p) is constant across trials
- The number of trials (n) is fixed
Examples:
- Probability that exactly 5 out of 20 patients will respond to a new drug (if the response rate is 40%)
- Probability of exactly 3 babies being born with a birth defect in 100 births
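Health Example: If the probability of a newborn being male is 0.5, what is the probability of getting exactly 3 males in 5 births?
Using \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \) with n = 5, k = 3, p = 0.5:
\( P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^2 = 10 \times \frac{1}{32} = 0.3125 \)
There is a 31.25% chance of having exactly 3 male babies in 5 births.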
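A minimal Python sketch of the binomial formula, applied to the newborn example and to the drug-response example from the previous slide (n = 20, p = 0.4, which works out to about 0.0746). The function name is illustrative; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Newborn example: exactly 3 males in 5 births, p = 0.5.
print(round(binomial_pmf(3, 5, 0.5), 4))  # 0.3125
# Drug example: exactly 5 responders out of 20 patients, response rate 0.4.
print(round(binomial_pmf(5, 20, 0.4), 4))
# Sanity check: the probabilities over k = 0..n sum to 1.
print(round(sum(binomial_pmf(k, 20, 0.4) for k in range(21)), 10))
```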
Poisson Distribution
Models the number of events occurring in a fixed interval of time or space, given the average rate of occurrence.
Used for modeling rare events that occur:
- Randomly and independently
- At a constant average rate (λ) over time or space
Assumptions:
- Events occur independently
- The average rate is constant
- Two events cannot occur at exactly the same time
Examples:
- Number of hospital admissions per day in an emergency department
- Number of new malaria cases per week in a community
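The slide gives no formula or rate, so this Python sketch uses the standard Poisson probability P(X = k) = λ^k e^(−λ) / k! with a HYPOTHETICAL rate of λ = 4 emergency admissions per day, chosen only for illustration. It prints the probability of each daily count from 0 to 8 and the tail probability of more than 8 admissions.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


LAM = 4.0  # HYPOTHETICAL: average of 4 emergency admissions per day
for k in range(9):
    print(k, round(poisson_pmf(k, LAM), 4))
print("P(more than 8 admissions) =", round(1 - poisson_cdf(8, LAM), 4))
```

A tail probability like this is how a clinic might judge whether an unusually busy day is plausible under the assumed constant rate.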
Normal Distribution (Gaussian Distribution)
A continuous, symmetric, bell-shaped distribution that is completely described by its mean (μ) and standard deviation (σ), and that describes many biological and health variables.
Properties:
- Symmetrical around the mean
- Mean (μ) = median = mode
- Shape depends on μ and σ
- Total area under the curve = 1 (or 100%)
- About 68% of values lie within ±1 SD of the mean
- About 95% lie within ±2 SD
- About 99.7% lie within ±3 SD
Health Example: If systolic blood pressure (BP) among adults is normally distributed with μ = 120 mmHg and σ = 10 mmHg:
- About 68% of individuals have BP between 110 and 130 mmHg
- About 95% have BP between 100 and 140 mmHg
This helps identify abnormal values (e.g., hypertension above 140 mmHg).
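The blood-pressure percentages can be reproduced with Python's standard-library `statistics.NormalDist`, using the slide's μ = 120 mmHg and σ = 10 mmHg. The sketch prints the share within ±1 SD (about 0.683), within ±2 SD (about 0.954, slightly above the rounded 95%), above 140 mmHg (about 0.023), and the value that cuts off the top 2.5% (about 139.6 mmHg). The helper name is illustrative.

```python
from statistics import NormalDist

# Systolic BP model from the slide: mu = 120 mmHg, sigma = 10 mmHg.
bp = NormalDist(mu=120, sigma=10)


def share_between(dist: NormalDist, low: float, high: float) -> float:
    """Area under the curve between low and high (a proportion)."""
    return dist.cdf(high) - dist.cdf(low)


print(round(share_between(bp, 110, 130), 4))  # within ±1 SD
print(round(share_between(bp, 100, 140), 4))  # within ±2 SD
print(round(1 - bp.cdf(140), 4))              # above 140 mmHg
print(round(bp.inv_cdf(0.975), 1))            # cut-off for the top 2.5%
```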
Bayes’ Theorem
A rule used to update the probability of a hypothesis when new information (evidence) becomes available:
\( P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \)
Applications:
- Diagnostic testing: updating the probability of disease given a test result
- Clinical decision making: incorporating new symptoms or test results
- Epidemiology: estimating disease risk based on exposure
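As a sketch of the diagnostic-testing application, the function below applies Bayes' theorem to get positive and negative predictive values from sensitivity, specificity, and prevalence. The test characteristics (90% sensitivity, 95% specificity) and the two prevalence values are HYPOTHETICAL numbers chosen for illustration, not data from any real test. With these inputs the PPV is about 27% at 2% prevalence and about 82% at 20% prevalence, showing how strongly the meaning of a positive result depends on how common the disease is.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) via Bayes' theorem.

    PPV = P(disease | test +) = sens*prev / [sens*prev + (1 - spec)*(1 - prev)]
    NPV = P(no disease | test -) = spec*(1 - prev) / [spec*(1 - prev) + (1 - sens)*prev]
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# HYPOTHETICAL test: 90% sensitivity, 95% specificity.
for prevalence in (0.02, 0.20):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```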
Applications in Health Research

| Distribution / Concept | Health Application Examples |
|---|---|
| Binomial | Clinical trial success rates; genetic inheritance patterns; vaccination effectiveness |
| Poisson | Disease outbreak counts; patient arrival rates in clinics; adverse drug reaction counts |
| Normal | Anthropometric measurements; laboratory reference ranges; quality control in lab tests |
| Bayes' Theorem | Diagnostic test interpretation; medical decision support systems; risk prediction models |
Applications in Health Research

| Probability Concept | Health Application |
|---|---|
| Binomial Distribution | Estimating the probability of success/failure outcomes (e.g., positive vs negative test results) |
| Poisson Distribution | Modeling rare health events (e.g., maternal deaths, new infections, accidents) |
| Normal Distribution | Describing biological variables (BP, BMI, cholesterol levels) |
| Bayes' Theorem | Calculating predictive values in screening and diagnostic tests |
| Probability Rules | Evaluating combined risks (e.g., co-infections, treatment outcomes) |
Key Takeaways
- Probability quantifies uncertainty in health outcomes and guides decision-making in health
- Binomial → discrete, two outcomes (yes/no); models yes/no outcomes in a fixed number of trials
- Poisson → discrete; models rare events over time or space
- Normal → continuous; describes many common biological variables
- Bayes' theorem updates probabilities with new evidence, linking test results to the true probability of disease
- Mastering these concepts helps interpret epidemiological, clinical, and laboratory data accurately; they form the foundation for statistical inference in medical research