Biostatistics and its application along with problems and solutions

RatulNath7, 34 slides, Sep 30, 2024

About This Presentation

Basics of biostatistics, with some problems and solutions.


Slide Content

Introduction to biostatistics

Definition of Biostatistics
Biostatistics is the application of statistical methods to biological data, including data from agriculture and medicine. It involves designing experiments, collecting and analyzing data, and interpreting the results.

Descriptive and inferential statistics
Descriptive statistics and inferential statistics are both methods used to analyze data, but they serve different purposes.

Descriptive statistics
Descriptive statistics summarize and describe the characteristics of a data set, such as its mean, median, and standard deviation. They help to organize, analyze, and present data in a meaningful way and give a quick overview of the sample. Because they only report what was actually observed in the data at hand, descriptive statistics involve no inferential uncertainty.

Inferential statistics
Inferential statistics analyze sample data to make predictions, estimates, or other generalizations about a larger population. They use probability theory to infer characteristics of the population from which the sample was drawn. Inferential statistics help us draw conclusions and make predictions from the data, for example by testing a hypothesis or by assessing whether the results can be generalized to the broader population. However, because they rest on samples rather than on the whole population, inferential conclusions are subject to error. The two main types are the Type I error, or "false positive" (rejecting a null hypothesis that is actually true), and the Type II error, or "false negative" (failing to reject a null hypothesis that is actually false).

Father of Biostatistics – Sir Francis Galton
Francis Galton (1822-1911) was an English explorer, geographer, anthropologist, and statistician. He laid the foundations of eugenics as well as of modern statistical methods, and his statistical work led him to be known as the Father of Biostatistics. He was one of the first to apply statistical concepts and techniques to biological problems of heredity and inheritance, analyzing biological variation with techniques such as correlation and regression. Galton made pioneering contributions to establishing the use of statistics in biology.

Scope and utility of statistics in biology
1. In taxonomy and systematics: Numerical taxonomy uses numerical data to classify organisms. Its chief advantages are repeatability and objectivity. The variable characteristics recorded for different organisms are examined statistically to determine whether they belong to the same species.
2. In population studies: Estimating microbial populations in soil, water, or air, and populations of aquatic organisms, is always difficult, and estimating the population of one or more species over a wide region is practically impossible without statistics. Proper statistical methods make such estimates feasible. Similarly, the biomass of an ecosystem can be estimated with a regression equation.
3. In ecology and environmental science: Various physicochemical and biological parameters are often tested for their interdependence. Correlation analysis between abiotic and biotic factors is of common ecological interest, and the association between two groups or species is tested with a chi-square test.

4. In anatomy, physiology and biochemistry: Statistics can be used to compare the structural patterns of individuals within a population or between the same or different populations, or to compare the average height or length of specimens from different geographical locations. Similarly, the normal heart rate, respiratory cycles, blood cell counts, etc. can be tested using proper statistical methods.
5. In medicine and pharmacology: Appropriate statistical tools can be used to determine the relative potency of a new drug, to compare the action of two different drugs, or to compare the action of the same drug at different doses or in different individuals. Statistical methods are used extensively in medical science as well, for instance to test the association between two attributes such as tobacco use and oral cancer, blood transfusion and AIDS, or mosquitoes and Japanese encephalitis. A high degree of association between two attributes indicates that they are related, although association alone does not prove that one causes the other.

6. In cytology and genetics: Various laws and hypotheses of cytology and genetics can be tested for correctness by analyzing observed data. Chi-square tests or binomial tests are usually carried out to test Mendelian hypotheses. Statistics is also used in population genetics.
7. In animal husbandry and aquaculture: Statistical methods are of great help in assessing stocking numbers, food requirements, and the growth of poultry and farm animals. Regression techniques are used to construct yield equations in animal husbandry and aquaculture.
8. In agriculture and forestry: To study the efficiency of new varieties, experimental designs are first laid out in the form of blocks or plots. Analysis of variance and analysis of covariance are then used to determine the efficiency.

Limitations of statistics
Although statistics is used in various disciplines, it also has its limitations. For instance:
1. Statistics does not deal with a single item or a few data points. To establish something, the data should be adequate so that appropriate statistical techniques can be applied; individual items or a handful of observations prove nothing.
2. Statistical results are true only on average. They mostly deal with the average of a characteristic of a population, and individual items of the population may differ greatly from the central value.
3. Statistics is a means, not a solution to the problem. It must not be assumed that the statistical method is the only method for use in research; it needs intelligent use to yield intelligent conclusions.
4. Statistics does not reveal the entire story. In the absence of details, statistics may lead to wrong conclusions.
5. Statistics cannot deal directly with qualitative information.
6. Statistics can easily be misused.

Collection of data: Sources and sampling methods
The collection of data is the primary job of any statistical study and the primary requisite of any research work. The main sources of biological data are:
a. field surveys
b. laboratory experiments
c. official and scientific publications
Data that are collected for the first time, are original, and are collected by the researchers themselves are called primary data. Data collected from various institutions, scientific organizations, and government sources are called secondary data. The difference between these two types lies only in how they are used: primary data collected by one individual or organization become secondary data when used by others.

Generally, primary data are collected in one of the following ways:
a. personal observation
b. data collection by trained persons
c. data collection by post

Population and Sample
Most scientific work is carried out by drawing a small sample from a large population. The population refers to all individuals of a given species, or all target individuals of a specified area, about which inferences are to be made; in other words, it is the group of items or individuals under study. The entire population is generally too large to examine, so a section of it, called a sample, is used for actual observation. The conclusions drawn from the sample are then generalized to the entire population.

Sampling
The methodology for collecting samples is called sampling. The goal of sampling is to obtain information about population parameters, so care should be taken that the sample selected for study is representative of the whole population. If the sampling is proper, the conclusions about the population will be reliable. Hence the primary objective of sampling is to collect the maximum information about the population.
Salient features of sampling
Sampling is based on two principles:
a. The law of statistical regularity, which states that any set of items taken at random from a large group will tend to possess the same characteristic features as the large group.
b. The law of inertia of large numbers, which states that the larger the sample, the more stable and informative it will be.

However, the mere size of a sample does not make it a good sample. A good sampling system is characterized by a frequency distribution of estimates with small variance and a mean about the same as the population value. Even when the utmost care has been taken during sampling, the results may contain errors, called sampling errors. Sampling errors are of two types: biased errors and unbiased errors.

Sampling methods
The various sampling methods can be grouped under two heads:
1. non-random (selective) sampling
2. random sampling
A random sample is a sample selected in such a way that every item in the population has an equal chance of being selected. Random sampling is again divided into two types:
1. simple random sampling
2. restricted random sampling
In simple random sampling, the individuals of the population from which the sample is to be taken are numbered, and the desired number of individuals is then selected by the lottery method or with random number tables such as those of Fisher and Yates or the Rand Corporation.
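The lottery method described above can be sketched in Python. This is a minimal illustration, not the author's procedure: the population size (500) and sample size (10) are hypothetical, and Python's `random.sample` stands in for a printed random number table.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Lottery method: label individuals 1..N, then draw `sample_size`
    distinct labels without replacement, each label equally likely."""
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, sample_size))


# Hypothetical example: choose 10 of 500 numbered plants.
chosen = simple_random_sample(population_size=500, sample_size=10, seed=42)
print(chosen)
```

Sampling without replacement matches the lottery method, since a number once drawn is not returned to the pool.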

Restricted random sampling is also divided into two types:
1. Stratified sampling: If the population is heterogeneous, the entire population is first divided into several relatively homogeneous sections (strata), and each section is then sampled separately. The estimates from all the strata can be combined to give an estimate for the whole population.
2. Multistage sampling or subsampling: When the composition of a population is such that measuring even a small sample takes considerable time, sampling can be done in two or more stages. In the first stage, the population is divided into a number of subpopulations (primary units), and a sample of these units is taken. In the next stage, a secondary sample (subsample) of individuals is taken from the selected primary units.
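Both restricted schemes can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the strata are combined by weighting each stratum's sample mean by its share of the population (one common way to "combine the estimates"), and the plankton counts are made-up data, not from the source.

```python
import random
from statistics import mean


def stratified_mean(strata, n_per_stratum, seed=None):
    """Estimate the population mean from a stratified sample.

    `strata` maps each stratum name to the list of values in that stratum.
    A simple random sample is drawn within every stratum, and the stratum
    sample means are combined, weighted by each stratum's share of the
    population.
    """
    rng = random.Random(seed)
    total = sum(len(values) for values in strata.values())
    estimate = 0.0
    for values in strata.values():
        k = min(n_per_stratum, len(values))
        sample = rng.sample(values, k)
        estimate += (len(values) / total) * mean(sample)
    return estimate


def two_stage_sample(units, n_units, n_within, seed=None):
    """Two-stage (multistage) sampling: first choose `n_units` primary units
    at random, then subsample up to `n_within` individuals from each."""
    rng = random.Random(seed)
    chosen_units = rng.sample(sorted(units), n_units)
    return {
        unit: rng.sample(units[unit], min(n_within, len(units[unit])))
        for unit in chosen_units
    }


# Hypothetical data: plankton counts from shallow and deep zones of a lake.
zones = {
    "shallow": [12, 15, 11, 14, 13, 16, 12, 15],
    "deep": [4, 6, 5, 7, 5, 6],
}
print(round(stratified_mean(zones, n_per_stratum=3, seed=1), 2))
print(two_stage_sample(zones, n_units=1, n_within=2, seed=1))
```

When every stratum is sampled completely, the weighted combination reduces to the ordinary population mean, which is a useful sanity check.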

MEASURES OF CENTRAL TENDENCY
After the data have been collected and presented in tabular or graphical form, the next step is to obtain a summary of the observed values. A measure of central tendency is a typical representative of a set of data, commonly a single number called the average. The average is typical of the whole group and describes a characteristic of the entire data set. Measures of central tendency, or averages, are broadly of two types:
1. Mathematical averages
2. Locational averages
Mathematical averages can be further divided into the following subgroups:
a. Arithmetic mean
b. Weighted mean
c. Geometric mean
d. Harmonic mean
e. Quadratic mean
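The mathematical averages listed above can be compared side by side with Python's standard `statistics` module (Python 3.8+ for `geometric_mean`). This is a sketch; the `weighted_mean` and `quadratic_mean` helpers and the small data set are hypothetical illustrations, and the quadratic mean is taken as the square root of the mean of the squares.

```python
import math
from statistics import geometric_mean, harmonic_mean, mean


def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def quadratic_mean(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(mean([x * x for x in values]))


data = [1, 2, 4]  # hypothetical positive observations
print("arithmetic:", mean(data))
print("geometric: ", geometric_mean(data))
print("harmonic:  ", harmonic_mean(data))
print("quadratic: ", quadratic_mean(data))
print("weighted:  ", weighted_mean([2, 3], [1, 3]))
```

For positive data that are not all equal, the results follow the ordering harmonic < geometric < arithmetic < quadratic.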

Arithmetic mean
The arithmetic mean, also called simply the mean, is based on all observations in the sample. It is the least affected by sampling fluctuations and, as such, is a stable average. The mean is the center of gravity, balancing the values on either side of it.
For an individual series, the mean is simply the ratio of the sum of the sample observations to the sample size:
x̄ = ∑X / n
where x̄ = mean, X = values in the sample, and n = number of values in the sample.

Illustration
Calculate the mean CO₂ concentration (ppm) recorded at monthly intervals from a domestic pond: 3.75, 5.8, 11.2, 7.1, 6.3, 4.4, 6.8, 8.1, 9.2, 6.2, 5.5, and 4.0.
Solution:
∑X = 3.75 + 5.8 + 11.2 + 7.1 + 6.3 + 4.4 + 6.8 + 8.1 + 9.2 + 6.2 + 5.5 + 4.0 = 78.35
n = 12
Mean, x̄ = ∑X / n = 78.35 / 12 = 6.53 ppm
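The arithmetic can be checked with a short Python sketch; `arithmetic_mean` is a hypothetical helper name, not something defined in the source.

```python
def arithmetic_mean(values):
    """Individual series: x̄ = ΣX / n."""
    return sum(values) / len(values)


co2_ppm = [3.75, 5.8, 11.2, 7.1, 6.3, 4.4, 6.8, 8.1, 9.2, 6.2, 5.5, 4.0]
print(f"sum = {sum(co2_ppm):.2f}, n = {len(co2_ppm)}, mean = {arithmetic_mean(co2_ppm):.2f}")
# prints: sum = 78.35, n = 12, mean = 6.53
```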

In a discrete series, the calculation of the mean is different because the frequencies have to be taken into account. The mean of a discrete series can be calculated as:
x̄ = ∑fX / n
where x̄ = mean, ∑fX = sum of the products of the values of the variable and their respective frequencies, and n = sample size, i.e. the total frequency (∑f).
Illustration
Find the mean number of beans per plant, given the following frequencies of occurrence:
No. of beans per plant: 2 3 4 5 6 7 8 9
No. of plants: 22 24 16 15 11 7 2 5

Solution:
X (beans per plant)    f (no. of plants)    fX
2                      22                   44
3                      24                   72
4                      16                   64
5                      15                   75
6                      11                   66
7                      7                    49
8                      2                    16
9                      5                    45
Total                  ∑f = n = 102         ∑fX = 431

Applying the formula x̄ = ∑fX / n with ∑f = n = 102 and ∑fX = 431:
x̄ = 431 / 102 = 4.23
Hence, the average number of beans per plant is 4.23.
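A short Python sketch reproduces the discrete-series mean from the frequencies in the worked solution table; `discrete_mean` is a hypothetical helper name.

```python
def discrete_mean(values, freqs):
    """Discrete series: x̄ = ΣfX / Σf. Returns (ΣfX, n, x̄)."""
    n = sum(freqs)
    total = sum(f * x for x, f in zip(values, freqs))
    return total, n, total / n


beans = [2, 3, 4, 5, 6, 7, 8, 9]
plants = [22, 24, 16, 15, 11, 7, 2, 5]  # frequencies from the solution table
sum_fx, n, x_bar = discrete_mean(beans, plants)
print(sum_fx, n, round(x_bar, 2))  # prints: 431 102 4.23
```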

In a continuous series, where the observations are grouped into classes, the midpoints of the classes are used to compute the mean:
x̄ = ∑mf / n
where m = midpoint of the class, f = frequency of the class, and n = ∑f.
Illustration
Compute the mean number of ectoparasites per leaf from the following table:
No. of ectoparasites: 0-5 5-10 10-15 15-20 20-25 25-30 30-35
No. of leaves observed: 17 8 20 21 14 16 9

Solution:
First find the midpoint (m) of each class, then the product of each midpoint and its frequency (f).
No. of ectoparasites (X)    No. of leaves observed (f)    Midpoint (m)    mf
0-5                         17                            2.5             42.5
5-10                        8                             7.5             60.0
10-15                       20                            12.5            250.0
15-20                       21                            17.5            367.5
20-25                       14                            22.5            315.0
25-30                       16                            27.5            440.0
30-35                       9                             32.5            292.5
Total                       n = ∑f = 105                                  ∑mf = 1767.5

Applying the formula x̄ = ∑mf / n:
x̄ = 1767.5 / 105 = 16.83
Hence, the mean number of ectoparasites per leaf is 16.83, or about 17.
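The continuous-series calculation can be sketched in Python by computing each class midpoint from hypothetical (lower, upper) class-limit pairs; `grouped_mean` is an illustrative name, not from the source.

```python
def grouped_mean(classes, freqs):
    """Continuous series: x̄ = Σmf / Σf, with m the class midpoint.
    Returns (Σmf, n, x̄)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
    n = sum(freqs)
    return sum_mf, n, sum_mf / n


classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
leaves = [17, 8, 20, 21, 14, 16, 9]
sum_mf, n, x_bar = grouped_mean(classes, leaves)
print(sum_mf, n, round(x_bar, 2))  # prints: 1767.5 105 16.83
```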

MEDIAN
The median is the locational average that divides a series into two equal halves. Unlike the arithmetic mean, it is not the center of gravity but the value found exactly in the middle of the series when the observations are arranged in ascending or descending order.
Characteristics of the median:
1. It is simple to understand and easy to calculate, sometimes by simple inspection.
2. It eliminates the effect of extreme items and can be computed even if the extreme values of the series are unknown.
3. Because the median is a positional value within the distribution, it can be determined graphically.
4. The median can be estimated even for qualitative phenomena that can be ranked.
5. The median is not always rigidly defined: when the series contains an even number of observations, it is calculated as the simple mean of the two middle values.
6. The median is not always representative of the series. If the series contains only a few widely scattered values, such as 1, 3, 45, 799, 5887, the median is 45, which cannot be taken as representative of the series.

For an individual series
Illustration
The following are the lengths (cm) of 20 fish specimens. Calculate the median length.
10, 16, 28, 15, 30, 17, 24, 34, 18, 21, 11, 15, 42, 37, 19, 16, 14, 22, 28, and 12
Solution:
Arranging the data in ascending order:
10, 11, 12, 14, 15, 15, 16, 16, 17, 18, 19, 21, 22, 24, 28, 28, 30, 34, 37, 42
M = size of the (n + 1)/2 th item = (20 + 1)/2 = 10.5th item
Size of the 10.5th item = (10th item + 11th item)/2 = (18 + 19)/2 = 18.5 cm
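The (n + 1)/2 rule translates directly into Python. This sketch (the `median_individual` name is hypothetical) averages the two items around a fractional position, which reduces to the single middle item when n is odd.

```python
import math


def median_individual(values):
    """Median as the size of the (n + 1)/2 th item of the ordered series.
    For even n the position is fractional, so average the two middle items."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2           # 1-based position, e.g. 10.5
    lower = ordered[math.floor(position) - 1]   # e.g. the 10th item
    upper = ordered[math.ceil(position) - 1]    # e.g. the 11th item
    return (lower + upper) / 2


fish_cm = [10, 16, 28, 15, 30, 17, 24, 34, 18, 21,
           11, 15, 42, 37, 19, 16, 14, 22, 28, 12]
print(median_individual(fish_cm))  # prints: 18.5
```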

Discrete series
Illustration
Calculate the median rice yield from the following data:
Rice yield (kg/ha): 1500 2000 2500 3000 3500 4000
No. of villages: 8 6 9 6 5 2
Solution:
Rice yield (kg/ha) (X)    No. of villages (f)    Cumulative frequency (cf)
1500                      8                      8
2000                      6                      14
2500                      9                      23
3000                      6                      29
3500                      5                      34
4000                      2                      36
Applying the formula M = size of the (n + 1)/2 th item = (36 + 1)/2 = 18.5th item.
The cumulative frequency first reaches 18.5 at X = 2500 (cf = 23), so both the 18th and the 19th items are 2500.
Median = 2500 kg/ha
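Locating the median through cumulative frequencies can be sketched in Python as below. The helper name `median_discrete` is hypothetical; for a fractional position it averages the values of the two neighbouring items, which here are both 2500.

```python
import math
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete series via cumulative frequencies: the value whose
    cumulative frequency first reaches the (n + 1)/2 th position."""
    cum = list(accumulate(freqs))     # e.g. 8, 14, 23, 29, 34, 36
    position = (cum[-1] + 1) / 2      # 18.5 for n = 36

    def value_at(k):
        # first value whose cumulative frequency is >= item number k
        for x, c in zip(values, cum):
            if c >= k:
                return x
        raise ValueError("position beyond total frequency")

    # a fractional position lies between two items; average their values
    return (value_at(math.floor(position)) + value_at(math.ceil(position))) / 2


yields = [1500, 2000, 2500, 3000, 3500, 4000]
villages = [8, 6, 9, 6, 5, 2]
print(median_discrete(yields, villages))  # prints: 2500.0
```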

Continuous series
In a continuous series, the cumulative frequency (cf) of each class is worked out, and the median class is located as the class containing the (n/2)th item. The exact value of the median within that class is then determined by the following formula:
$M = L + \dfrac{n/2 - cf}{f} \times i$
where L = lower limit of the median class, n = total frequency (∑f), cf = cumulative frequency of the class preceding the median class, f = simple frequency of the median class, and i = class interval.

Illustration
The production of onion was recorded on 160 farms, giving the frequency distribution below. Calculate the median of this distribution.
Yield of onion (t): 0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8
No. of farms: 19 22 27 36 28 16 9 3

Solution:
Yield of onion (t) (X)    No. of farms (f)    Cumulative frequency (cf)
0-1                       19                  19
1-2                       22                  41
2-3                       27                  68
3-4                       36                  104
4-5                       28                  132
5-6                       16                  148
6-7                       9                   157
7-8                       3                   160
n = ∑f = 160
Median = size of the (n/2)th item = 160/2 = 80th item
The cumulative frequency first exceeds 80 in the class 3-4 (cf = 104), so the median lies in the class 3-4.

Applying the formula $M = L + \dfrac{n/2 - cf}{f} \times i$, where
L = lower limit of the median class = 3
n = total frequency (∑f) = 160
cf = cumulative frequency of the class preceding the median class = 68
f = simple frequency of the median class = 36
i = class interval = 1
$M = 3 + \dfrac{80 - 68}{36} \times 1 = 3 + 0.33 = 3.33$ t
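The grouped-median formula can be sketched in Python as follows, assuming equal class widths and classes given by their lower limits (the `grouped_median` name and argument layout are hypothetical choices for this illustration).

```python
def grouped_median(lower_limits, freqs, width):
    """M = L + ((n/2 - cf) / f) * i for a continuous series with equal class width i."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f and cf + f >= half:  # first class whose cumulative frequency reaches n/2
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("empty distribution")


lower_limits = [0, 1, 2, 3, 4, 5, 6, 7]
farms = [19, 22, 27, 36, 28, 16, 9, 3]
print(round(grouped_median(lower_limits, farms, width=1), 2))  # prints: 3.33
```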

Mode
The mode is the value of the variable that occurs most frequently in a distribution. Like the median, it is a locational average; it represents the most typical value of the distribution because the modal value occurs most often in a set of data. Geometrically, it is the point around which most of the observations are concentrated; graphically, it is the peak of the frequency distribution. The mode may be used to describe qualitative data.
For an individual series in which no single value clearly occurs most often, the mode can be estimated from the empirical relation below, which holds approximately for moderately skewed distributions:
Mode (Z) = 3 Median − 2 Mean

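Illustration
The leaf diameters (mm) of floating aquatic vegetation sampled from 20 different water bodies are recorded as follows:
7.7, 7.4, 7.5, 7.7, 7.8, 7.6, 7.4, 7.8, 7.4, 7.6, 7.3, 7.6, 7.7, 7.7, 7.5, 7.3, 7.4, 7.5, 7.3 and 7.3
Solution:
Variable (X)    Frequency (f)    Cumulative frequency (cf)
7.3             4                4
7.4             4                8
7.5             3                11
7.6             3                14
7.7             4                18
7.8             2                20
Here 7.3, 7.4, and 7.7 each occur four times, so the mode cannot be read off directly and the empirical relation is used. The median is the (20 + 1)/2 = 10.5th item, i.e. the mean of the 10th and 11th items, which are both 7.5, so M = 7.5. The mean is x̄ = ∑fX / n = 150.5 / 20 = 7.525.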

Z = 3M − 2x̄ = 3(7.5) − 2(7.525) = 22.5 − 15.05 = 7.45
Hence, the mode of the distribution is approximately 7.45 mm.
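Python's `statistics` module (3.8+ for `multimode`) can confirm both the tie for the most frequent value and the empirical estimate. This is a sketch of the check, not the author's method.

```python
from statistics import mean, median, multimode

diameters_mm = [7.7, 7.4, 7.5, 7.7, 7.8, 7.6, 7.4, 7.8, 7.4, 7.6,
                7.3, 7.6, 7.7, 7.7, 7.5, 7.3, 7.4, 7.5, 7.3, 7.3]

modes = sorted(multimode(diameters_mm))  # every value tied for the highest count
m = median(diameters_mm)
x_bar = mean(diameters_mm)
z = 3 * m - 2 * x_bar                    # empirical relation Z = 3M - 2x̄
print(modes, m, round(x_bar, 3), round(z, 2))  # prints: [7.3, 7.4, 7.7] 7.5 7.525 7.45
```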