Statistics
Diagram
Graph
Histogram
Frequency Polygon
Ogive
Pictogram
Box Plot
Frequency Distribution
Central Tendency
Arithmetic Mean
Median
Mode
Harmonic Mean
Geometric Mean
AM >= GM >= HM
Symmetrical Distribution
Skewed Distribution
Dispersion
Range
Standard Deviation
Variance
Coefficient Of Variation
Mean Deviation
Quartile Deviation
Skewness
Karl Pearson's Skewness
Probability
Binomial Distribution
Poisson Distribution
Normal Distribution
Normal Curve
Inflection Point
Test Of Hypothesis
Null Hypothesis
Alternate Hypothesis
Type I And Type II Error
Level Of Significance
Critical Value, One Tailed Test, Two Tailed Test
Test Of Significance
T Test
Chi Square Test
Anova / F Test
Z Test
Z Score & Fisher Z :
P Value
Error
Standard Error
Sampling Error
Experimental Design
CRD (Completely Randomized Design)
EDF (Error Degree Of Freedom)
RBD (Randomized Block Design)
LSD (Latin Square Design) :
SPD (Split Plot Design)
Correlation Regression :
Correlation :
Regression :
Correlation Coefficient (Pearson R)
Probability Method
Non-Probability Method
Size: 1.24 MB
Language: en
Added: Oct 06, 2025
Slides: 9 pages
Slide Content
STATISTICS
A specialized branch of mathematics - RA Fisher
Statistics - both singular and plural
Deals with QUANTITATIVE data, which may be FINITE or INFINITE
DIAGRAM
Simple bar : single characteristic ; multiple bar : multiple characteristics (ONE dimension)
Component bar : bar height depends on TOTAL
Percentage : bar height SAME for all
Pie chart : component of factor = SECTOR, alternative STEP BAR diagram
Bar chart : vertical base, horizontal bars ; column chart : horizontal base, vertical bars
GRAPH : Graphical representations for grouped quantitative data
HISTOGRAM:
classified based on the class intervals
suitable for calculating MODE
Class intervals should be EQUAL ; if not, bar height is proportional to frequency DENSITY
No gap between bars due to CONTINUOUS class
Bar height = Corresponding frequency of respective class
FREQUENCY POLYGON : dots against the mid-points connected by STRAIGHT line
FREQUENCY CURVE : dots against the mid-points connected by SMOOTH/FREE HAND line
OGIVE / cumulative frequency curve (value v/s cumulative frequency)
Less than ogive : plotted against upper boundary of class interval
More than ogive : plotted against lower boundary of class interval
The two types of ogive intersect at the MEDIAN
Partition values (median, quartiles, deciles, percentiles) can be found GRAPHICALLY
PICTOGRAM: Non dimension, less accurate, used by DILETTANTE, data in COUNT, PICTURE
BOX PLOT: compares multiple groups of continuous data, handles SKEWED data well, identifies OUTLIERS
FREQUENCY DISTRIBUTION
Frequency of a variable is always INTEGER
Frequency distribution can be both CONTINUOUS and DISCRETE
Individual series : DISCRETE series, each variate has frequency 1
Open-end distribution : UNCERTAIN limit in the first and/or last class
Simple frequency distribution : All distinct value with their frequency
Group frequency distribution : All value in their CLASSES with their FREQUENCY
Continuous variable : any number ; discrete : only INTEGER ; VARIATE : single observation
TABLE :
simple table : one factor/variable , Complex : 2 or more
First column (row headings) : STUB ; first row (column headings) : CAPTION
CENTRAL TENDENCY
ARITHMETIC MEAN :
Most common, BEST, rigidly defined, based on all observations
Not based on position, works even when data are lacking, least affected by sampling fluctuations
Cannot be calculated for qualitative or open-end data ; MOST affected by extreme values
MEDIAN :
Middle most, QUALITATIVE data (example : Intelligence, ability)
Not affected by extreme values ; positional average ; works for open-end series and when data are lacking
With an even number of items or a continuous series, the result may not be a value of the series
A slight change in the series can change it drastically ; algebraic use only in MEAN DEVIATION ; not based on all observations
MODE :
Most frequent value (most CONCENTRATED) ; usable for qualitative data (but less than median) ; positional measure
Not affected by extreme values ; for a large number of values, mode = observation with maximum frequency
Example : shoe/garment size , meteorological forecasting
HARMONIC MEAN :
Reciprocal of the A.M. of the reciprocals of the values (example : average speed, distance, rate)
rigidly defined on all observations, amenable to further algebraic treatment.
Most suitable for HIGHLY VARIABLE series and when greater weight to smaller observations
Average speed : for the same distance = 2AB/(A+B) ; for the same time = (A+B)/2
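The two average-speed cases can be checked numerically. A minimal sketch, assuming hypothetical speeds of 40 and 60 km/h (not from the slides); the function names are illustrative only:

```python
from statistics import harmonic_mean


def avg_speed_same_distance(a, b):
    """Average speed when equal distances are covered at speeds a and b."""
    return 2 * a * b / (a + b)


def avg_speed_same_time(a, b):
    """Average speed when equal times are spent at speeds a and b."""
    return (a + b) / 2


# Hypothetical speeds (km/h), chosen only for illustration
a, b = 40.0, 60.0
same_distance = avg_speed_same_distance(a, b)  # harmonic mean of a and b
same_time = avg_speed_same_time(a, b)          # arithmetic mean of a and b
hm = harmonic_mean([a, b])                     # library harmonic mean, for comparison

print(same_distance, same_time, hm)
```

The same-distance case agrees with the library harmonic mean, which is why HM is the usual choice for rates.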
GEOMETRIC MEAN :
best when data is RATIO or PERCENTAGE ; Example : Bacterial growth , cell division
MISCELLANEOUS:
Adding, subtracting, multiplying or dividing every value of the series by a constant changes the mean in the same way
Quadratic mean : usable with negative values ; QM >= AM
Most UNSTABLE is Geometric Mean
Normally (positive values) : AM >= GM >= HM ; when all observations are the SAME : AM = GM = HM
Median = Middle value = 50th percentile = 2nd quartile = 5th decile
Symmetrical distribution : Mean = Median = Mode
Moderately skewed distribution (empirical relation) : Mean – Mode = 3 (Mean - Median)
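The ordering of the three means and the empirical mean/median/mode relation can be illustrated with Python's standard library. This is a sketch on made-up values; `three_means` and `empirical_mode` are hypothetical helper names, and the mode estimate only applies to moderately skewed data:

```python
from statistics import mean, geometric_mean, harmonic_mean


def three_means(values):
    """Return (AM, GM, HM) for a list of positive values."""
    return mean(values), geometric_mean(values), harmonic_mean(values)


def empirical_mode(mean_value, median_value):
    """Mode estimate rearranged from Mean - Mode = 3 (Mean - Median)."""
    return 3 * median_value - 2 * mean_value


# Made-up positive values: the three means differ
am, gm, hm = three_means([2.0, 4.0, 8.0])

# Identical observations: the three means coincide
same_am, same_gm, same_hm = three_means([5.0, 5.0, 5.0])

# Made-up mean and median for a moderately skewed series
mode_estimate = empirical_mode(50.0, 48.0)

print(am, gm, hm, mode_estimate)
```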
DISPERSION
Dispersion : scatteredness or variation of observations from their average
RANGE :
Used in quality control, weather forecasts, share price analysis
STANDARD DEVIATION :
Positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean
Basis for measuring the COEFFICIENT OF CORRELATION and for sampling
Has the characteristics of the MEAN ; further algebraic treatment is possible
Has the same UNIT as the original data ; cannot be used for COMPARISON of series in different units
VARIANCE :
Variance = (SD)^2 ; if all values are the same, variance is 0
Average of the squared deviations from the mean ; unit is the square of the original unit
COEFFICIENT OF VARIATION :
C.V = (SD/Mean)x100 , a RELATIVE measure of dispersion
More C.V. = more variable, less stable, less homogeneous.
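SD, variance and C.V. as defined above can be computed directly. A minimal sketch using the population formulas (division by N) from the standard library; the data values are made up and `coefficient_of_variation` is an illustrative name:

```python
from statistics import mean, pstdev, pvariance


def coefficient_of_variation(values):
    """C.V. = (SD / Mean) x 100, using the population SD."""
    return pstdev(values) / mean(values) * 100


# Made-up observations with mean 5
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sd = pstdev(data)          # square root of the mean squared deviation
var = pvariance(data)      # (SD)^2
cv = coefficient_of_variation(data)

print(sd, var, cv)
```

Because C.V. is unitless, it can compare series measured in different units, which SD alone cannot.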
MEAN DEVIATION :
MD is minimum when taken from the MEDIAN ; based on all observations
Sum of squares of deviations is minimum when taken from the MEAN
Signs of the deviations from the central value are ignored (absolute deviations)
QUARTILE DEVIATION :
QD = (Q3-Q1)/2 ; positional ; coefficient = (Q3-Q1)/(Q3+Q1) ; the only measure that can be calculated for OPEN-END distributions
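A short sketch of the quartile deviation and its coefficient. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, and the data are made up:

```python
from statistics import quantiles


def quartile_deviation(values):
    """Return (QD, coefficient of QD) from Q1 and Q3."""
    # Assumption: "inclusive" quartile convention; other conventions give other Q1/Q3
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return qd, coefficient


# Made-up observations
qd, coefficient = quartile_deviation([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(qd, coefficient)
```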
SKEWNESS :
Lack of symmetry of tails in FD (Frequency DIstribution) curve
NEGATIVE : μ3 < 0 ; LEFT tail more elongated ; Mean < Median < Mode (mean lies to the LEFT)
POSITIVE : μ3 > 0 ; RIGHT tail more elongated ; Mean > Median > Mode (mean lies to the RIGHT)
Karl Pearson's skewness = (Mean - Mode) / SD
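The (Mean - Mode) / SD measure can be computed in a few lines. A sketch on made-up data with a single clear mode, using the population SD; `pearson_skewness` is an illustrative name:

```python
from statistics import mean, mode, pstdev


def pearson_skewness(values):
    """Skewness = (Mean - Mode) / SD, using the population SD."""
    return (mean(values) - mode(values)) / pstdev(values)


# Made-up observations: mean 5, mode 4, SD 2
skewness = pearson_skewness([2, 4, 4, 4, 5, 5, 7, 9])

# A positive value indicates a longer right tail
print(skewness)
```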
MISCELLANEOUS
4SD=5MD=6QD=2/3R
How to calculate SD
BEST/most reliable : SD , Worst : QD , Unitless : CV
EXTREME : Most- Range, SD ; Least – QD, MD
All are absolute measures, but CV is RELATIVE
All change with scale but not with origin (CV is unaltered by a change of scale)
PROBABILITY
De Morgan's law : A' ∪ B' = (A ∩ B)' ; BD (Binomial), PD (Poisson) = discrete (PMF) ; ND (Normal) = continuous (PDF)
BINOMIAL DISTRIBUTION :
Success or failure ; p + q = 1 and P(x) = nCx . p^x . q^(n-x)
AM (μ1) = np ; variance (μ2) = npq ; skewness (μ3) = npq(q-p) ; kurtosis (μ4) = npq[1 + 3pq(n-2)]
p < ½ : +ve skewed ; p > ½ : -ve skewed ; p = ½ : SYMMETRIC
Mean > Variance ; n = 1 gives the Bernoulli distribution ; n → ∞ (with p → 0) tends to Poisson
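The binomial probabilities and the results mean = np and variance = npq can be verified by direct summation. A minimal sketch, assuming illustrative values n = 10 and p = 0.3 (not from the slides):

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


# Illustrative parameters
n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                            # should be 1
mean_value = sum(x * px for x, px in enumerate(probs))        # should equal n*p
variance = sum((x - mean_value) ** 2 * px for x, px in enumerate(probs))  # should equal n*p*q

print(total, mean_value, variance)
```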
POISSON DISTRIBUTION
Lambda (λ) = parameter of PD = Mean = Variance = Skewness (μ3) ; always > 0
Kurtosis (μ4) = 3λ^2 + λ ; ex - deaths, defects, miscalls
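That the Poisson mean, variance and third central moment all equal the parameter can be checked numerically. A sketch assuming an illustrative λ = 2 and a sum truncated at 60 terms (the remaining tail is negligible for this λ):

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(k) = e^(-lam) * lam^k / k!"""
    return exp(-lam) * lam**k / factorial(k)


lam = 2.0  # illustrative parameter
probs = [poisson_pmf(k, lam) for k in range(60)]  # truncated support

mean_value = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean_value) ** 2 * pk for k, pk in enumerate(probs))
third_moment = sum((k - mean_value) ** 3 * pk for k, pk in enumerate(probs))

print(mean_value, variance, third_moment)
```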
NORMAL DISTRIBUTION :
De Moivre ; BELL shape ; area under the curve = 1 ; symmetric about the mean ;
Mean = Median = Mode ; μ3 = 0 ; kurtosis (β2) = 3 ; Range : - ∞ to + ∞ ;
Effective RANGE ≈ 6σ ; MD ≈ 4/5 σ ; QD ≈ 2/3 σ
NORMAL CURVE
68% of data lies within ±1σ of the mean.
95% of data lies within ±2σ of the mean.
99.7% of data lies within ±3σ of the mean.
inflection point : changes its curvature : x = μ ± σ
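The 68 / 95 / 99.7 percentages follow from the normal distribution itself: the probability of lying within k standard deviations of the mean is erf(k / √2). A small sketch (`prob_within` is an illustrative name):

```python
from math import erf, sqrt


def prob_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return erf(k / sqrt(2))


within = {k: prob_within(k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within ±{k}σ: {prob:.4f}")
```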
TEST OF HYPOTHESIS
Null Hypothesis – H0 – No difference – RA Fisher
Alternate – H1 ; H1 : µ1 < µ2 = left tailed ; H1 : µ1 > µ2 = right tailed
Type I error : Alpha (α) : Rejecting H0 when it is true
Type II error : Beta (β) : Accepting H0 when it is false
DF : Total Number - Constraint = N-K
LOS (Level of significance): Maximum probability of Type I error (5% or 1 %)
Critical value : decides whether to accept or reject the Null Hypothesis
One tailed test – critical region falls on one end (H1 : µ1 > µ2 or µ1 < µ2)
Two tailed test – critical region falls on both ends (H1 : µ1 ≠ µ2)
Large sample n≥30 : Z test ; Small sample ,n<30 : t , F, Chi Square
Critical Region : Depends on Type I error size
TEST OF SIGNIFICANCE
T TEST
Sample < 30 ; W. S. Gosset (Student) ; Paired and Unpaired
Helps to observe significance of Correlation coefficient, regression coefficient
CHI SQUARE TEST
Sample > 50 ; Non-parametric ; Helmert & Karl Pearson ; (ex - genetic problems)
ANOVA / F TEST
df = t – 1 ; Treatment = BETWEEN; Error = WITHIN
If F ≈ 1: Variance between groups ≈ variance within groups ⇒ no difference b/w treatments.
If F >> 1: b/w groups > w/w groups ⇒ at least one treatment mean is significantly different.
Larger F-values typically suggest stronger evidence against the null hypothesis.
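The F ratio compares between-group and within-group variance. A minimal one-way ANOVA sketch on made-up data (three treatments, three replications each); the function name and data are illustrative:

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way classification."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-treatment sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1               # t - 1
    df_within = len(all_values) - len(groups)  # N - t
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up observations for three treatments
f_value, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

print(f_value, df_between, df_within)
```

Here F is far above 1, so the between-treatment variation dominates; significance would still be judged against the tabulated F for the two degrees of freedom.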
Z TEST :
Asymptotic ; >30 ; RA Fisher ; (ex-tea drinker)
Z cal < Z tab : we accept H0
Two tailed 5% 1.96 , 1% 2.58 ; One tailed 5% 1.65, 1% 2.33
Z SCORE & FISHER Z :
P VALUE : P value < 0.05 or <5% = reject Null Hypothesis
Z-test when population SD is known; otherwise t-test.
Chi-square for categorical data ; ANOVA for comparing more than 2 means
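The tabulated Z values and the p-value rule can be reproduced with `statistics.NormalDist`. A sketch of a one-sample, two-tailed Z test with a known population SD; all sample figures below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


std_normal = NormalDist()  # mean 0, SD 1

# Tabulated critical values
z_two_tailed_5 = std_normal.inv_cdf(1 - 0.05 / 2)
z_two_tailed_1 = std_normal.inv_cdf(1 - 0.01 / 2)
z_one_tailed_5 = std_normal.inv_cdf(1 - 0.05)
z_one_tailed_1 = std_normal.inv_cdf(1 - 0.01)

# Hypothetical sample: mean 52, H0 mean 50, known SD 10, n = 100
z_cal = z_statistic(52.0, 50.0, 10.0, 100)
p_value = 2 * (1 - std_normal.cdf(abs(z_cal)))

reject_at_5 = abs(z_cal) > z_two_tailed_5   # same decision as p_value < 0.05
reject_at_1 = abs(z_cal) > z_two_tailed_1   # same decision as p_value < 0.01

print(z_cal, p_value, reject_at_5, reject_at_1)
```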
ERROR
STANDARD ERROR
SE = SD / √N
SAMPLING ERROR
Sampling error = Estimate – Parameter = Sample statistic – Population parameter
Sampling Error : Due to random sampling variability
Non-Sampling Error : Due to bias, measurement, data entry, etc.
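Standard error and sampling error can be illustrated on a toy population. A sketch with a made-up population of the integers 1 to 100 and a seeded random sample; the names and numbers are illustrative:

```python
import random
from math import sqrt
from statistics import mean


def standard_error(sd, n):
    """SE = SD / sqrt(N)."""
    return sd / sqrt(n)


# Illustrative values: SD 10, sample size 25
se = standard_error(10.0, 25)

# Toy population and a simple random sample of 10 units
rng = random.Random(0)                      # fixed seed for repeatability
population = list(range(1, 101))
sample = rng.sample(population, 10)

# Sampling error = sample statistic - population parameter
sampling_error = mean(sample) - mean(population)

print(se, sampling_error)
```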
EXPERIMENTAL DESIGN
for TOS (Test of significance) – RA Fisher
CRD (COMPLETELY RANDOMIZED DESIGN)
One way classification, No way control or elimination
When material is LIMITED and HOMOGENEOUS (ex - soil and pot experiments)
1.Replication (Independent)
2.Randomization (used)
3.Local control (not used, because CRD works on HOMOGENEOUS material only)
EDF(Error degree of Freedom) : t(r-1) Maximum among all;
FG (Fertility gradient) : zero (as it is homogeneous)
RBD (RANDOMIZED BLOCK DESIGN)
Two way classification, One way control
Uses all 3 principles
FG = 1 (one direction) ; EDF = (r-1)(t-1)
Max treatment: <21 (optimum 5-12)
More accurate than CRD , MOSTLY Used
LSD (LATIN SQUARE DESIGN) :
For 5-12 treatments ; square layout ; Rows = Columns = Treatments = Replications
It is INCOMPLETE (a full three-way layout needs t^3 units but only t^2 are used)
FG = 2 ; EDF = (t-1)(t-2) or (r-1)(r-2) or (t-1)(r-2) or (c-1)(c-2)
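The error degrees of freedom of the three designs can be compared side by side. A small sketch with t treatments and r replications; the function names are illustrative:

```python
def edf_crd(t, r):
    """Error df in CRD: t(r-1)."""
    return t * (r - 1)


def edf_rbd(t, r):
    """Error df in RBD: (r-1)(t-1)."""
    return (r - 1) * (t - 1)


def edf_lsd(t):
    """Error df in LSD: (t-1)(t-2), since rows = columns = treatments."""
    return (t - 1) * (t - 2)


# Illustrative case: 5 treatments, 5 replications
t, r = 5, 5
crd, rbd, lsd = edf_crd(t, r), edf_rbd(t, r), edf_lsd(t)

print(crd, rbd, lsd)  # CRD has the largest error df of the three
```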
SPD (SPLIT PLOT DESIGN) :
2 treatment factors : Main plot (larger - Manure, DOS, ploughing) ; Sub plot (smaller – fertilizer, variety) ; 2 error terms
SrPD (Strip Plot Design) : both factors are MAIN ; 3 error terms
CORRELATION REGRESSION :
CORRELATION :
Two way ; variables are mutually dependent (each affects the other) ; Value : -1 to +1 ; ex – Demand & Price
Type : +ve (move in the same direction), -ve (move inversely), zero (no effect)
Measurement : scatter diagram (most used), Karl Pearson's coefficient, Spearman's rank correlation
REGRESSION :
Average relationship b/w variables in terms of the original units of the data (regression = stepping back to the average)
By Francis Galton ; One way ; Range : - ∞ to + ∞ ; variables are dependent and independent
Independent of origin but dependent on scale ; AM of the two regression coefficients >= correlation coefficient
y = ax + b (a = regression coefficient or slope , b = intercept)
CORRELATION COEFFICIENT (PEARSON R)
Range : −1≤r≤1 , Unitless , r=1: perfect positive linear relationship , r=0: no linear correlation
T test for r
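The regression line, Pearson's r and the t statistic for r can all be derived from the same sums of squares and products. A sketch on made-up paired data; it uses the standard formula t = r√(n-2) / √(1-r²) with n-2 degrees of freedom, and the helper name is illustrative:

```python
from math import sqrt


def correlation_regression(x, y):
    """Return r, slope and intercept of y on x, slope of x on y, and t for r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
    b_yx = sxy / sxx                     # slope a in y = ax + b
    b_xy = sxy / syy                     # regression coefficient of x on y
    intercept = mean_y - b_yx * mean_x   # intercept b in y = ax + b
    t_value = r * sqrt(n - 2) / sqrt(1 - r * r)  # t test for r, df = n - 2
    return r, b_yx, intercept, b_xy, t_value


# Made-up paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, b_yx, intercept, b_xy, t_value = correlation_regression(x, y)

print(r, b_yx, intercept, b_xy, t_value)
```

In this example r² equals the product of the two regression coefficients, and their arithmetic mean (0.8) is not less than r.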
SAMPLING
PROBABILITY METHOD
Simple Random Sampling (SRS) : everyone has an equal chance, like a lottery draw
Systematic Sampling : select every kth item (e.g., every 10th student)
Stratified Sampling : divide population into groups (strata), then randomly sample from each group
Cluster Sampling : divide into clusters (e.g., villages), randomly select whole clusters, not individuals
Multistage Sampling : combine methods, e.g., pick districts (cluster), then schools (SRS) within them
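The probability methods above can be sketched with Python's `random` module. This is an illustration on a toy population of unit numbers 1 to 100; the strata, clusters and function names are all hypothetical:

```python
import random


def simple_random_sample(population, size, rng):
    """SRS without replacement: every unit has an equal chance."""
    return rng.sample(population, size)


def systematic_sample(population, k, rng):
    """Random start within the first k units, then every kth unit."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, size_per_stratum, rng):
    """Random sample of fixed size from each stratum."""
    return {name: rng.sample(units, size_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(42)            # fixed seed for repeatability
population = list(range(1, 101))   # toy population

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"A": population[:50], "B": population[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"village_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
selected_clusters = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, list(selected_clusters))
```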
NON-PROBABILITY METHOD
Convenience Sampling : choose whoever is easy to reach (e.g., asking friends)
Judgmental (Purposive) Sampling : you choose samples based on what you think is best
Quota Sampling : set quota per group (e.g., 50 men, 50 women), but choose non-randomly
Snowball Sampling : for hard-to-find groups (e.g., drug users), ask each participant to refer others
Census : all units ; Sample survey : selected units
Finite population : SWR (sampling with replacement) ; Infinite population : SWOR (sampling without replacement)