Agricultural statistics - Statistical science JRF note by Subham Mandal (part 1).pdf

SubhamMandal40, 9 slides, Oct 06, 2025


Agricultural statistics - Statistical science JRF / ICAR AIEEA note by Subham Mandal




STATISTICS
A specialized branch of mathematics - RA Fisher
Statistics : both singular (the science) and plural (numerical data)
It deals with QUANTITATIVE data ; a population may be FINITE or INFINITE

DIAGRAM
 Simple bar : single character ; multiple bar : several characters (both ONE-dimensional)
 Component bar : bar height depends on the TOTAL
 Percentage bar : bar height is the SAME for all bars (100%)
 Pie chart : each component of a factor = SECTOR ; alternative : STEP BAR diagram
 Bar chart : vertical base, horizontal bars ; column chart : horizontal base, vertical bars

GRAPH : Graphical representations for grouped quantitative data
HISTOGRAM:
 classified based on the class intervals
 suitable for calculating MODE
 Class intervals should be EQUAL ; if not, bar height is proportional to frequency DENSITY
 No gap between bars because the classes are CONTINUOUS
 Bar height = frequency of the respective class (for equal class intervals)
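
The frequency density rule for unequal class intervals can be sketched as follows. The class boundaries and frequencies are made-up illustrative values, and `frequency_density` is a hypothetical helper name, not something from the original slides.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 40, 12)]

def frequency_density(lower, upper, frequency):
    """Frequency per unit of class width (the bar height for unequal classes)."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi}: frequency={f}, density={d}")
```

In this sketch the class 20-40 has the largest frequency, but the class 10-20 has the tallest bar once width is taken into account, which is why density and not raw frequency is plotted.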
FREQUENCY POLYGON : dots against the mid-points connected by STRAIGHT line
FREQUENCY CURVE : dots against the mid-points connected by SMOOTH/FREE HAND line
OGIVE / cumulative frequency curve (value v/s cumulative frequency)
 Less than ogive : plotted against upper boundary of class interval
 More than ogive : plotted against lower boundary of class interval
 2 type ogive intersect at MEDIAN
 PARTITION values (median, quartiles, deciles, percentiles) can be found GRAPHICALLY from the ogive

PICTOGRAM : non-dimensional, less accurate, meant for lay readers (DILETTANTE), data shown as COUNTS of PICTURES
BOX PLOT : compares multiple groups of continuous data, handles SKEWED data well, identifies OUTLIERS
FREQUENCY DISTRIBUTION
 Frequency of a variable is always INTEGER
 Frequency distribution can be both CONTINUOUS and DISCRETE
 Individual series : DISCRETE series, each variate has frequency 1
 Open-end distribution : UNCERTAIN first and/or last class boundary
 Simple frequency distribution : all distinct values with their frequencies
 Grouped frequency distribution : all values placed in CLASSES with their FREQUENCIES
 Continuous variable : any number ; discrete variable : only INTEGER ; VARIATE : single observation

TABLE :
 simple table : one factor/variable , Complex : 2 or more
 First column (row headings) : STUB ; first row (column headings) : CAPTION

CENTRAL TENDENCY

ARITHMETIC MEAN :

Most common, BEST, rigidly defined, based on all observations
Not based on position, works even when data are lacking, least affected by sampling fluctuations
Can't be calculated for qualitative data or open-end classes ; MOST affected by extreme values

MEDIAN :

Middle-most value ; suits QUALITATIVE data (example : intelligence, ability)
Not affected by extreme values ; positional average ; works for open-end series and when data are lacking
With an even number of items or a continuous series, the result may not be a value in the series
A slight change in the series may change it drastically ; used in MEAN DEVIATION ; does not use all observations

MODE :
Most frequent value (CONCENTRATION), suits qualitative data (but less than the median), positional measure
Not affected by extreme values ; with a large number of values = the observation with maximum frequency
Example : shoe/garment size , meteorological forecasting

HARMONIC MEAN :

Reciprocal of the A.M. of the reciprocals of the values (example : average speed, distance, rate)
Rigidly defined, based on all observations, amenable to further algebraic treatment
Most suitable for HIGHLY VARIABLE series and when greater weight is given to smaller observations
Average speed : for the same distance = 2AB/(A+B) (the HM) ; for the same time = (A+B)/2 (the AM)
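
A small sketch of the two average-speed formulas, with made-up speeds A = 40 and B = 60. The function names are hypothetical; `harmonic_mean` is the standard-library function from Python's `statistics` module.

```python
from statistics import harmonic_mean

def average_speed_same_distance(a, b):
    """Equal distances covered at speeds a and b: harmonic mean, 2AB/(A+B)."""
    return 2 * a * b / (a + b)

def average_speed_same_time(a, b):
    """Equal times spent at speeds a and b: arithmetic mean, (A+B)/2."""
    return (a + b) / 2

A, B = 40.0, 60.0  # illustrative speeds, e.g. km/h
same_distance = average_speed_same_distance(A, B)
same_time = average_speed_same_time(A, B)

print(same_distance, same_time)
print(harmonic_mean([A, B]))  # should agree with the same-distance result
```

The same-distance average (48) is lower than the same-time average (50) because more time is spent travelling at the slower speed.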

GEOMETRIC MEAN :

best when data is RATIO or PERCENTAGE ; Example : Bacterial growth , cell division
MISCELLANEOUS:
Adding, subtracting, multiplying or dividing every value of the series by a constant changes the mean in the same way
Quadratic mean : usable with negative values ; QM >= AM

Most UNSTABLE is Geometric Mean
Normally (positive values) : AM >= GM >= HM ; when ALL OBSERVATIONS are EQUAL : AM = GM = HM
Median = middle value = 50th percentile = 2nd quartile = 5th decile
Symmetrical distribution : Mean = Median = Mode
Moderately skewed distribution (empirical relation) : Mean - Mode = 3 (Mean - Median)
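
The ordering of the three means can be checked numerically. This is only an illustrative sketch on made-up positive data, using the standard-library `statistics` functions (`geometric_mean` needs Python 3.8 or later).

```python
from statistics import mean, geometric_mean, harmonic_mean

data = [2, 4, 8]          # illustrative positive observations
am = mean(data)            # arithmetic mean
gm = geometric_mean(data)  # geometric mean
hm = harmonic_mean(data)   # harmonic mean
print(am, gm, hm)

equal = [5, 5, 5]          # all observations equal
print(mean(equal), geometric_mean(equal), harmonic_mean(equal))
```

For unequal positive values the three results come out in the order AM > GM > HM; for the equal series all three coincide (up to floating-point rounding).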


DISPERSION

Dispersion : scatteredness or variation of observations from their average

RANGE :
Used in quality control, weather forecasts, share price analysis

STANDARD DEVIATION :
Positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean

Basis for measuring the COEFFICIENT OF CORRELATION and sampling,
Has the characteristics of the MEAN, further algebraic treatment is possible,
Has the same UNIT as the original data, can't be used to COMPARE series measured in different units

VARIANCE :
Variance = (SD)^2 , if all values are the same then variance is 0
Average of the squared deviations from the mean ; unit is the SQUARE of the original unit

COEFFICIENT OF VARIATION :
C.V = (SD/Mean)x100 , a RELATIVE measure of dispersion
More C.V. = more variable, less stable, less homogeneous.
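
The definitions of SD, variance and C.V. above can be worked through on a small made-up series. This sketch uses the population forms (`pstdev`, `pvariance`), which divide by N as in the definition given here; the yield figures are hypothetical.

```python
from statistics import mean, pstdev, pvariance

yields = [20, 22, 24, 26, 28]   # hypothetical plot yields

avg = mean(yields)              # arithmetic mean
variance = pvariance(yields)    # mean of squared deviations from the mean
sd = pstdev(yields)             # positive square root of the variance
cv = sd / avg * 100             # coefficient of variation, in percent

print(f"mean={avg}, variance={variance}, SD={sd:.3f}, CV={cv:.2f}%")
```

Here the deviations are -4, -2, 0, 2, 4, so the variance is 40/5 = 8 and the SD is the square root of 8.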

MEAN DEVIATION :

MD is minimum when taken from the MEDIAN ; uses all observations
Sum of squares of deviations is minimum when taken from the MEAN
Ignores the sign of the deviations from the central tendency (absolute deviations)

QUARTILE DEVIATION :
(Q3-Q1)/2 ; positional ; coefficient = (Q3-Q1)/(Q3+Q1) ; the ONLY measure that can be calculated for OPEN-END distributions

SKEWNESS :
Lack of symmetry of the tails of the FD (frequency distribution) curve
NEGATIVE : u3 < 0, LEFT tail more elongated, Mean < Median < Mode, mean lies to the LEFT
POSITIVE : u3 > 0, RIGHT tail more elongated, Mean > Median > Mode, mean lies to the RIGHT
Karl Pearson's skewness = (Mean - Mode) / SD
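
A minimal sketch of the (Mean - Mode) / SD formula on a made-up series with one large value in the right tail. The population SD (`pstdev`) is an assumption here; the slides do not say which divisor is intended.

```python
from statistics import mean, mode, pstdev

data = [1, 2, 2, 3, 7]   # illustrative series with a long right tail

skewness = (mean(data) - mode(data)) / pstdev(data)
print(f"mean={mean(data)}, mode={mode(data)}, skewness={skewness:.3f}")
```

The mean (3) exceeds the mode (2), so the coefficient is positive, matching the right-tailed case.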

MISCELLANEOUS
4SD=5MD=6QD=2/3R
How to calculate SD
BEST/most reliable : SD , Worst : QD , Unitless : CV
Affected by EXTREME values : most - Range, SD ; least - QD, MD
All are absolute measures but CV is RELATIVE
All absolute measures change with scale but not with origin ; CV is unaltered by a change of scale
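
The effect of a change of origin (adding a constant) and a change of scale (multiplying by a constant) can be checked on a made-up series. `cv` is a hypothetical helper defined only for this sketch.

```python
from statistics import mean, pstdev

def cv(values):
    """Coefficient of variation in percent."""
    return pstdev(values) / mean(values) * 100

data = [20, 22, 24, 26, 28]       # illustrative series
shifted = [x + 10 for x in data]  # change of origin
scaled = [x * 3 for x in data]    # change of scale

print("SD :", pstdev(data), pstdev(shifted), pstdev(scaled))
print("CV :", cv(data), cv(shifted), cv(scaled))
```

In this sketch the SD is unchanged by the shift and tripled by the scaling, while the CV is unchanged by the scaling. Note that the shift does lower the CV, because the mean increases while the SD stays the same.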

PROBABILITY


De Morgan's law : A' U B' = (A n B)' ; Binomial (BD) and Poisson (PD) = discrete (PMF) ; Normal (ND) = continuous (PDF)

BINOMIAL DISTRIBUTION :
Success or failure ; p+q=1 and P(X=x) = nCx . p^x . q^(n-x)
AM (U1) = np ; variance (U2) = npq ; third central moment (U3) = npq(q-p) ; fourth central moment (U4) = npq(1+3pq(n-2))
p < ½ = +ve skewed ; p > ½ = -ve skewed ; p = ½ = SYMMETRIC
Mean > Variance ; n = 1 gives Bernoulli ; n tending to infinity (with p tending to 0 and np fixed) tends to Poisson
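
The binomial formulas can be verified numerically by summing over the probability mass function. This is a sketch with made-up parameters n = 10 and p = 0.3; `binomial_pmf` is a hypothetical helper built on the standard-library `math.comb`.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3   # illustrative parameters
q = 1 - p
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

mu = sum(x * pr for x, pr in enumerate(probs))               # expected: np
var = sum((x - mu) ** 2 * pr for x, pr in enumerate(probs))  # expected: npq
mu3 = sum((x - mu) ** 3 * pr for x, pr in enumerate(probs))  # expected: npq(q-p)

print(mu, var, mu3)
print(n * p, n * p * q, n * p * q * (q - p))
```

With p < ½ the third central moment is positive, which agrees with the positive-skew rule, and the mean (3) exceeds the variance (2.1).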

POISSON DISTRIBUTION

Lambda (λ) = parameter of PD = Mean = Variance = third central moment (U3) ; always > 0
Fourth central moment (U4) = 3λ^2 + λ ; ex : deaths, defects, miscalls (rare events)
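
The equality of the Poisson mean and variance can be checked numerically. This sketch assumes λ = 2 and truncates the infinite sum at 60 terms, which is far enough that the neglected tail is negligible; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0                                   # illustrative parameter
probs = [poisson_pmf(x, lam) for x in range(60)]

mu = sum(x * pr for x, pr in enumerate(probs))
var = sum((x - mu) ** 2 * pr for x, pr in enumerate(probs))
mu3 = sum((x - mu) ** 3 * pr for x, pr in enumerate(probs))
mu4 = sum((x - mu) ** 4 * pr for x, pr in enumerate(probs))

print(mu, var, mu3, mu4)   # compare with lam, lam, lam, 3*lam**2 + lam
```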

NORMAL DISTRIBUTION :

De Moivre ; BELL shaped ; total area under the curve = 1 ; symmetric about the mean ;
Mean=Median=Mode ; U3=0 ; kurtosis (β2) = 3 ; Range : - ∞ to + ∞ ;
RANGE = 6σ ; MD = 4/5 σ ; QD = 2/3 σ

NORMAL CURVE
68% of data lies within ±1σ of the mean.
95% of data lies within ±2σ of the mean.
99.7% of data lies within ±3σ of the mean.

inflection point : changes its curvature : x = μ ± σ
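
The 68-95-99.7 rule and the approximate MD and QD ratios can be reproduced with the standard-library `statistics.NormalDist` (Python 3.8 or later). This is a sketch for the standard normal (μ = 0, σ = 1); `coverage` is a hypothetical helper name.

```python
from math import pi, sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def coverage(k):
    """Probability that a normal value lies within k SD of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage(k):.4f}")

md_ratio = sqrt(2 / pi)      # mean deviation / σ, close to 4/5
qd_ratio = z.inv_cdf(0.75)   # quartile deviation / σ, close to 2/3
print(f"MD/σ = {md_ratio:.4f}, QD/σ = {qd_ratio:.4f}")
```

The exact ratios are about 0.798 and 0.674, so 4/5 and 2/3 are convenient approximations.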



TEST OF HYPOTHESIS
Null Hypothesis – H0 – No difference – RA Fisher
Alternate – H1 ; H 1 : µ1 < µ2 = left tailed ; H 1 : µ1 > µ2 = right tailed
Type I error : alpha (α) : rejecting H0 when it is true
Type II error : beta (β) : accepting H0 when it is false
DF : total number - constraints = N-K
LOS (Level of significance) : maximum probability of Type I error (5% or 1%)
Critical value : decides whether to accept/reject the Null Hypothesis

One tailed test – critical region falls on one end (H1 : µ1 > µ2 or µ1 < µ2)
Two tailed test – critical region falls on either end (H1 : µ1 not equal to µ2)
Large sample n≥30 : Z test ; Small sample ,n<30 : t , F, Chi Square
Critical Region : Depends on Type I error size


TEST OF SIGNIFICANCE


T TEST
Sample < 30 ; Gosset ("Student") ; Paired and Unpaired
Helps to observe significance of Correlation coefficient, regression coefficient

CHI SQUARE TEST
Sample > 50 ; Non parametric ; Helmert & Pearson ; (ex-genetic problem)

ANOVA / F TEST

df = t – 1 ; Treatment = BETWEEN; Error = WITHIN



If F ≈ 1: Variance between groups ≈ variance within groups ⇒ no difference b/w treatments.
If F >> 1: b/w groups > w/w groups ⇒ at least one treatment mean is significantly different.
Larger F-values typically suggest stronger evidence against the null hypothesis.
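
The F ratio described above (between-treatment variance over within-treatment variance) can be computed by hand for a one-way layout. This sketch uses made-up data for three treatments with three replications each; it does not look up the tabulated F value.

```python
from statistics import mean

# Hypothetical yields for t = 3 treatments, r = 3 replications each
groups = [[20, 21, 19], [25, 26, 24], [30, 31, 29]]

t = len(groups)
all_values = [x for g in groups for x in g]
n_total = len(all_values)
grand_mean = mean(all_values)

# Between-treatment (treatment) and within-treatment (error) sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = t - 1          # treatment df = t - 1
df_within = n_total - t     # error df = t(r - 1) for equal replication

f_value = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {f_value} on ({df_between}, {df_within}) df")
```

Here the treatment means (20, 25, 30) differ far more than the values within any one treatment, so F is much larger than 1.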

Z TEST :
Asymptotic ; n > 30 ; RA Fisher ; (ex-tea drinker)
|Z cal| < Z tab : we accept (do not reject) H0
Two tailed 5% 1.96 , 1% 2.58 ; One tailed 5% 1.65, 1% 2.33
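
The decision rule and the tabulated values above can be reproduced in a short sketch. The sample figures (mean 52, hypothesised mean 50, known SD 10, n = 100) are made up, `z_statistic` is a hypothetical helper, and `NormalDist` is from the standard-library `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

std_normal = NormalDist()
z_cal = z_statistic(sample_mean=52, mu0=50, sigma=10, n=100)

z_tab_two_5 = std_normal.inv_cdf(1 - 0.05 / 2)   # about 1.96
z_tab_two_1 = std_normal.inv_cdf(1 - 0.01 / 2)   # about 2.58
z_tab_one_5 = std_normal.inv_cdf(1 - 0.05)       # about 1.645
z_tab_one_1 = std_normal.inv_cdf(1 - 0.01)       # about 2.33

reject_h0 = abs(z_cal) > z_tab_two_5             # two-tailed test at 5%
p_value = 2 * (1 - std_normal.cdf(abs(z_cal)))   # two-tailed p value

print(z_cal, round(z_tab_two_5, 2), reject_h0, round(p_value, 4))
```

In this example Z cal = 2.0 exceeds 1.96, so H0 is rejected at the 5% level, but it would not be rejected at the 1% level (2.0 < 2.58).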

Z SCORE & FISHER Z :


P VALUE : P value < 0.05 or <5% = reject Null Hypothesis

Z-test when population SD is known; otherwise t-test.
Chi- for categorical data, ANOVA for comparing more than 2 means

ERROR

STANDARD ERROR
SE = SD / √N
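
A short sketch of the standard error formula with made-up numbers, showing how SE shrinks as the sample size grows. `standard_error` is a hypothetical helper name.

```python
from math import sqrt

def standard_error(sd, n):
    """SE of the mean = SD / sqrt(N)."""
    return sd / sqrt(n)

print(standard_error(10, 25))    # N = 25
print(standard_error(10, 100))   # N = 100: four times the sample, half the SE
```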



SAMPLING ERROR
Sampling error = Estimate - Parameter = Sample statistic - population parameter
Sampling Error : Due to random sampling variability
Non-Sampling Error : Due to bias, measurement, data entry, etc.

EXPERIMENTAL DESIGN
for TOS (Test of significance) – RA Fisher

CRD (COMPLETELY RANDOMIZED DESIGN)
One way classification, no local control (no elimination of variation)
When material is LIMITED and HOMOGENEOUS (ex-soil and pot experiment)
1.Replication (Independent)
2.Randomization (used)
3.Local control (not used, since CRD works on HOMOGENEOUS material only)
EDF (Error Degrees of Freedom) : t(r-1), MAXIMUM among all designs;
FG (Fertility gradient) : zero (as it is homogeneous)



RBD (RANDOMIZED BLOCK DESIGN)
Two way classification, One way control
Uses all 3 principles
FG = 1 (one direction) ; EDF = (r-1)(t-1)

Max treatments : < 21 (optimum 5-12)
More accurate than CRD , MOSTLY Used

LSD (LATIN SQUARE DESIGN) :
For 5-12 treatments, SQUARE shape ; Rows = Columns = Treatments = Replications
It is an INCOMPLETE three-way layout (t cube combinations are possible but only t square are used)
FG = 2 ; EDF = (t-1)(t-2), equivalently (r-1)(r-2) or (c-1)(c-2) since t = r = c
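
The error degrees of freedom quoted for CRD, RBD and LSD can be compared side by side. This sketch uses hypothetical helper functions and an illustrative case of t = 5 treatments with r = 5 replications, so that all three designs have the same total of 25 plots.

```python
def error_df_crd(t, r):
    """CRD: t(r - 1)."""
    return t * (r - 1)

def error_df_rbd(t, r):
    """RBD: (r - 1)(t - 1)."""
    return (r - 1) * (t - 1)

def error_df_lsd(t):
    """LSD: (t - 1)(t - 2), with rows = columns = treatments = t."""
    return (t - 1) * (t - 2)

t, r = 5, 5   # illustrative: 25 plots in every design
print("CRD:", error_df_crd(t, r))
print("RBD:", error_df_rbd(t, r))
print("LSD:", error_df_lsd(t))
```

For the same number of plots CRD keeps the most error degrees of freedom, because each extra direction of local control (blocks in RBD, rows and columns in LSD) uses up degrees of freedom.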

SPD (SPLIT PLOT DESIGN) :
2 factors : Main plot (larger plots : manure, DOS, ploughing) ; Sub plot (smaller plots : fertilizer, variety) ; 2 error terms
SrPD (Strip Plot Design) : both are MAIN ; 3 error terms


CORRELATION REGRESSION :
CORRELATION :
2 way ; variables are interdependent (one affects another) ; Value : -1 to +1 ; ex : Demand & Price
Type : +ve (move in the same direction) , -ve (inversely) , zero (no relationship)
Measurement : scatter diagram (most used) , Karl Pearson , Spearman rank

REGRESSION :
Average relationship b/w variables in terms of the original units of the data (stepping back to the average)
By Francis Galton ; One way ; Range : - ∞ to + ∞ ; Variables : dependent and independent
Independent of Origin but dependent on Scale ; AM of the regression coefficients >= correlation coefficient
y = ax + b (a = regression coefficient or slope , b = intercept)

CORRELATION COEFFICIENT (PEARSON R)

Range : −1≤r≤1 , Unitless , r=1: perfect positive linear relationship , r=0: no linear correlation
T test for r
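
Pearson's r, the regression line y = ax + b and the t statistic for r can be computed from first principles. This is a sketch on made-up paired data, using the usual textbook formula t = r√(n-2) / √(1-r²) with n-2 degrees of freedom (this formula is not spelled out in the slides).

```python
from math import sqrt

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)          # Pearson correlation coefficient
slope = s_xy / s_xx                   # a : regression coefficient of y on x
intercept = mean_y - slope * mean_x   # b : intercept
b_xy = s_xy / s_yy                    # regression coefficient of x on y

t_cal = r * sqrt(n - 2) / sqrt(1 - r ** 2)   # t statistic for r, n - 2 df

print(f"r = {r:.4f}, y = {slope:.2f}x + {intercept:.2f}, t = {t_cal:.3f}")
print("AM of regression coefficients:", (slope + b_xy) / 2)
```

In this example r is the geometric mean of the two regression coefficients (0.6 and 1.0), and their arithmetic mean (0.8) is larger than r.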

SAMPLING

PROBABILITY METHOD

Simple Random Sampling (SRS) : everyone has an equal chance, like a lottery draw
Systematic Sampling : select every kth item (e.g., every 10th student)
Stratified Sampling : divide the population into groups (strata), then randomly sample from each group
Cluster Sampling : divide into clusters (e.g., villages), randomly select whole clusters, not individuals
Multistage Sampling : combine methods, e.g., pick districts (cluster), then schools (SRS) within them
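
Three of the probability methods listed above can be sketched with the standard-library `random` module. The population of 100 numbered units, the two strata and the sample sizes are all made up for illustration; a fixed seed is used only to make the run repeatable.

```python
import random

rng = random.Random(42)            # seeded generator for a repeatable run
population = list(range(1, 101))   # hypothetical units numbered 1..100
sample_size = 10

# Simple random sampling: every unit has an equal chance
srs = rng.sample(population, sample_size)

# Systematic sampling: random start, then every kth unit
k = len(population) // sample_size
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: random sample drawn separately from each stratum
strata = {"stratum_A": population[:60], "stratum_B": population[60:]}
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

print(srs)
print(systematic)
print(stratified)
```

`random.sample` draws without replacement, so no unit appears twice in a sample.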


NON-PROBABILITY METHOD

Convenience Sampling : choose whoever is easy to reach (e.g., asking friends)
Judgmental (Purposive) Sampling : you choose samples based on what you think is best
Quota Sampling : set a quota per group (e.g., 50 men, 50 women), but choose non-randomly
Snowball Sampling : for hard-to-find groups (e.g., drug users), ask each participant to refer others

Census : all units ; Sample survey : selected units
Finite population : SWR (Sampling with replacement) ; Infinite population : SWOR (Sampling without replacement)