BBA 2ND SEM STATISTIC.pdf

11,236 views 71 slides Jul 11, 2022


Prof . T RAMA KRISHNA RAO (8839271225 )



BBA 2nd SEM STATISTICS

PT R.S.S.U
BBA II
Statistics


Unit-I
Meaning and definition of Statistics; Scope and Limitations of Statistics; Processing and
Presentation of Data.
Unit-II
Measures of Central Tendency: Mean, Geometric Mean, Median, Mode.
Unit-III
Measures of Variation: Standard Deviation and Skewness.
Unit-IV
Correlation Analysis – Karl Pearson’s Coefficient of Correlation.
Unit-V
Index Number, Time Series Analysis


Unit 1
Statistics




PT R.S.S.UNIVERSITY PREVIOUS YEAR QUESTION PAPERS

2016

Q.1 Define statistics. Explain the ways in which statistical data can be presented, with the help of suitable examples.
Q.2 Differentiate between classification and tabulation. Mention the requisites of a good statistical table.

2015

Q.1 Explain the meaning and scope of statistics, bringing out its importance in the field of business.
Q.2 What do you mean by data? What are the objectives? Explain the different kinds of classification of data.
Q.3 Draw a histogram to represent the following frequency distribution.
Marks            0-10  10-20  20-40  40-50  50-60  60-70  70-90  90-100
No. of students     4      6     14     16     14     10     16       5
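The class intervals in this question are unequal (20-40 and 70-90 are 20 marks wide, the others 10), so the bar heights of the histogram are normally based on frequency density (frequency divided by class width) rather than on raw frequency. A minimal sketch of that adjustment in Python follows; the variable names are illustrative and not part of the original notes.

```python
# Class intervals and frequencies taken from the 2015 question.
classes = [(0, 10), (10, 20), (20, 40), (40, 50),
           (50, 60), (60, 70), (70, 90), (90, 100)]
frequencies = [4, 6, 14, 16, 14, 10, 16, 5]

# Frequency density = frequency / class width.
# With unequal widths this, not the raw frequency, is the bar height.
densities = [f / (upper - lower)
             for (lower, upper), f in zip(classes, frequencies)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    print(f"{lower}-{upper}: frequency {f}, width {upper - lower}, density {d:.2f}")
```

Drawing each bar with its class width as the base and its density as the height keeps the area of every bar proportional to its frequency.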

2014

Q.1 Define statistics. What are its main functions? Discuss briefly the limitations of statistics.
Q.2 What is tabulation? What are its uses? Mention the items that a good statistical table contains.
Q.3 Draw a frequency polygon for the following distribution.
Class interval  15-25  25-35  35-45  45-55  55-65  65-75
Frequency          10     16     18     15     13      4
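A frequency polygon is usually drawn by plotting each class frequency against the mid-point of its class and joining the points with straight lines. The sketch below only computes the points to plot; closing the polygon with one empty class at each end is a common convention assumed here, not something stated in the question.

```python
# Class intervals and frequencies taken from the 2014 question.
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [10, 16, 18, 15, 13, 4]

# Mid-point of each class = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Assumed convention: add a zero-frequency point one class width
# before the first mid-point and after the last, so the polygon
# touches the horizontal axis at both ends.
width = classes[0][1] - classes[0][0]
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(f"mid-point {x}: frequency {y}")
```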

2013

Q.1 Explain the meaning and scope of statistics, bringing out its importance in the field of business.
Q.2 What is meant by classification? What precautions are to be taken in selecting class intervals?
Q.3 Represent the following data by a pie chart.
Food 87
Clothing 24
Recreation 11
Education 13
Rent 25
Miscellaneous 20
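For a pie chart, each item is given a sector whose angle is its share of the total multiplied by 360 degrees. A minimal sketch of that arithmetic for the data above (the dictionary name is illustrative):

```python
# Expenditure items taken from the 2013 question.
expenditure = {
    "Food": 87,
    "Clothing": 24,
    "Recreation": 11,
    "Education": 13,
    "Rent": 25,
    "Miscellaneous": 20,
}

total = sum(expenditure.values())

# Sector angle = (item value / total) * 360 degrees.
angles = {item: value * 360 / total for item, value in expenditure.items()}

for item, angle in angles.items():
    print(f"{item}: {angle:.0f} degrees")
```

The sector angles should always add up to 360 degrees, which is a quick check on the working.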









Meaning and definition of Statistics; Scope and Limitations of Statistics; Processing
and Presentation of Data

STATISTICS
Meaning:
The word “Statistics” is derived from the Latin word ‘status’, which means a political state. In common usage it refers to a group of numbers or figures that represent some information of human interest.
Originally the term referred to collecting information about states: their people, their number, the revenue of the state, etc.
Definition:
The term ‘Statistics’ has been defined in two senses, i.e. in the singular and in the plural sense.
In the plural sense, it means a systematic collection of numerical facts; in the singular sense, it is the science of collecting, classifying and using statistics.
A. In the Plural Sense:
“Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.” - A.L. Bowley
“The classified facts respecting the condition of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.” - Webster
The definitions given above give a narrow meaning to statistics, as they do not indicate the various aspects witnessed in its practical applications. From this point of view, the definition given by Prof. Horace Secrist appears to be the most comprehensive and meaningful:
“By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose, and placed in relation to each other.” - Horace Secrist
B. In the Singular Sense:
“Statistics refers to the body of technique or methodology, which has been developed for the collection, presentation and analysis of quantitative data and for the use of such data in decision making.” - Neter and Wasserman
“Statistics may rightly be called the science of averages.” - Bowley
“Statistics may be defined as the collection, presentation, analysis, and interpretation of numerical data.” - Croxton and Cowden
Some Modern Definitions:
“Statistics is a body of methods for making wise decisions in the face of uncertainty.” - Wallis and Roberts
“Statistics is a body of methods for obtaining and analyzing numerical data in order to make better decisions in an uncertain world.” - Edward N. Dubois


Stages of Investigations:
1. Collection of Data: This is the first stage of investigation. The method of collection suited to the problem is decided, and then the data are collected.

2. Organisation of Data: This is the second stage. The data are simplified, made comparable and classified according to time and place.
3. Presentation of Data: In this third stage, the organised data are made simple and attractive. They are presented in the form of tables, diagrams and graphs.
4. Analysis of Data: The fourth stage of investigation is analysis, which is necessary to get correct results. It is often undertaken using measures of central tendency, measures of dispersion, correlation, regression, interpolation, etc.
5. Interpretation of Data: In this last stage, conclusions are drawn. Comparisons are made and, on this basis, forecasts are prepared.
Nature of Statistics
1. Statistics is a Science :- Science, by definition, is a systematic body of knowledge which studies cause-and-effect relationships and endeavours to find generalisations. Taking the various statistical methods into consideration, we can define statistics as a science in which we study:
• numerous methods of collecting, editing, classifying, tabulating and presenting facts using graphs and diagrams;
• several ways of condensing data regarding various social, political and economic problems.
This is done to establish relationships between various facts. It also helps in analysing and interpreting problems and in forecasting them.
2. Statistics is an Art :- If Science is knowledge, Art is action, or the actual application of science. While Science teaches us to know, Art teaches us to do. Statistics as an art is the application of the scientific methods of the subject. As an art, statistics offers a better understanding of, and solutions to, real-life problems because it offers quantitative information. While there are several statistical methods, their successful application depends on the statistician’s degree of skill and experience.
According to Tippett, “Statistics is both a science and an art. It is a science in that its methods are basically systematic and have general application and art in that their successful application depends, to a considerable degree, on the skill and special experience of the statistician, and on his knowledge of the field of application.”
Characteristics
1. Statistics are Aggregates of Facts: Only those facts which are capable of being studied in relation to time, place or frequency can be called statistics. Individual, single or unconnected figures are not statistics because they cannot be studied in relation to each other. For this reason, only aggregates of facts, e.g. data relating to the I.Q. of a group of students or the academic achievement of students, are called statistics and are studied in relation to each other.
2. Statistics are Affected to a Marked Extent by Multiplicity of Causes: Statistical data relate mostly to the social sciences, where changes result from the combined effect of many factors. We cannot easily study the effect of one particular cause on a phenomenon. It is only in the physical sciences that individual causes can be traced and their impact clearly known. In the statistical study of social sciences, we come to know the combined effect of multiple causes.
3. Statistics are Numerically Expressed: Qualitative phenomena which cannot be numerically expressed, e.g. honesty, goodness, ability, cannot be described as statistics. But if we assign them a numerical expression, they may be described as ‘statistics’.
4. Statistics are Enumerated or Estimated according to Reasonable Standards of Accuracy: The standard of estimation and of accuracy differs from enquiry to enquiry and from purpose to purpose. There cannot be one uniform standard for all types of enquiries and for all purposes. A single student cannot be ignored while calculating the I.Q. of 100 students in a group, whereas 10 soldiers can easily be ignored while finding the I.Q. of the soldiers of a whole country.
5. Statistics are Collected in a Systematic Manner: In order to have a reasonable standard of accuracy, statistics must be collected in a very systematic manner. Any rough and haphazard method of collection is undesirable, for it may lead to improper and wrong conclusions. The accuracy will also be uncertain, and so the results cannot be relied upon.
6. Statistics are Collected for a Pre-determined Purpose: The investigator must have a purpose beforehand and only then start the work of collection. Data collected without any purpose are of no use. Suppose we want to know the intelligence of a section of people; we must not collect data relating to income, attitude and interest. Without a clear idea of the purpose we will not be in a position to distinguish between necessary and unnecessary data, or relevant and irrelevant data.
7. Statistics are Capable of being Placed in Relation to each other: Statistics are collected mostly for the purpose of comparison. They must be capable of being compared, otherwise they lose much of their value and significance. Comparison can be made only if the data are homogeneous.

Importance and Scope of Statistics:
(i) Statistics in Planning: Statistics is indispensable in planning, whether in business, economics or government. The modern age is termed the ‘age of planning’, and almost all organisations in government, business or management resort to planning for efficient working and for formulating policy decisions. To achieve this end, statistical data relating to production, consumption, birth, death, investment and income are of paramount importance. Today efficient planning is a must for almost all countries, particularly the developing economies, for their economic development.
(ii) Statistics in Mathematics: Statistics is intimately related to and essentially dependent upon mathematics. The modern theory of Statistics has its foundations in the theory of probability, which in turn is a particular branch of the more advanced mathematical theory of Measure and Integration. The ever-increasing role of mathematics in statistics has led to the development of a new branch of statistics called Mathematical Statistics. Thus Statistics may be considered an important member of the mathematics family. In the words of Connor, “Statistics is a branch of applied mathematics which specialises in data.”
(iii) Statistics in Economics: Statistics and Economics are so intermixed that it would be foolish to separate them. The development of modern statistical methods has led to an extensive use of statistics in Economics. All the important branches of Economics (consumption, production, exchange, distribution, public finance) use statistics for the purposes of comparison, presentation, interpretation, etc. Problems such as the spending of income on and by different sections of the people, the production of national wealth, the adjustment of demand and supply, and the effect of economic policies on the economy indicate the importance of statistics in economics and its different branches. Statistics of public finance enable us to decide on taxes, subsidies, spending on various heads, the amount of money to be borrowed or lent, etc. So we cannot think of Statistics without Economics or Economics without Statistics.
(iv) Statistics in Social Sciences: Every social phenomenon is affected to a marked extent by a multiplicity of factors which bring about variation in observations from time to time, place to place and object to object. The statistical tools of Regression and Correlation Analysis can be used to study and isolate the effect of each of these factors on the given observation. Sampling Techniques and Estimation Theory are very powerful and indispensable tools for conducting any social survey, pertaining to any stratum of society, and then analysing the results and drawing valid inferences. The most important application of statistics in sociology is in the field of Demography, for studying mortality (death rates), fertility (birth rates), marriages, population growth and so on. In this context Croxton and Cowden have rightly remarked: “Without an adequate understanding of the statistical methods, the investigators in the social sciences may be like the blind man groping in a dark room for a black cat that is not there. The methods of statistics are useful in an ever-widening range of human activities in any field of thought in which numerical data may be had.”
(v) Statistics in Trade: As already mentioned, statistics is a body of methods for making wise decisions in the face of uncertainty. Business is full of uncertainties and risks, and we have to forecast at every step. Speculation is simply gaining or losing by way of forecasting. Can we forecast without taking the past into view? Perhaps not. The future trend of the market can be anticipated only if we make use of statistics, and failure in anticipation means failure of the business. Changes in demand, supply, habits, fashion, etc. can be anticipated with the help of statistics. Statistics is of the utmost significance in determining the prices of various products, identifying the phases of boom and depression, etc. The use of statistics helps in the smooth running of the business and in reducing uncertainties, and thus contributes towards the success of the business.
(vi) Statistics in Research Work: The job of a research worker is to present the results of his research before the community. The effect of a variable on a particular problem, under differing conditions, can be known by the research worker only if he makes use of statistical methods. Statistics are everywhere basic to research activities. To keep his research interests and activities alive, the researcher is required to lean upon his knowledge of and skills in statistical methods.
Limitations of Statistics
1. Qualitative Aspect Ignored: Statistical methods do not study phenomena which cannot be expressed in quantitative terms. Such phenomena, for example health, riches and intelligence, cannot be part of the study of statistics unless the qualitative data are first converted into quantitative data.

2. It does not deal with individual items: It is clear from the definition given by Prof. Horace Secrist, “By statistics we mean aggregates of facts…. and placed in relation to each other”, that statistics deals only with aggregates of facts or items and does not recognise any individual item. Thus individual items such as the death of 6 persons in an accident, or the 85% result of a class of a school in a particular year, do not amount to statistics as they are not placed in a group of similar items. Statistics does not deal with individual items, however important they may be.

3. It does not depict the entire story of a phenomenon: When a phenomenon happens, it is due to many causes, but not all of these causes can be expressed in terms of data, so we cannot always reach the correct conclusions. The development of a group depends upon many social factors such as parents’ economic condition, education, culture, region and administration by government, but not all of these factors can be expressed as data. We therefore analyse only the data that are available quantitatively and leave out the qualitative side. The results or conclusions are thus not 100% correct, because many aspects are ignored.

4. It is liable to be misused: As W.I. King points out, “One of the shortcomings of statistics is that they do not bear on their face the label of their quality.” We can check the data and the procedures used to reach conclusions, but the data may have been collected by inexperienced, dishonest or biased persons. Statistics is a delicate science and can easily be misused by an unscrupulous person, so data must be used with caution; otherwise the results may prove disastrous.

5. Laws are not exact: The two fundamental laws of statistics, (i) the Law of Inertia of Large Numbers and (ii) the Law of Statistical Regularity, are not as exact as the laws of the physical sciences. They are based on probability, so their results will not always be as reliable as those of scientific laws. On the basis of probability or interpolation, we can only estimate the production of paddy in 2008; we cannot claim that the estimate is 100% exact. Only approximations are made.

6. Results are true only on average: Results are interpolated using time series, regression or probability, and are not absolutely true. If the average marks in statistics of two sections of students are the same, it does not mean that all 50 students in section A have got the same marks as those in section B. There may be much variation between the two. So we get only average results.
“Statistics largely deals with averages and these averages may be made up of individual items radically different from each other.” - W.I. King

7. Too many methods to study problems: In this subject many methods are used to find a single result. Variation can be measured by quartile deviation, mean deviation or standard deviation, and the results vary in each case.
“It must not be assumed that the statistics is the only method to use in research, neither should this method be considered the best attack for the problem.” - Croxton and Cowden
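Limitations 6 and 7 can be illustrated with a small numerical sketch. The marks below are hypothetical and not from the original notes: two sections have the same average although the individual marks differ widely, and the three measures of variation named above give three different values for the same data. The quartiles are computed with Python's `statistics.quantiles` using its default method; other textbooks' quartile rules may give slightly different figures.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical marks for two sections (assumed for illustration).
section_a = [50, 50, 50, 50, 50]
section_b = [10, 30, 50, 70, 90]

def quartile_deviation(values):
    """Half the distance between the third and first quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2

def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    centre = mean(values)
    return mean([abs(x - centre) for x in values])

for name, marks in (("A", section_a), ("B", section_b)):
    print(f"Section {name}: mean = {mean(marks)}, "
          f"quartile deviation = {quartile_deviation(marks)}, "
          f"mean deviation = {mean_deviation(marks)}, "
          f"standard deviation = {pstdev(marks):.2f}")
```

Both sections have a mean of 50, yet section B's marks are spread out while section A's are identical, which is why an average alone does not describe the individual items.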
Data
The facts and figures which can be numerically measured are studied in statistics. A numerical measurement of a characteristic is known as an observation, and a collection of observations is termed data. Data are collected by individual research workers or by organizations through sample surveys or experiments, keeping in view the objectives of the study. The data collected may be:
• Primary Data
• Secondary Data
Primary Data
Primary data means raw data (data without fabrication, i.e. not tailored) which has just been collected from the source and has not undergone any kind of statistical treatment such as sorting and tabulation. The term primary data may sometimes be used to refer to first-hand information.
Sources of Primary Data :- The sources of primary data are primary units such as basic experimental units, individuals and households. (Published data and data collected in the past are called secondary data.) The following methods are usually used to collect data from primary units; the choice of method depends on the nature of the primary unit.
• Personal Investigation :- The researcher conducts the experiment or survey himself/herself and collects data from it. The collected data are generally accurate and reliable. This method of collecting primary data is feasible only for small-scale laboratory or field experiments and pilot surveys; it is not practicable for large-scale experiments and surveys because it takes too much time.
• Investigators :- Trained (experienced) investigators are employed to collect the required data. In surveys, they contact the individuals and fill in the questionnaires after asking for the required information. A questionnaire is an inquiry form containing a number of questions designed to obtain information from the respondents. This method is employed by most organizations and gives reasonably accurate information, but it is very costly and may be time-consuming too.
• Questionnaire :- The required information (data) is obtained by sending a questionnaire (printed or in soft form) by mail to the selected individuals (respondents), who fill in the questionnaire and return it to the investigator. This method is relatively cheap compared with the “through investigator” method, but the non-response rate is very high, as most respondents do not bother to fill in the questionnaire and send it back.
• Local Sources :- Local representatives or agents are asked to send the requisite information, which they provide on the basis of their own experience. This method is quick but gives only rough estimates.
• Telephone :- The information may be obtained by contacting the individuals by telephone. This method is quick and provides the required information accurately.
• Internet :- With the introduction of information technology, people may be contacted through the internet and asked to provide the pertinent information. Google survey is widely used nowadays as an online method of data collection. There are many paid online survey services too.
Secondary Data
Secondary data are data which have already been collected by someone else and may already have been sorted, tabulated or given some other statistical treatment. They are processed (“fabricated” or tailored) data.
Sources of Secondary Data
• Government Organizations :- Federal and Provincial Bureaus of Statistics, Crop Reporting Service (Agriculture Department), Census and Registration Organization, etc.
• Semi-Government Organizations :- Municipal committees, District Councils, commercial and financial institutions such as banks, etc.
• Teaching and Research Organizations :- Research journals and newspapers
• Internet
Primary and Secondary Data in Statistics: The difference between primary and secondary data is that primary data are collected first-hand by a researcher (organization, person, authority, agency, party, etc.) through experiments, surveys, questionnaires, focus groups, interviews and measurements, while secondary data have already been collected by someone else and are readily available to the public through publications, journals and newspapers.
Classification of Data:
Data classification is broadly defined as the process of organizing data by relevant categories so that it may be used and protected more
efficiently. On a basic level, the classification process makes data easier to locate and retrieve. Data classification is of particular
importance when it comes to risk management, compliance, and data security.
Data classification involves tagging data to make it easily searchable and trackable. It also eliminates multiple duplications of data, which
can reduce storage and backup costs while speeding up the search process. Though the classification process may sound highly technical,
it is a topic that should be understood by your organization’s leadership.
Data is of two types: qualitative data and quantitative data. Qualitative data are data that represent a quality, whereas quantitative data are data that represent a numeric quantity.
Definition of Classification of Data:
According to Secrist, “Classification is the process of arranging data into sequences and groups according to their common
characteristics”.
In other words, classification of data is the process of organizing data into groups according to various parameters. The most crucial
parameter is the similarities that exist among data.
For example, the number of students who have registered for a sports event can be classified on the following basis:
• Gender
• Age
• Weight
• Height
• Institutions/Colleges
• Sports played by them etc.
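As a rough illustration of classifying the same records on different bases, the sketch below groups a few hypothetical registrations (the names, colleges and sports are invented for the example) first by gender and then by sport. The helper `classify` is not from the original notes.

```python
from collections import defaultdict

# Hypothetical registrations for a sports event.
students = [
    {"name": "Asha",  "gender": "F", "college": "College X", "sport": "Cricket"},
    {"name": "Ravi",  "gender": "M", "college": "College Y", "sport": "Football"},
    {"name": "Meena", "gender": "F", "college": "College Y", "sport": "Football"},
    {"name": "Arjun", "gender": "M", "college": "College X", "sport": "Cricket"},
]

def classify(records, basis):
    """Group record names by the value of one common characteristic."""
    groups = defaultdict(list)
    for record in records:
        groups[record[basis]].append(record["name"])
    return dict(groups)

print(classify(students, "gender"))
print(classify(students, "sport"))
```

The same four records fall into different groups depending on the basis chosen, which is the sense in which classification depends on the common characteristic selected.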
Functions of Classification of Data:
1. Studying relations – classifying the collected data helps analyse and study the relationships between them. Moreover, the
organization of statistical data can enable effective decision making.
2. Condense the data – sometimes the data collected for statistical manipulation are wide and raw. In order to make decisions based on the data, it is crucial to make the data more comprehensible. This can be done with the help of tabulation. Hence, classifying the data provides a condensed form of it that can be easily understood.
3. Treatment of data – data collected from various sources is meaningless by itself. The data so collected should undergo
manipulation in order to be useful for decision making. It becomes difficult to treat raw and unclassified data and is hence
important to classify the data before doing so. Classification of data helps facilitate the statistical treatment of the data.
4. Comparisons – wide, raw and unclassified data is impossible to deal with and arrive at any conclusion. Conclusions cannot be
arrived at without treating the data and making a statistical analysis. Hence, classified/organized/tabulated data enables analysts
to make meaningful comparisons on various criteria.
Rules For Classifying Data:
Classification of the collected data is a very important technique in statistical treatment, so it is all the more important to remember the rules of classifying data. These rules form the backbone and act as guiding principles for well-classified data. They are mentioned below:
1. Unambiguous – the classes should be rigid and unambiguous (clear). An unclear classification can have severe consequences and can also affect all further statistical treatment.
2. Exhaustive – the classification must be exhaustive in the sense that every item of data should belong to one of the classes or categories.
3. Stability – in order to facilitate effective comparisons of data, it is important that the classification is stable, in the sense that the same classification pattern is adopted throughout the analysis. Adopting different classification techniques for the same analysis would lead to ambiguity.
4. Suitable for the purpose – it is crucial to remember the objective of the report or analysis while classifying data. Avoid classifying the data in a manner that does not suit the purpose of the inquiry.
5. Flexibility – it is important to classify data in a manner that allows future modification. Due to changing conditions, there may arise a need to change the statistical methods and data classifications. In such a situation, a flexible classification of data would solve many issues.
Problems With Classifying Data:
Classification of data has many functions and various benefits, but there are also some key issues in organizing data. The most important problems associated with it are mentioned below:
1. Organizing data can be a very tedious and complex task for many companies or individuals.
2. Classifying data depends largely on subjective judgement, which can lead to misjudgements. These misjudgements can often cause a lot of inconvenience and errors.
3. Redoing the entire process of classification can be very time consuming and nerve-racking.
4. Classifying data properly generally requires the help of a statistical analyst.
5. It is impossible to classify data without at least moderate knowledge of the subject matter.
Organization of Data:
1. Chronological Classification – The chronological classification of data emphasizes the time of occurrence. Under this type of data classification, data is classified on the basis of differences in time. Time series data (used frequently in economic and business statistics) is an example of data classified in a chronological manner.

2. Geographical Classification – The geographical classification of data emphasizes the geographical representation of data. Under this type of data classification, data is classified on the basis of geographical boundaries and differences in location. Classifying by states, cities and districts is a geographical classification, as is classifying by countries and continents.

3. Qualitative Classification – The qualitative classification of data emphasizes certain qualitative phenomena of the data. Under this type of data classification, data is classified on the basis of qualitative measurements. Classifying by qualities such as honesty, intelligence and aptitude are examples of data being classified in a qualitative manner.

4. Quantitative Classification – The quantitative classification of data emphasizes certain quantitative phenomena of the data. Under this type of data classification, data is classified on the basis of quantitative measurements. Classifying by quantities such as sales, profits, age, height and weight are examples of data being classified in a quantitative manner.
Introducing Tabulation:
Tabulation refers to the process of arranging all the collected data in a tabular format. Tabulation is also the systematic presentation of
data in rows and columns. Rows are horizontal arrangements whereas columns are vertical arrangements. Tabulation is an important
device for presenting data in a condensed manner that is easily understandable and furnishes maximum information. It also facilitates easy
comparison between 2 or more parameters.
There are 7 key parts of a table
1. Table number
2. Table title
3. Headnotes (also known as prefatory notes)
4. Captions
5. The body of the table
6. Foot-note
7. Source note
Tabulation is mandatory to create charts and graphical representations. Data, tabulation and these diagrammatic representations are very
important in the process of policy making, decision making and formulation of strategies.
STEPS FOR EFFECTIVE DATA CLASSIFICATION
1. Understand the Current Setup: Taking a detailed look at the location of current data and all regulations that pertain to your
organization is perhaps the best starting point for effectively classifying data. You must know what data you have before you
can classify it.

2. Creating a Data Classification Policy: Staying compliant with data protection principles in an organization is nearly impossible
without proper policy. Creating a policy should be your top priority.

3. Prioritize and Organize Data: Now that you have a policy and a picture of your current data, it’s time to properly classify the
data. Decide on the best way to tag your data based on its sensitivity and privacy.

Difference between classification and tabulation

BASIS FOR COMPARISON | CLASSIFICATION | TABULATION
Meaning | Classification is the process of grouping data into different categories, on the basis of nature, behavior, or common characteristics. | Tabulation is a process of summarizing data and presenting it in a compact form, by putting data into a statistical table.
Order | After data collection | After classification
Arrangement | Attributes and variables | Columns and rows
Purpose | To analyse data | To present data
Bifurcates data into | Categories and sub-categories | Headings and sub-headings


Requisites of good statistical table
1. Suit the purpose
2. Scientifically prepared
3. Clarity
4. Manageable size
5. Columns and rows should be numbered
6. Suitably approximated
7. Attractive getup
8. Units should be mentioned
9. Averages & totals should be given
10. Logically arranged
11. Proper lettering
Frequency
The frequency of any value is the number of times that value appears in a data set. So from the above examples of colours, we can say two
children like the colour blue, so its frequency is two. To make sense of raw data, we must organise it, and finding the frequency of each
data value is how this organisation is done.
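
As a rough illustration of counting frequencies, the sketch below tallies a small list of favourite colours. The list itself is hypothetical (it is not the data set referred to above) and is chosen only so that blue appears twice.

```python
from collections import Counter

# Hypothetical raw data: favourite colours reported by ten children.
favourite_colours = [
    "red", "blue", "green", "red", "yellow",
    "blue", "red", "green", "red", "yellow",
]

# Counter tallies how many times each value appears, i.e. its frequency.
frequency = Counter(favourite_colours)

# Print a simple two-column frequency table, sorted by colour name.
for colour, count in sorted(frequency.items()):
    print(f"{colour:<8}{count}")
```

The frequencies always add up to the number of observations, which is a quick check that no item was missed.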
Frequency Distribution
It is often not easy or feasible to read frequencies directly from a very large dataset. So to make sense of the data we make a
frequency table and graphs.
Types of Frequency Distribution: Frequency distributions are further classified into five types. These are:
1. Exclusive Series
2. Inclusive Series
3. Open End Series
4. Cumulative Frequency Series
5. Mid-Values Frequency Series
Exclusive Series

In such a series, for a particular class interval, all the data items having values ranging from its lower limit to just below its upper limit
are counted in the class interval. In other words, we exclude items whose values are less than the lower limit, or equal to or greater than
the upper limit. Note that here the upper limit of a class repeats itself as the lower limit of the next interval. This is
the most used type of frequency distribution.
Weight Frequency
40-50 2
50-60 10
60-70 5
70-80 3


Inclusive Series

In contrast to an exclusive series, an inclusive series counts items equal to both its lower and its upper limit. This means that we
exclude only the items with values less than the lower limit or greater than the upper limit.
Marks Frequency
10-19 5
20-29 13
30-39 6
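
The difference between the two counting rules can be sketched in a few lines. The weights below are hypothetical and are chosen so that one value (50) falls exactly on a class boundary; the function names are illustrative only.

```python
def count_exclusive(values, lower, upper):
    """Count values with lower <= value < upper (exclusive series rule)."""
    return sum(1 for v in values if lower <= v < upper)


def count_inclusive(values, lower, upper):
    """Count values with lower <= value <= upper (inclusive series rule)."""
    return sum(1 for v in values if lower <= v <= upper)


# Hypothetical weights; 50 sits exactly on the boundary between two classes.
weights = [42, 50, 50, 55, 59, 60]

excl_40_50 = count_exclusive(weights, 40, 50)  # counts 42 only
excl_50_60 = count_exclusive(weights, 50, 60)  # counts 50, 50, 55, 59
incl_50_59 = count_inclusive(weights, 50, 59)  # counts 50, 50, 55, 59

print(excl_40_50, excl_50_60, incl_50_59)
```

In the exclusive series the boundary value 50 is counted in the class 50-60, never in 40-50, so no item is counted twice.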

Open End Series

In an open-end series, the lower limit of the first class in the series and the upper limit of the last class in the series are missing.
Instead, the first class is stated as ‘below’ a given value and the last class as a given value ‘and above’.
Age Frequency
Below 5 4
5-10 6
10-20 10
20 and above 8
Cumulative Frequency Series

In a cumulative frequency series, we progressively add the frequencies of the preceding class intervals (for a ‘less than’ series) or
subtract them from the total (for a ‘more than’ series) to determine the cumulative frequency for a particular class. Further, the classes
are converted into either ‘less than the upper limit’ or ‘more than the lower limit’.
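
A minimal sketch of both cumulative forms, reusing the weight table from the exclusive series example (40-50: 2, 50-60: 10, 60-70: 5, 70-80: 3). The variable names are illustrative.

```python
from itertools import accumulate

classes = [(40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 10, 5, 3]

# 'Less than' series: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# 'More than' series: running total from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"Less than {upper}: {lt:>2}    More than {lower}: {mt:>2}")
```

The last ‘less than’ value and the first ‘more than’ value both equal the total frequency (20 here).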

Mid-Values Frequency Series

A mid-value frequency series is one in which we have the mid values of the class intervals and the corresponding frequencies. In other
words, each mid value stands in for the whole range of its class interval.
GRAPH OF DATA FREQUENCY
1. Histogram
2. Bar Graphs
3. Polygons
4. pie chart
5. Line Graphs
6. Ogive Graph / Cumulative Frequency
Histogram

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This
allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. An example of a
histogram, and the raw data it was constructed from, is shown below:

36 25 38 46 55 68 72 55 36 38
67 45 22 48 91 46 52 61 58 55

Constructing a histogram from a continuous variable

To construct a histogram from a continuous variable you first need to split the data into intervals, called bins. In the example above, age
has been split into bins, with each bin representing a 10-year period starting at 20 years. Each bin contains the number of occurrences of
scores in the data set that are contained within that bin. For the above data set, the frequencies in each bin have been tabulated along with
the scores that contributed to the frequency in each bin

Bin | Frequency | Scores Included in Bin
20-30 | 2 | 25, 22
30-40 | 4 | 36, 38, 36, 38
40-50 | 4 | 46, 45, 48, 46
50-60 | 5 | 55, 55, 52, 58, 55
60-70 | 3 | 68, 67, 61
70-80 | 1 | 72
80-90 | 0 | -
90-100 | 1 | 91
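
The binning above can be reproduced with a short sketch. It assumes the exclusive rule, so a score lying exactly on a boundary (for example 30) would be placed in the bin that starts at that value; the text leaves that choice to the reader.

```python
# The twenty ages listed in the raw data above.
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

bin_width = 10
bin_starts = range(20, 100, bin_width)  # 20, 30, ..., 90

# Start every bin at zero so that empty bins (such as 80-90) still appear.
histogram = {start: 0 for start in bin_starts}
for age in ages:
    start = (age // bin_width) * bin_width  # lower limit of the age's bin
    histogram[start] += 1

# Text rendering of the histogram: one '#' per observation.
for start, count in histogram.items():
    print(f"{start}-{start + bin_width}: {'#' * count} ({count})")
```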

Notice that, unlike a bar chart, there are no "gaps" between the bars (although some bars might be "absent" reflecting no frequencies).
This is because a histogram represents a continuous data set, and as such, there are no gaps in the data (although you will have to decide
whether you round up or round down scores on the boundaries of bins).

Bar graph

A bar graph is a chart that uses bars to show comparisons between categories of data. The bars can be either horizontal or vertical. Bar
graphs with vertical bars are sometimes called vertical bar graphs. A bar graph will have two axes. One axis will describe the types of
categories being compared, and the other will have numerical values that represent the values of the data. It does not matter which axis is
which, but it will determine what bar graph is shown. If the descriptions are on the horizontal axis, the bars will be oriented vertically, and
if the values are along the horizontal axis, the bars will be oriented horizontally.

Types of Bar Graphs

There are many different types of bar graphs. They are not always interchangeable. Each type will work best with a different type of
comparison. The comparison you want to make will help determine which type of bar graph to use. First we'll discuss some simple bar
graphs.

Vertical bar graph:- A simple vertical bar graph is best when you have to compare two or more independent variables. Each variable
relates to a fixed value. The values are positive and can therefore be measured upwards from the horizontal axis.


Horizontal bar graph:- If your data has negative and positive values but is still a comparison between two or more fixed independent
variables, it is best suited for a horizontal bar graph. The vertical axis can be oriented in the middle of the horizontal axis, allowing for
negative and positive values to be represented.

Range bar graph:- A range bar graph represents a range of data for each independent variable. Temperature ranges or price ranges are
common sets of data for range graphs. Unlike the above graphs, the data do not start from a common zero point but begin at a low number
for that particular point's range of data. A range bar graph can be either horizontal or vertical.





Difference Between A Bar Chart And A Histogram

The major difference is that a histogram is only used to plot the frequency of score occurrences in a continuous data set that has been
divided into classes, called bins. Bar charts, on the other hand, can be used for a great deal of other types of variables including ordinal
and nominal data sets.

Polygons

A frequency polygon is closely related to a histogram. It is used to compare sets of data or to display a cumulative frequency
distribution, and it uses a line graph to represent quantitative data.

Statistics deals with the collection of data and information for a particular purpose. The tabulation of each run for each ball in cricket
gives the statistics of the game. Tables, graphs, pie-charts, bar graphs, histograms, polygons etc. are used to represent statistical data
pictorially.

Let us now discuss how to draw a frequency polygon. Frequency polygons are a visually effective method of representing
quantitative data and its frequencies.

To draw a frequency polygon, we begin by drawing a histogram and then follow these steps:

Step 1- Choose the class intervals and mark the values on the horizontal axis.
Step 2- Mark the mid value of each interval on the horizontal axis.
Step 3- Mark the frequency of each class on the vertical axis.
Step 4- Corresponding to the frequency of each class interval, mark a point at that height above the middle of the class interval.
Step 5- Connect these points using line segments.
Step 6- The obtained representation is a frequency polygon.
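
The points to be joined can be computed before any drawing is done. The sketch below reuses the weight table from the exclusive series example. Closing the polygon on the horizontal axis with one zero-frequency class on each side is a common convention, not something the steps above require, so treat that part as an assumption.

```python
classes = [(40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 10, 5, 3]

# Step 2: mid value of each class interval.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Step 4: one point per class at (mid value, frequency).
points = list(zip(midpoints, frequencies))

# Assumed convention: add a zero-frequency point one class width
# before the first class and after the last, so the polygon is closed.
width = classes[0][1] - classes[0][0]
closed_polygon = (
    [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]
)

# Step 5: these are the points to join, in order, with line segments.
for x, y in closed_polygon:
    print(x, y)
```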



Solution: The following steps are to be followed to construct a histogram from the given data:
• The heights are represented on the horizontal axis on a suitable scale as shown.
• The number of students is represented on the vertical axis on a suitable scale as shown.
• Now rectangular bars are drawn, with widths equal to the class size and lengths corresponding to the frequency of each class
interval.
• ABCDEF represents the given data graphically in the form of a frequency polygon as:











PIE CHART

A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart,
the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents. While it is named for
its resemblance to a pie which has been sliced, there are variations on the way it can be presented. The earliest known pie chart is
generally credited to William Playfair's Statistical Breviary of 1801.
Represent the following data by a pie chart:
Food 87
Clothing 24
Recreation 11
Education 13
Rent 25
Miscellaneous 20

Item | Expenditure | Percentage | Degrees
Food | 8700 | 48.33 | 174
Clothing | 2400 | 13.33 | 48
Recreation | 1100 | 6.11 | 22
Education | 1300 | 7.22 | 26
Rent | 2500 | 13.89 | 50
Miscellaneous | 2000 | 11.11 | 40
Total salary | 18000 | 100 | 360

Degrees = (360 × Percentage) / 100
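
The percentage and degree columns can be checked with a short calculation. This is only a sketch of the arithmetic; the dictionary and variable names are illustrative.

```python
# Expenditure figures from the table above.
expenditure = {
    "Food": 8700,
    "Clothing": 2400,
    "Recreation": 1100,
    "Education": 1300,
    "Rent": 2500,
    "Miscellaneous": 2000,
}

total = sum(expenditure.values())

rows = {}
for item, amount in expenditure.items():
    percentage = amount / total * 100   # share of the total
    degrees = 360 * percentage / 100    # central angle of the slice
    rows[item] = (percentage, degrees)

for item, (percentage, degrees) in rows.items():
    print(f"{item:<14}{percentage:6.2f}%{degrees:7.1f} degrees")
```

The angles must add up to 360 degrees, just as the percentages add up to 100.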





Line Graphs

Line Graphs are used to display quantitative values over a continuous interval or time period. A Line Graph is most frequently used to
show trends and analyse how the data has changed over time. Line Graphs are drawn by first plotting data points on a Cartesian coordinate
grid, then drawing a line that connects all of these points. Typically, the y-axis has a quantitative value, while the x-axis is a timescale
or a sequence of intervals. Negative values can be displayed below the x-axis.

The direction of the lines on the graph works as a nice metaphor for the data: an upward slope indicates where values have increased and a
downward slope indicates where values have decreased. The line's journey across the graph can create patterns that reveal trends in a
dataset.

When grouped with other lines (other data series), individual lines can be compared to one another. However, avoid using more than 3-4
lines per graph, as this makes the chart more cluttered and harder to read. A solution to this is to divide the chart into smaller multiples
(have a small Line Graph for each data series).
Food 8700
Clothing 2400
Recreation 1100
Education 1300
Rent 2500
Miscellaneous 2000
total salary 18000





Ogive Graph / Cumulative Frequency

An ogive (oh-jive), sometimes called a cumulative frequency polygon, is a type of frequency polygon that shows cumulative frequencies.
In other words, the cumulative percents are added on the graph from left to right.

[Figure: pie chart of the expenditure data, slice angles in degrees: Food 174, Clothing 48, Recreation 22, Education 26, Rent 50, Miscellaneous 40]
[Figure: line graph of the expenditure data by category, vertical axis from 0 to 10000]


An ogive graph plots cumulative frequency on the y-axis and class boundaries along the x-axis. It’s very similar to a histogram, only
instead of rectangles, an ogive has a single point marking where the top right of the rectangle would be. It is usually easier to create this
kind of graph from a frequency table.

Draw an Ogive Graph: Example question: Draw an ogive graph for the following set of data:

02, 07, 16, 21, 31, 03, 08, 17, 21, 55, 03, 13, 18, 22, 55, 04, 14, 19, 25, 57, 06, 15, 20, 29, 58.

Step 1: Make a relative frequency table from the data. The first column has the class limits, the second column has the frequency (the
count) and the third column has the relative frequency (class frequency / total number of items):


Step 2: Add a fourth column and cumulate (add up) the frequencies in column 2, going down from top to bottom. For example, the second
entry is the sum of the first row and the second row in the frequency column (5 + 5 = 10), and the third entry is the sum of the first,
second, and third rows in the frequency column (5 + 5 + 6 = 16):


Step 3: Add a fifth column and cumulate the relative frequencies from column 3. If you do this step correctly, your values should add up
to 100% (or 1 as a decimal):


Step 4: Draw an x-y graph with percent cumulative relative frequency on the y-axis (from 0 to 100%, or as a decimal, 0 to 1). Mark the x-
axis with the class boundaries.

Step 5: Plot your points. Note: Each point should be plotted on the upper limit of the class boundary. For example, if your first class
boundary is 0 to 10, the point should be plotted at 10.

Step 6: Connect the dots with straight lines. The ogive is one continuous line, made up of several smaller lines that connect pairs of dots,
moving from left to right.
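
The steps can be sketched as a small calculation that produces the points to plot. The class width of 10 (classes 0-10, 10-20 and so on, upper limit excluded) is an assumption made for this sketch, because the frequency tables for this example are not reproduced in these notes; with that width the class counts differ from the 5, 5, 6 quoted in Step 2.

```python
data = [2, 7, 16, 21, 31, 3, 8, 17, 21, 55, 3, 13, 18, 22, 55,
        4, 14, 19, 25, 57, 6, 15, 20, 29, 58]

class_width = 10  # assumed; not taken from the original tables
boundaries = list(range(0, 61, class_width))  # 0, 10, ..., 60

# Step 1: frequency of each class (lower limit included, upper excluded).
frequencies = [
    sum(1 for x in data if lower <= x < lower + class_width)
    for lower in boundaries[:-1]
]

# Step 2: cumulate the frequencies from top to bottom.
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)

# Steps 3 to 5: cumulative relative frequency, plotted at each upper boundary.
total = len(data)
ogive_points = [
    (upper, cf / total) for upper, cf in zip(boundaries[1:], cumulative)
]

for upper, share in ogive_points:
    print(f"x = {upper:>2}, cumulative relative frequency = {share:.2f}")
```

Joining these points from left to right with straight lines (Step 6) gives the ogive; the final point always reaches 1, that is 100%.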

Draw a histogram, bar graph, frequency polygon, pie chart, line graph and ogive (cumulative frequency graph) for the following data:

Q.1


X: 0 – 9 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59
F: 5 8 7 11 9 10

Q.2

Section Average marks in Mathematics No. of Students
A 75 50
B 60 60
C 55 50

Q .3
Wages (Rs) 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of workers: 25 30 45 15 25 30
[Ans: 35, 40]

Q.4
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequency: 50 10 20 40 20 30 30
[Ans: 30; 28]

Q.5
Marks | No. of Students | Marks | No. of Students
Less than 10 | 5 | 40 – 50 | 10
Less than 20 | 20 | 50 and above | 25
10 – 30 | 35 | 60 and above | 9
30 and above | 60 | |


UNIT 2
STATISTICS


UNIVERSITY PREVIOUS YEAR QUESTION PAPERS
2016
Q.1 Calculate the average daily sales from the following data by the assumed mean method.
Daily sales 40 50 60 70 80
No. of salesmen 5 6 10 12 3
Ans: 60.55

Q.2 Find out the median from the following table:
Daily Wages | No. of employees | Daily Wages | No. of employees
50-59 | 15 | 90-99 | 45
60-69 | 40 | 100-109 | 40
70-79 | 50 | 110-119 | 15
80-89 | 60 | |
Ans: 84.08

2015
Q.1 What do you mean by arithmetic mean? Discuss its merits and demerits. Also state its important properties.
Q.2 An incomplete distribution is given below
Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 10 20 ? 40 ? 25 12
Total frequencies if median value is 35 (Ans :35,25)
Q.3 Calculate the mode from the following:
Marks 0-10 10-20 20-40 40-50 50-70
No. of students 2 7 18 15 8
Ans 35.71

2014
Q.1 What do you mean by central tendency? What are the common measures of central tendency?
Q.2 Given below is the distribution of weights of a group of 60 students in a class:
Weight 30-34 35-39 40-44 45-49 50-54 55-59 60-64
No. of students 3 5 12 18 14 6 2
Ans: 47.5
Q.3 Find the geometric mean from the following data:
Diameter 130 135 140 145 143 148 149 150
No. of screws 3 4 6 6 3 5 2 2
Ans: 142.3

2013
Q.1 “The purpose of an average is to represent a group of individual values in a simple and concise manner so that a quick understanding of
the general size of the individuals in the group can be made easily.” Explain.
Q.2 Find the missing frequency from the following data:
Class interval 0-10 10-20 20-30 30-40 40-50
Frequency 3 5 ? 3 2
The mean of the distribution is 23. Ans: 7
Q.3 Calculate the median from the following data:
Value | Frequency | Value | Frequency
Less than 10 | 4 | Less than 50 | 96
Less than 20 | 16 | Less than 60 | 112
Less than 30 | 40 | Less than 70 | 120
Less than 40 | 76 | Less than 80 | 125
Ans: 36.25


Measures of Central Tendencies; Mean, Median, Mode, Geometric Mean.


MEASURES OF CENTRAL TENDENCY
Meaning
The word ‘measures’ means ‘methods’ and the term ‘central tendency’ means the ‘average value’ of any statistical series. Thus, the
combined term ‘measures of central tendency’ means the methods of finding out the central value or average value of a statistical
series or any series of quantitative information.
Definitions
According to Croxton and Cowden, “An average value is a single value within the range of the data that is used to represent
all the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of
central value.”
In the words of Clark, “Average is an attempt to find one single figure to describe whole of figures.”
Characteristics
1. It is a single figure expressed in some quantitative form.
2. It lies between the extreme values of a series
3. It is a typical value that represents all the values of a series
4. It is capable of giving a central idea about the series it represents
5. It is determined by some method or procedure.

Essentials of a Good Average
1. It should have a clear definition – The definition of an average should be clear and unambiguous. It should be defined in
the form of an algebraic formula, so that each person calculating the average from a set of data arrives at the same
figure.
2. It should be simple to understand and easy to calculate – An average should be simple enough that everybody can
understand it without ambiguity. The method of calculation should be such that everybody can compute it easily.
3. It should be based on all the observations – An average is not representative unless the entire data are taken for its
calculation. So, to be ideal, an average should be based on all the items of the series.
4. It should be suitable for further mathematical treatment – An ideal average should possess some important
mathematical properties, so that it is easy to use in further mathematical or statistical analysis. Its use should not be
restricted to a single purpose; rather, it should be usable in calculating other statistical measures such as dispersion,
correlation and regression.
5. It should not be affected by extreme items – In a sample, there may be wide variation in the figures. The extreme items,
i.e., the highest and lowest values, may be much higher or lower than the other values. In such a case, an average that is
greatly influenced by these extreme values cannot be treated as a true representative of the whole distribution.

Various Measures of Central Tendency
A. Mathematical Averages:
• Arithmetic Average or Mean
• Geometric Mean
• Harmonic Mean
B. Positional averages:
• Median
• Mode
• Quartiles
• Deciles
• Percentiles
C. Miscellaneous Averages:
• Moving Average
• Progressive Average


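Mean
“Mean of a series is the sum of the values of a variable divided by the number of observations.”
$\bar{X} = \frac{\sum X}{N}$

Method | Individual Series | Discrete Series | Continuous Series
Direct method | $\bar{X} = \frac{\sum X}{N}$ | $\bar{X} = \frac{\sum fX}{N}$ | $\bar{X} = \frac{\sum fm}{N}$
Short cut method | $\bar{X} = A + \frac{\sum d}{N}$ | $\bar{X} = A + \frac{\sum fd}{N}$ | $\bar{X} = A + \frac{\sum fd}{N}$
Step deviation method | $\bar{X} = A + \frac{\sum d'}{N} \times i$ | $\bar{X} = A + \frac{\sum fd'}{N} \times i$ | $\bar{X} = A + \frac{\sum fd'}{N} \times i$

Where; X = value of an item, f = frequency, m = mid value of a class, N = number of observations (total frequency), A = assumed mean,
d = deviation of X (or m) from A, i = common factor (the class interval in a continuous series), d' = d / i

Shortest method
$\bar{X} = m_L - i\left(\frac{\sum cf}{N} - 1\right)$, where $m_L$ = mid value of last class and cf = cumulative frequency

Combined Mean
$\bar{X}_{1.2.3} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$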

Properties of Arithmetic Mean
1. The sum of the deviations of the items from the actual mean is always zero.
2. The sum of the squares of deviations of items from the arithmetic mean is minimum i.e., less than the sum of the squares
of deviations of items from any other value.
3. The sum of the given values of a series is equal to the product of their arithmetic average and number of items of the
series.
4. The number of items of a series is equal to the quotient of the sum of the values of the items and their
arithmetic mean.
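
As a worked check, the sketch below applies the direct, short cut and step deviation methods to the 2016 daily sales question (sales 40 to 80, salesmen 5, 6, 10, 12, 3) and then verifies the first property listed above. The assumed mean A = 60 and common factor i = 10 are choices made for this sketch; any other assumed mean gives the same result.

```python
sales = [40, 50, 60, 70, 80]     # X
salesmen = [5, 6, 10, 12, 3]     # f
n = sum(salesmen)                # N = total frequency

# Direct method: mean = sum(fX) / N
direct = sum(f * x for x, f in zip(sales, salesmen)) / n

# Short cut method: mean = A + sum(fd) / N, with d = X - A
A = 60  # assumed mean (chosen for this sketch)
short_cut = A + sum(f * (x - A) for x, f in zip(sales, salesmen)) / n

# Step deviation method: mean = A + sum(fd') / N * i, with d' = (X - A) / i
i = 10  # common factor (chosen for this sketch)
step_deviation = (
    A + sum(f * ((x - A) / i) for x, f in zip(sales, salesmen)) / n * i
)

# Property 1: deviations from the actual mean sum to zero.
sum_of_deviations = sum(f * (x - direct) for x, f in zip(sales, salesmen))

print(round(direct, 2), round(short_cut, 2), round(step_deviation, 2))
```

All three methods agree at about 60.56, which the question paper records as 60.55.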
Advantages of Mean
1. It is easy to understand and simple to compute
2. It is rigidly defined and there is no scope for ambiguity or misunderstanding about its meaning and nature.
3. Its value is based on each and every item of the data. With every change in any item, the value of the average will change.
4. Arrangement like ascending or descending order of data is not required while computing arithmetic mean.
5. It is not very much affected by fluctuations in sampling and thus its result is relatively dependable.
6. It can be reused for further statistical computations.

Disadvantages of Mean
1. In some cases where extreme items are either too big or small, then average is greatly affected by values of these
extreme items. Thus it fails to be the true representative of the series.
2. Its value cannot be determined graphically
3. In certain cases, arithmetic mean may give absurd result.



[Arithmetic Mean]

1. What do you understand by measures of averages? Explain features and functions of averages
2. Define the term ‘Averages’. Discuss the functions and types of statistical averages.
3. Explain different methods of measuring averages with examples
4. State various functions of measures of averages.
5. Why are the averages also known as central tendency? Examine the features of central tendency.
6. What is a statistical average? Explain features of good average
7. What are the functions and limitations of averages?
8. What is arithmetic mean? Explain its properties, merits and limitations

Practical Problems:
1. Find the mean income of 10 employees in an organization.
Income (Rs '000) 10.2 15.5 18.9 20.2 25.4 26.2 29.3 31.4 32.5 32.9
[Ans: 24.25]

2. The following are the daily savings of a group of workers in a factory. Calculate the average saving.
Savings (Rs) 10 11 12 13 15 16 18 20 22 23 25
No of workers 2 3 5 8 9 10 15 8 6 5 4
[Ans: 17.19]

3. From the following data relating to daily wages of certain workers in a factory, compute the average wage under the direct and
short-cut methods.
Wages (Rs) 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
No. of workers: 7 8 9 12 16 6 2
[Ans: 33]

4. From the data given below find the mean under the step deviation method.
X: 0 – 9 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59
F: 5 8 7 11 9 10
[Ans: 32.7]

5. From the following data relating to marks in Statistics secured by a batch of +3 Commerce students, find out the mean marks:
Marks above: 0 10 20 30 40 50
No. of Students: 50 40 35 27 15 8
[Ans: 30]

6. Calculate the average marks of the students from the following data:
Marks below; 10 20 30 40 50 60 70 80
No. of Students: 15 35 60 84 96 127 198 250
[Ans: 50.4]

7. From the following data compute the arithmetic average under the step deviation method:
Marks below: 100 80 60 40 20
No of students: 60 55 40 35 5
[Ans: 45]

8. Find the missing frequencies of the following series, if the arithmetic average is 29.75 and the total number of items is 200:
Wages (Rs) 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of workers: 25 ? 45 ? 25 30
[Ans: 35, 40]

9. Find the missing frequencies of the following series, if the arithmetic average is 39.5 and the total number of items is 100:
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequency: 5 10 ? 4 20 3 ?
[Ans: 30; 28]

10. From the following frequency distribution, find the value of the median:
Marks | No. of Students | Marks | No. of Students
Less than 10 | 5 | 40 – 50 | 10
Less than 20 | 20 | 50 and above | 25
10 – 30 | 35 | 60 and above | 9
30 and above | 60 | |
11. Calculate the arithmetic mean from the following data:
Wages (in Rs) | No. of workers | Wages (in Rs) | No. of workers
Less than 48 | 5 | 72-80 | 8
Less than 56 | 12 | 80 and above | 19
48-64 | 29 | 88 and above | 5
64 and above | 31 | |
13. Calculate the mean from the following data:
• 5 persons get less than Rs 5
• 12 persons get less than Rs 10
• 22 persons get less than Rs 15
• 30 persons get less than Rs 20
• 36 persons get less than Rs 25
• 40 persons get less than Rs 30

12. The average percentage of marks secured by 200 students of Arts and Commerce is 50. The mean percentage of marks of the Arts
students is 40 and that of the commerce students is 60. Find the number of Arts and Commerce students separately. [Ans: 100; 100]

13. The average marks secured in Economics by all the Commerce and Arts students in their Board examination is 60. The average of
such mark of the Commerce students is 70, and that of Arts student is 50. Find the ratio of the number of students in the Commerce
and Arts class. [Ans: 1:1]

14. The arithmetic average of a series of 20 items has been computed as 400. While computing, two values 450 and 360 have been taken
as 540 and 630. Find correct value of the mean. [Ans: 382]

15. In a B. Com class of 128 students, 48 have failed securing 25 marks on an average. If the total marks of all the students be 5120, find
the average marks secured by the students passing the test. [Ans; 49]




GEOMETRIC MEAN
G.M. is the nth root of the product of ‘n’ items of a series. It is found out by multiplying all the ‘n’ values of a series and extracting
nth root of the product.
Direct method
Individual Series: $G.M. = \sqrt[n]{X_1 \times X_2 \times X_3 \times \cdots \times X_n}$
Discrete and Continuous Series: $G.M. = \sqrt[N]{X_1^{f_1} X_2^{f_2} X_3^{f_3} \cdots X_n^{f_n}}$, where $N = \sum f$

Logarithmic Method
Individual Series: $G.M. = \text{Antilog}\left(\frac{\sum \log X}{n}\right)$
Discrete and Continuous Series: $G.M. = \text{Antilog}\left(\frac{\sum (f \log X)}{N}\right)$

Uses of geometric Mean:
1. Geometric mean is useful in calculating the average of ratios.
2. It is useful in calculating the average of changes i.e., percentage increase or decrease in sales, production, population, rate of
interest or any other variables
3. It is considered as the best of averages where more weights are to be given to small items, and less weights to large items,
4. It is most suitable in constructing index numbers.

Properties of G.M.
1. The product of items of a series will remain unchanged if each item is replaced by the geometric mean.
2. The sum of the deviations of the logarithm of the original observations above and below the logarithm of the geometric mean
are equal to zero
3. If geometric means and the number of items of two series are known, combined geometric mean can be computed.
4. If G.M. and the number of items are known, the product of the values can be found out by using the formula $(G.M.)^n$
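
The logarithmic method can be sketched as one small function that handles both the individual series and the weighted (or frequency) case. The function name and its optional `weights` parameter are illustrative; the data are taken from questions 1 and 5 of Exercise B in these notes.

```python
import math


def geometric_mean(values, weights=None):
    """G.M. by the logarithmic method: antilog of the weighted mean of logs."""
    if weights is None:
        weights = [1] * len(values)  # individual series: each item counts once
    total_weight = sum(weights)
    log_sum = sum(w * math.log10(x) for x, w in zip(values, weights))
    return 10 ** (log_sum / total_weight)  # antilog


# Individual series (Exercise B, question 1).
series = [133, 141, 125, 173, 182]
gm = geometric_mean(series)

# Weighted geometric mean (Exercise B, question 5).
index_numbers = [125, 133, 141, 173, 182]
weights = [7, 5, 4, 1, 3]
weighted_gm = geometric_mean(index_numbers, weights)

print(round(gm, 1), round(weighted_gm, 1))

# Property 4: (G.M.)^n reproduces the product of the values.
product_from_gm = gm ** len(series)
```

Because the method takes logarithms, it only works when every value is positive, which matches the disadvantage noted for zero values.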


Advantages of G.M
1. It is based on all the items of the series.
2. It is capable of further algebraic treatment.
3. It is less affected by the extreme item
4. It is specially useful in determining the average of ratios and percentage.
5. It is a suitable average in determining rates of change in any variables.
6. It is very useful in the construction of an ideal index number
7. It is hardly affected by the fluctuation of sampling.

Disadvantages of Geometric Mean
1. It is not easily understood and is difficult to calculate.
2. If any value of a series is zero, then the value of G.M. will also be zero.
3. It gives comparatively more weights to smaller items and less weight to larger items.

Exercise – B [Geometric Mean]

1. Find the G.M of the series: 133; 141; 125; 173; 182 [Ans: 149]
2. Calculate the G.M of the figures: 5, 10, 192, 14374, 20498, 120674. [Ans: 126.9]
3. From the following figures find the G.M:
X: 10 20 30 40 50 60
F: 12 15 25 10 6 2
[Ans: 25.30]

4. Calculate the G.M for the following distribution:
X: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
F: 14 23 27 21 15
[Ans: 20.80]

5. Calculate the weighted Geometric Mean from the following data:
Groups | Index Number | Weights
Food | 125 | 7
Clothing | 133 | 5
Fuel and Lighting | 141 | 4
House Rent | 173 | 1
Miscellaneous | 182 | 3
[Ans: 139.8]

Median
Median refers to that value of the variable which divides the series into two equal parts, one part consists of all values
greater than the median and other part consists of all values less than the median. It is a positional average.
Direct method:
Individual & Discrete Series: $M = \text{Size of } \left(\frac{N+1}{2}\right)^{th} \text{ item}$
Continuous Series: $M = \text{Size of } \left(\frac{N}{2}\right)^{th} \text{ item}$
Interpolation method:
For ascending series: $M = L_1 + \frac{i}{f}(m - c)$
Where;
$L_1$ = Lower Limit of Median Class
i = Class interval of Median Class
f = Respective frequency of Median Class
m = $\frac{N}{2}$
c = Previous Cumulative Frequency of Median Class
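
The interpolation method for a continuous series can be sketched as follows, using question 3 of the median exercises in these notes (classes 0-10 to 60-70 with frequencies 7, 18, 24, 32, 10, 6, 5). The function name is illustrative, and the sketch assumes exclusive classes arranged in ascending order.

```python
def grouped_median(classes, frequencies):
    """Median of a continuous series: M = L1 + (i / f) * (m - c)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the series has no observations")
    m = n / 2          # position of the median item
    cumulative = 0     # c: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= m:      # first class whose cumulative total reaches m
            i = upper - lower        # class interval of the median class
            return lower + i / f * (m - cumulative)
        cumulative += f
    raise ValueError("classes and frequencies do not match")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [7, 18, 24, 32, 10, 6, 5]

median = grouped_median(classes, frequencies)
print(median)
```

Here N = 102, so m = 51; the median class is 30-40 with c = 49 and f = 32, giving 30 + (10 / 32) × 2 = 30.625, the answer recorded for that question.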
Properties of Median:
1. Median is an average of position.
2. The sum of the deviations taken from the median ignoring plus and minus signs will be less than the sum of deviations
from any other arbitrary point.
3. If median and number of items are known, missing frequencies can be traced out.
Advantages of median:
1. It is easy to calculate, simple to understand and rigidly defined.
2. It is not affected by the extreme items of a series.
3. It can be determined easily in open end series and unequal class intervals.
4. It can be calculated graphically.
5. It is useful when the data cannot be measured quantitatively such as honesty, wealth, intelligence etc.
6. It can be located by inspection from the series.

Disadvantages of median:
1. It is not based on all the observations of the series, hence may not be representative in many cases.
2. It is not capable of further algebraic treatment.
3. It is very much affected by fluctuations in sampling
4. Median ignores the values of extreme items.
5. It is erratic if the number of items is small.
6. It cannot be determined if the data are not arranged in proper form either ascending or descending order.

[Median]
1. Determine the value of the median from the following series
X: 5 7 9 12 10 8 7 15 21
[Ans: 9]

2. From the following frequency distribution determine the value of median:
Wages (Rs): 35 55 45 60 70 65 75 80
No. of Workers: 25 10 12 9 16 8 15 5
[Ans: 60]

3. From the following data given below calculate the median:
Classes: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequency: 7 18 24 32 10 6 5
[Ans: 30.625]

4. From the following data determine the value of the median:
X: 0 – 9 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59
F: 5 10 12 8 9 6
[Ans: 27.83]

5. From the following data find out the value of the median:
Marks: Below 20 20 – 30 30 – 50 50 – 70 70 above
No. of Students: 3 4 10 5 3
[Ans: 41]

6. Locate the value of the median from the following series:
Marks less than 10 20 30 40 50 60 70
No. of Students: 3 10 18 24 33 38 40
[Ans: 33.34]

7. From the following data find out the value of the median:
Marks above 0 10 20 30 40 50 60
No. of students: 100 80 65 53 43 25 12
[Ans: 33]

8. From the following frequency distribution, find the value of the median:
Marks | No. of Students | Marks | No. of Students
Less than 10 | 5 | 40 – 50 | 10
Less than 20 | 20 | 50 and above | 25
10 – 30 | 35 | 60 and above | 9
30 and above | 60 | |
[Ans: 34]

9. From the data given below, trace out the missing frequency when the median is 70:
X: 0 – 20 20 – 40 40 – 60 60 – 80 80 – 100 100 – 120 120 – 140
F: 5 7 8 ? 10 6 4
[Ans: 20]

10. From the following series, find out the missing frequencies, if its median be 25 and number of students 100:
Marks: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of students: 20 10 ? 15 ? 5
[Ans: 40, 10]

11. From the following series, trace out the missing frequencies, if its median is 27.5 and number of items is 50
X: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
F: 4 ? 20 ? 7 3
[Ans: 6, 10]
12. Assume N = 100 and that there are class intervals all of equal size. The first class interval is ‘10 and under 20’. The cumulative
frequencies of the 5th, 6th, 7th and 8th class intervals are 45, 70, 90 and 99 respectively. Find out the median.

Mode
Mode is that value in a series which occurs with the greatest frequency. In the words of Croxton and Cowden, “The mode of
a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as
the most typical of a series of values.”
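
For an individual series the mode can be found simply by counting. The sketch below returns every value that shares the highest frequency, so a bimodal series shows up as two modes. The shoe sizes and the second list are hypothetical data used only for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


shoe_sizes = [6, 7, 7, 8, 8, 8, 9]   # hypothetical: size 8 is most common
bimodal_series = [1, 2, 2, 3, 3]     # hypothetical: 2 and 3 tie

print(modes(shoe_sizes))
print(modes(bimodal_series))
```

Getting more than one value back is the situation in which the mode cannot be determined directly by inspection.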
Advantages of Mode:
1. It is very simple to calculate, as it can be found even by inspection.
2. It is not affected by extreme items.
3. For open end class intervals it can be determined straight away without estimating the two extreme class limits.
4. It can also be used in case of qualitative phenomenon as its calculation depends on the frequencies.
5. It can be determined graphically.
6. It is understood by a layman as it refers to a value containing maximum frequency.

Disadvantages of mode:
1. It is not rigidly defined.
2. It is not based on all the observations. Any change in extreme items will not affect the mode value.
3. It is affected by the fluctuation of sample.
4. It cannot be determined directly in case of bimodal or multimodal series.
5. It is not capable of further algebraic treatment.
6. It cannot be determined from a series of unequal class intervals unless they are arranged in a proper manner.

Choice of a suitable average
No single average is suitable for all practical purposes. The different averages have different
characteristics and there is no universally accepted average. The choice of a particular average is usually determined on the
basis of the purpose for which investigation is undertaken. For sound statistical analysis, the choice of the average depends
upon:
1. The nature and availability of data;
2. The nature of the variable involved;
3. The purpose of the investigation;
4. The system of classification adopted, and
5. The use of the average for further statistical computations.

Choice of a suitable average is very important because a wrong choice may lead to fallacious conclusions. The following points should
be remembered while selecting a particular average:
A. Arithmetic mean should be used when:
1. The distribution is not very asymmetrical.
2. The series does not have very large or very small items.
3. The series does not have open end class intervals.
4. All values of the series are considered as equally important.

B. Median should be used when:
1. The series has unequal class intervals.
2. The series has open end class intervals.
3. The purpose is to determine the rank of various values.

C. Mode should be used when:
1. The purpose is to find out the most frequently occurring items of a series.
2. The data are qualitative in nature.
3. The purpose is to find out the most common item of a series.
4. The purpose is to find the average number of children per household, average size of the shirt collar or shoes, average
number of rooms per household etc.

D. Geometric mean should be used when:
1. Ratios, rates and percentages are to be averaged
2. More weights are to be given to small items and less weights to large items.
3. It is required to construct index numbers.

E. Harmonic mean should be used when:
1. It is required to find out the average speed, average time to do a particular work, and average price at which an item
can be bought or sold.
2. It is required to compute the average rate of change in profit or loss of a concern.
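To make the last two cases concrete, here is a small sketch using Python's standard `statistics` module. The speeds and growth figures are hypothetical and only illustrate why the harmonic mean suits averages of speeds and the geometric mean suits averages of rates.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical data: a car covers the same distance at 40 km/h and then at 60 km/h.
speeds = [40, 60]
print(mean(speeds))            # 50, which overstates the average speed
print(harmonic_mean(speeds))   # 48.0, the correct average speed for equal distances

# Hypothetical growth factors: +10%, +20% and +30% in three successive years.
factors = [1.10, 1.20, 1.30]
avg_growth = geometric_mean(factors) - 1   # constant yearly rate giving the same total growth
print(round(avg_growth * 100, 2))          # average yearly growth in percent
```

Compounding the geometric-mean rate over three years reproduces the total growth of the three actual years, which the arithmetic mean of the rates does not.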

Limitations of averages:
1. Sometimes an average might give a very absurd result. For example, the average number of children per family might
come out as a fraction, which is obviously absurd.
2. An average being a single figure gives only the central idea of a phenomenon and does not reveal its entire story.
3. In certain types of distributions, like U-shaped distributions, an average fails to represent the entire series.
4. Since an average is a single figure representing the characteristics of a given distribution, proper care should be taken in its
interpretation; otherwise it might lead to very misleading conclusions.

[Mode]

1. The following are the size of shoes worn by 9 persons. Calculate the modal size:
Size: 5 4 4.5 5.5 4.5 6 4.5 4 4.5
[Ans: 4.5]

2. Find out the mode from the following observations:
Income (in ₹): 300 600 900 1200 1500 1800 2100
Employees: 4 8 29 11 18 13 5
[Ans: ₹900]

3. Find out the mode from the following data using an analysis table:
X: 3 4 5 6 7 8 9 10 11 12
F: 30 40 38 44 45 42 38 35 30 45
[Ans: 7]

4. Calculate the mode from the following data:
Marks: 5 – 10 10 – 15 15 – 20 20 – 25 25 – 30
Students: 10 15 25 20 12
[Ans: 18.3]
5. Calculate the modal value from the following frequency distribution:
X: 0 – 9 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89 90 – 99
F: 6 29 87 181 247 263 133 43 9 2
[Ans: 47.55]

6. Find out the mode from the following data:
Less than: 5 10 15 20 25 30 35 40 45
No. of items: 29 224 465 582 634 644 650 653 655
[Ans: 11.35]

7. From the data given below, find the mode:
Wages in ₹ (above): 30 40 50 60 70 80 90
No. of Workers: 520 470 399 210 105 45 7
[Ans: ₹55.84]

8. From the following series, determine the value of mode:
Marks below: 100 90 80 70 60 50 40 30 20 10
No. of Students: 50 45 43 36 30 20 16 11 6 3
[Ans: 56]
9. Locate the value of the mode from the data given below by the appropriate method:
X: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
F: 4 6 20 32 33 17 8 2
[Ans: 40.05]
10. Find out the missing frequencies in the following series, if the mode is 34 and the number of items is 60:
Wages (₹): 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
No. of Students: 8 7 ? 20 ? 6 4
[Ans: 10, 5]

11. From the data given below, find out the missing frequencies, if median is 67, mode is 68 and number of observations is 115:
X: 0 – 20 20 – 40 40 – 60 60 – 80 80 – 100 100 – 120 120 – 140
F: 2 8 30 ? ? ? 2
[Ans: 50, 20, 3]

12. In the following wage distribution, the median and mode are ₹33.5 and ₹34 respectively, but three class frequencies are missing.
Find them:
X: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 Total
F: 4 16 ? ? ? 6 4 230
[Ans: 60, 100, 40]

Unit-3
MEASURE OF VARIATION


PT R.S.S.UNIVERSITY PREVIOUS YEAR QUESTION PAPERS
2016
Q.1 A factory produces two types of electric lamps, A and B. In an experiment relating to their lives, the following results were
obtained:
Length of life (hours) No. of lamps (A) No. of lamps (B)
500-700 5 4
700-900 11 30
900-1100 26 12
1100-1300 10 8
1300-1500 8 6
Ans: C.V. of A = 21.64%, C.V. of B = 23.41%
A is more consistent

Q.2 Calculate the standard deviation of the following distribution by taking an assumed mean:
Age: 20-25 25-30 30-35 35-40 40-45 45-50
No. of persons: 170 110 80 45 40 35
Ans: 7.936

2015

Q.1 (A) What do you mean by mean deviation? How is it different from standard deviation?
(B) For a certain distribution the arithmetic mean is 45, the median is 48 and Karl Pearson’s coefficient of skewness is 0.4.
Calculate (1) the mode, (2) the standard deviation and (3) the coefficient of variation.
Ans: (1) mode = 54 (2) standard deviation = -22.5 (3) coefficient of variation = -50
Q.2 Calculate the standard deviation of the following marks obtained by 5 students in a group: 8, 12, 13, 15, 22. Ans: 4.60

2014

Q.1 (A) What do you mean by deviation? How is it different from standard deviation?
(B) Karl Pearson’s coefficient of skewness is 0.5, the median is 42 and the mode is 32. Calculate (1) the mean, (2) the standard deviation and
(3) the coefficient of variation. Ans: (1) mean = 47 (2) standard deviation = 30 (3) coefficient of variation = 63.83%

Q.2 Calculate the standard deviation of the following data :
X 20 30 40 50 60 70
Frequency 8 12 20 10 6 4
Ans 13.75
2013

Q.1 (A) Explain the meaning of the coefficient of variation. Mention how it is different from variance.
(B) Calculate the standard deviation of the following data : 160, 160, 161, 162, 163, 163, 163, 164, 164, 170
Ans: 2.72

Q.2 calculate coefficient of skewness by any method of given data.
Wages 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No of person 1 3 11 21 43 32 09
Ans: -0.18
Measure of Variation : Standard Deviation and Skewness

PARTITION VALUES

Quartiles Deciles Percentiles

Quartiles

The median of a distribution splits the data into two equally-sized groups. In the same way, the quartiles are the three values that
split a data set into four equal parts. Note that the 'middle' quartile is the median. The upper quartile describes a 'typical' mark for
the top half of a class and the lower quartile is a 'typical' mark for the bottom half of the class. The quartiles are closely related to
the histogram of a data set. Since area equals the proportion of values in a histogram, the quartiles split the histogram into four
approximately equal areas.


Individual Series
For an odd series: Q1 = value of the $\frac{(N+1)\times 1}{4}$th item
For an even series: Q1 = value of the $\left(\frac{N}{4}+\frac{1+N}{4}\right)\times\frac{1}{4}$th item
Discrete Series
Q1 = value of the $\frac{(N+1)\times 1}{4}$th item
Q2 = value of the $\frac{(N+1)\times 2}{4}$th item
Q3 = value of the $\frac{(N+1)\times 3}{4}$th item
Continuous Series
For an ascending series: $M = L_1 + \frac{i}{f}(m - c)$
Where:
M = the required quartile
$L_1$ = lower limit of the quartile class
i = class interval of the quartile class
f = frequency of the quartile class
m = $\frac{N\times 1}{4}$ for Q1, $\frac{N\times 2}{4}$ for Q2, $\frac{N\times 3}{4}$ for Q3
c = cumulative frequency of the class preceding the quartile class
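The continuous-series formula L1 + (i/f)(m − c) can be sketched as follows. This is a minimal illustration assuming equal class widths and an ascending series; the function name and argument layout are hypothetical, and the garbled symbol in the formula for m is taken to be N, the total frequency.

```python
def partition_value(lower_limits, width, freqs, k, parts):
    """k-th partition value (parts=4 quartiles, 10 deciles, 100 percentiles)
    of a continuous series with equal class width, using L1 + (i/f)(m - c)."""
    n = sum(freqs)
    m = n * k / parts                 # position of the required item
    c = 0                             # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and c + f >= m:      # first class whose cumulative frequency reaches m
            return lower + width * (m - c) / f
        c += f
    raise ValueError("m exceeds the total frequency")

lowers = [0, 10, 20, 30, 40]          # classes 0-10, 10-20, ..., 40-50
freqs = [5, 8, 15, 16, 6]             # N = 50
q1, q2, q3 = (partition_value(lowers, 10, freqs, k, 4) for k in (1, 2, 3))
print(q1, q2, q3)                     # 19.375 28.0 35.9375
```

Passing `parts=10` or `parts=100` gives deciles and percentiles from the same function, since only the value of m changes.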

Deciles

In a similar way, the deciles of a distribution are the nine values that split the data set into ten equal parts. You should not try to
calculate deciles from small data sets -- a single class of marks is too small to get useful values since the extreme deciles are very
variable. However the deciles can be useful descriptions for larger data sets such as national distributions for marks from standard
tests.

Individual Series
For an odd series: D1 = value of the $\frac{(N+1)\times 1}{10}$th item
For an even series: D1 = value of the $\left(\frac{N}{10}+\frac{1+N}{10}\right)\times\frac{1}{10}$th item
Discrete Series
D1 = value of the $\frac{(N+1)\times 1}{10}$th item
D2 = value of the $\frac{(N+1)\times 2}{10}$th item
D9 = value of the $\frac{(N+1)\times 9}{10}$th item
Continuous Series
For an ascending series: $M = L_1 + \frac{i}{f}(m - c)$
Where:
M = the required decile
$L_1$ = lower limit of the decile class
i = class interval of the decile class
f = frequency of the decile class
m = $\frac{N\times 1}{10}$ for D1, $\frac{N\times 2}{10}$ for D2, $\frac{N\times 9}{10}$ for D9
c = cumulative frequency of the class preceding the decile class

Percentiles

In a similar way, the percentiles of a distribution are the 99 values that split the data set into a hundred equal parts. These
percentiles can be used to categorise the individuals into percentile 1, ..., percentile 100. A very large data set is required before
the extreme percentiles can be estimated with any accuracy. (The 'random' variability in marks is especially noticeable in the
extremes of a data set.)

Individual Series
For an odd series: P1 = value of the $\frac{(N+1)\times 1}{100}$th item
For an even series: P1 = value of the $\left(\frac{N}{100}+\frac{1+N}{100}\right)\times\frac{1}{100}$th item
Discrete Series
P1 = value of the $\frac{(N+1)\times 1}{100}$th item
P2 = value of the $\frac{(N+1)\times 2}{100}$th item
P65 = value of the $\frac{(N+1)\times 65}{100}$th item
Continuous Series
For an ascending series: $M = L_1 + \frac{i}{f}(m - c)$
Where:
M = the required percentile
$L_1$ = lower limit of the percentile class
i = class interval of the percentile class
f = frequency of the percentile class
m = $\frac{N\times 1}{100}$ for P1, $\frac{N\times 2}{100}$ for P2, $\frac{N\times 65}{100}$ for P65
c = cumulative frequency of the class preceding the percentile class
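For an individual series, the (N+1)k/4, (N+1)k/10 and (N+1)k/100 rules all reduce to "find the item at a given position". The sketch below is one possible way to do this; the linear interpolation for fractional positions is an assumed convention, and the function names are illustrative.

```python
def item_at_position(sorted_values, pos):
    """Value of the pos-th item (1-based); fractional positions are
    interpolated linearly between neighbours (an assumed convention)."""
    n = len(sorted_values)
    pos = min(max(pos, 1), n)              # keep the position inside the series
    whole = int(pos)
    frac = pos - whole
    if whole == n:
        return sorted_values[-1]
    lower, upper = sorted_values[whole - 1], sorted_values[whole]
    return lower + frac * (upper - lower)

def partition_value_individual(values, k, parts):
    """k-th quartile (parts=4), decile (10) or percentile (100) via (N+1)k/parts."""
    data = sorted(values)
    return item_at_position(data, (len(data) + 1) * k / parts)

weights = [47, 50, 58, 45, 53, 59, 47, 60, 49]
print(partition_value_individual(weights, 1, 4))      # Q1  -> 47.0
print(partition_value_individual(weights, 3, 4))      # Q3  -> 58.5
print(partition_value_individual(weights, 65, 100))   # P65 -> 55.5
```

With only nine observations the extreme deciles and percentiles fall at or beyond the ends of the series, which is why the position is clamped and why such values are unreliable for small data sets.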



1. From the following data, find the quartiles, deciles and percentiles:
Weight in Kg. 47 50 58 45 53 59 47 60 49


2. From the following data, find the quartiles, deciles and percentiles:
Size of items; 5 15 25 35 45 55 65 75 85
Frequency: 3 8 15 20 25 10 9 6 4

3. From the following data, find the quartiles, deciles and percentiles:
Marks: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
No. of Students: 5 8 15 16 6

Measures of Dispersion

Formulae of Measures of Dispersion

On dispersion by the methods of limits

1. Range = L − S (L = largest item, S = smallest item)
2. Coefficient of Range = $\frac{L-S}{L+S}$
3. Inter-quartile Range = $Q_3 - Q_1$
4. Coefficient of Inter-quartile Range = $\frac{Q_3-Q_1}{Q_3+Q_1}$
5. Semi inter-quartile range or Quartile deviation: Q.D. = $\frac{Q_3-Q_1}{2}$
6. Coefficient of Q.D. = $\frac{Q_3-Q_1}{Q_3+Q_1}$

On dispersion by the method of computation:

1. Mean deviation:
Individual series: $\delta = \frac{\sum |D|}{N}$
Discrete and continuous series: $\delta = \frac{\sum f|D|}{N}$

2. Coefficient of M.D.:
From mean: Coeff. M.D. = $\frac{\delta}{\text{Mean}}$
From median: Coeff. M.D. = $\frac{\delta}{\text{Median}}$
From mode: Coeff. M.D. = $\frac{\delta}{\text{Mode}}$



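3. Standard Deviation
Direct method (based on deviations from the mean, x = X − Mean):
Individual series: $\sigma = \sqrt{\frac{\sum x^2}{N}}$
Discrete / continuous series: $\sigma = \sqrt{\frac{\sum f x^2}{N}}$

Short-cut method (on assumed mean A, $d_x$ = X − A):
Individual series: $\sigma = \sqrt{\frac{\sum d_x^2}{N} - \left(\frac{\sum d_x}{N}\right)^2}$
Discrete / continuous series: $\sigma = \sqrt{\frac{\sum f d_x^2}{N} - \left(\frac{\sum f d_x}{N}\right)^2}$

Step-deviation method (d = (X − A)/i, where i is the common factor):
Individual series: $\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} \times i$
Discrete / continuous series: $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$

Method based on values (when the assumed mean is taken as zero):
Individual series: $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$
Discrete / continuous series: $\sigma = \sqrt{\frac{\sum f X^2}{N} - \left(\frac{\sum f X}{N}\right)^2}$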

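4. Other Formulae

Variance: $V = \sigma^2$
Standard deviation of the first N natural numbers: $\sigma = \sqrt{\frac{1}{12}(N^2 - 1)}$
Coefficient of Standard Deviation: Coeff. σ = $\frac{\sigma}{\text{Mean}}$
Coefficient of Variation: C.V. = $\frac{\sigma}{\text{Mean}} \times 100$
Approximate relations for a normal distribution: Range = 6σ; Q.D. = $\frac{2}{3}\sigma$; and M.D. = $\frac{4}{5}\sigma$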
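The formula for the standard deviation of the first N natural numbers can be checked numerically. The short sketch below compares it with the population standard deviation computed from the definition, using N = 10 as an arbitrary example.

```python
from math import sqrt
from statistics import pstdev

n = 10
by_formula = sqrt((n ** 2 - 1) / 12)        # sigma = sqrt((N^2 - 1) / 12)
by_definition = pstdev(range(1, n + 1))     # population S.D. of 1, 2, ..., N
print(round(by_formula, 4), round(by_definition, 4))   # 2.8723 2.8723
```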

DISPERSION

Meaning:

The word dispersion means deviation or difference. In statistics dispersion refers to deviation of the values of a variable from
their central value. Measures of dispersion indicate the extent to which individual items vary from their averages i.e., Mean,
Median or Mode. It shows the spread of items of a series from their central value.

Definition:
1. According to A. L. Bowley, “Dispersion is the measure of variation of the items.”
2. According to L. R. Connor, “Dispersion is a measure of the extent to which the individual items vary.”
3. According to Spiegel, “The degree to which numerical data tend to spread about an average value is called the variation
or dispersion of the data.”

Characteristics of dispersion:

From the foregoing definitions, the essential characteristics of a measure of dispersion can be outlined as under:
1. It consists of different methods through which variations can be measured in quantitative manner.
2. It deals with a statistical series.
3. It indicates the degree, or extent to which the various items of a series deviate from its central value.
4. It supplements the measures of central tendency in revealing the characteristics of a frequency distribution.
5. It speaks of the reliability, or otherwise of the average value of a series.

Characteristics of an ideal measure of dispersion

• It should be rigidly defined.
• It should be easy to calculate and simple to understand.
• It should be based on all the observations of the series.
• It should be capable of further algebraic treatment.
• It should not be affected much by the fluctuations of sampling.
• It should not be unduly affected by the extreme items of the series.

Objectives of dispersion
• A measure of dispersion tells us whether an average is a true representative of the series or not.
• The extent of variability between two or more series can be compared with the help of measures of dispersion. It is
useful to determine the degree of uniformity, reliability and consistency amongst two or more sets of data.
• Measures of dispersion facilitate the use of other statistical measures like correlation, regression etc. for further analysis.
• Measures of dispersion serve as a basis for control of the variability itself.

Types of Measures of Dispersion
A. Methods of Limits
• Range
• Inter-quartile range
• Semi inter-quartile range
• Decile range
• Percentile range
B. Methods of Moments
• Mean deviation
• Standard deviation
• Coefficient of variation
• Variance
C. Graphic Method – Lorenz Curve

Range

Range is defined as the difference between the two extreme values of a series. Thus, it is merely the difference between the
largest and smallest items of the series.
Advantages of Range:
1. It is easy to calculate and simple to understand.
2. It is rigidly defined
3. It takes the least possible time for calculation
4. In certain types of problems like quality control, weather forecasts etc. use of range is very useful.
Disadvantages of Range:
1. It is influenced very much by fluctuation of sampling
2. It does not take into consideration all the items of the series.
3. It is not capable of further algebraic treatment.
4. It does not take into consideration the frequencies of a series
Uses of Range:
• Quality control: Range has a special application in quality control measures. Control charts are prepared on
the basis of range for controlling the quality of products.
• Weather forecast: Range is used advantageously by a meteorological department to forecast weather conditions.
• Measurement of fluctuations: Range is a very useful measure to study the fluctuation of prices of certain commodities,
viz., stocks and shares, gold, silver etc.
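Range and its coefficient need only the largest and smallest items. A minimal sketch, using an ungrouped series of ten values as sample data (the function name is illustrative):

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    largest, smallest = max(values), min(values)
    return largest - smallest, (largest - smallest) / (largest + smallest)

print(range_and_coefficient([10, 15, 20, 25, 30, 40, 50, 55, 60, 70]))   # (60, 0.75)
```

Because only two items enter the calculation, changing any of the middle values leaves both results unchanged, which is exactly the weakness listed under the disadvantages of range.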

Inter-quartile Range

Inter-quartile range is computed by deducting the value of the first quartile from the value of third quartile. Inter-quartile range is
defined as the difference between the two extreme quartiles of a series.
Advantages of inter-quartile range:
1. It is rigidly defined.
2. It can be easily calculated and simple to understand.
3. Its calculation is not affected even if first 25% and last 25% of a series are missing or changed.
Disadvantages of inter-quartile range:
1. It is not based on all the observations of the series.
2. It is not capable of further algebraic treatment.
3. It is affected by fluctuation in sampling.

Quartile deviation

Quartile deviation is based on the central 50% of items. The inter-quartile range is the difference between Q3 and Q1, and when this difference
is divided by 2 we get the quartile deviation. Thus quartile deviation is defined as half the difference between the two extreme
quartiles of a series.
Advantages of quartile deviation:
1. It is easy to calculate and simple to understand.
2. Its calculation is based on the middle 50% of items; hence it is a good measure of dispersion.
3. It is rigidly defined and is not much affected by the extreme values of a series.
4. It is easy to calculate in the case of open-end series.
Disadvantages of quartile deviation:
1. It is not capable of further algebraic treatment.
2. It is affected too much by fluctuations of sampling.
3. It is not based on all the observations of a series.
4. It does not show the scatter around any average.

Mean deviation

Mean deviation is the average of the absolute differences of the items in a series from the mean, median or mode.
Merits:
• It is a better measure for comparison.
• It is extensively used in other fields.
• Mean deviation is less affected by the value of extreme items than the standard deviation.
Demerits:
• It ignores ± signs in its calculation.
• It is difficult to compute when the average is a fraction.
• It is rarely used in sociological studies.
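The calculation of mean deviation and its coefficient can be sketched as below, taking deviations once from the mean and once from the median. The nine values are used only as sample data, and the function name is illustrative.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average of the absolute deviations |D| from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

weights = [47, 50, 58, 45, 53, 59, 47, 60, 49]
for name, centre in (("mean", mean(weights)), ("median", median(weights))):
    md = mean_deviation(weights, centre)
    print(name, round(md, 2), round(md / centre, 3))
# mean 4.89 0.094
# median 4.67 0.093
```

The deviation from the median comes out smaller, which is a general property: the sum of absolute deviations is least when taken from the median.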

Standard Deviation

S.D. is the square root of the mean of the squared deviations from the actual (arithmetic) mean. The term was introduced by Karl Pearson in 1893. It is by
far the most important and widely used measure of dispersion.

Note: when comparing the consistency of two groups, the group with the smaller S.D. (or, when the means differ, the smaller coefficient of variation) is the more consistent.

Merits:
• All individual values are taken into account in the calculation of S.D.
• It is capable of further algebraic treatment.
• It is the most rigidly defined measure of dispersion.
• It is used as an important instrument in higher statistical analysis, viz., correlation, regression etc.
Demerits:
• It is not easy to calculate S.D.
• It is not readily understood by a layman.
• It is affected very much by the extreme items of a series.

Difference between M.D. and S.D.

• While calculating standard deviation, algebraic signs (±) are not ignored, whereas in mean deviation algebraic signs are
completely ignored.
• Standard deviation is always calculated from the arithmetic mean, whereas mean deviation can be calculated from the
mean, median or mode.
• Standard deviation is much affected by the extreme observations of the series, but that is not the case with mean
deviation.

Variance:

Variance is the square of the standard deviation. Thus, variance = (S.D.)² = σ².

The term variance was introduced by R. A. Fisher in 1918. If a phenomenon is affected by a number of variables, the analysis of variance helps in
isolating the effects of the different factors.
Coefficient of Variation

Coefficient of variation is defined as “the percentage of variation in mean, standard deviation being considered as the total
variation in the mean.”
This measure, developed by Karl Pearson, is the most commonly used measure of relative variation. It is used in problems
where we want to compare the variability of two or more series.
Lorenz Curve

For studying the dispersion of a series graphically, we draw a Lorenz curve, as devised by the American economist
Max O. Lorenz. This curve was first used for measuring the distribution of wealth and income.

Coefficient of Variation (CV)

The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a data series around the mean. The
coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the
degree of variation from one data series to another, even if the means are drastically different from one another.
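A comparison of two series by their coefficients of variation can be sketched as follows. The figures are the lamp-life distributions of types A and B that appear in the exercises of this unit; the use of class mid-points and the function name are assumptions of this illustration.

```python
from math import sqrt

def grouped_cv(midpoints, freqs):
    """Coefficient of variation (%) of a frequency distribution, using class mid-points."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n
    return sqrt(variance) / mean * 100

midpoints = [600, 800, 1000, 1200, 1400]            # classes 500-700, ..., 1300-1500
cv_a = grouped_cv(midpoints, [5, 11, 26, 10, 8])    # lamp type A
cv_b = grouped_cv(midpoints, [4, 30, 12, 8, 6])     # lamp type B
print(round(cv_a, 2), round(cv_b, 2))               # 21.64 23.4
print("A" if cv_a < cv_b else "B", "is more consistent")
```

The two standard deviations are almost identical (about 220 hours each), so it is the difference in means that makes type A the more consistent series; this is the situation in which the coefficient of variation is more informative than the standard deviation alone.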

Exercise A

1. From the following distribution ascertain the value of range and its coefficient.
10 15 20 25 30 40 50 55 60 70
[Ans: 60; 0.75]

2. From the following series, determine the value of range and its coefficient:
Salary (per month) 1000 1500 2000 2500 3000 3500 4000 5000
No. of worker 30 20 15 3 7 10 9 6
[Ans: 4000; 0.67]

3. From the following distribution, determine the value of the range and its coefficient:
Wages (per day) 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50
No. of labourers 2 14 6 8 11 9
[Ans: 30; 0.43]

4. From the following data, determine the Range and the Coefficient of Range of marks awarded in statistics to the +2 Commerce
students of Swami Vivekananda College:
Marks 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69
No. of Students 15 5 12 14 10 8
[Ans: 60; 0.76]

5. From the following distribution, find the range and its coefficient:
Group Below 50 50 – 60 60 – 80 80 – 110 110 – 150 150 & above
Frequency 5 10 8 7 13 7
[Ans: 155; 0.224]

6. Calculate the semi inter-quartile range, or quartile deviation, and its coefficient for the following data:
Wages in ₹: 20 30 40 50 60 70 80
No. of workers: 3 61 132 153 140 51 3
[Ans: ₹10; 0.2]

7. From the following discrete series, find out the deciles range, semi deciles range, and their coefficients:
Age 15 16 17 18 19 20 21 22
No of students 5 20 18 17 10 5 3 1
[Ans: 4; 2; 0.8]

8. Calculate quartile deviation and its relative measure for the following distribution:
Group: 20 – 29 30 – 39 40 – 49 50 –59 60 – 69 70 -- 79
Frequency: 306 182 144 96 42 34
[Ans: 10.71; 0.29]


Mean Deviation (δ)

1. From the following series relating to the marks obtained by a batch of 9 students in a certain test, calculate the mean deviation
from mean and median and also calculate their coefficients.
Weight in Kg. 47 50 58 45 53 59 47 60 49
[Ans: 4.89; 0.094; 4.67; 0.0934]

2. Find out the mean deviation from mean, median and mode, and also their coefficients, from the following series:
Size of items; 5 15 25 35 45 55 65 75 85
Frequency: 3 8 15 20 25 10 9 6 4
[Ans: 14.99; 14.8; 14.8]

3. Calculate the mean deviation from mean for the following series. Also, find out its coefficient:
Marks: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
No. of Students: 5 8 15 16 6
[Ans: 9.44; 0.35]

4. Calculate mean deviation from median from the following data:
Marks secured (below): 80 70 60 50 40 30 20 10
No. of students: 100 90 80 60 32 20 13 5
[Ans: 14.31]

5. Calculate median, and mean deviation from median for the following frequency distribution:
Age in years 1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45
No of person 7 10 16 32 24 18 10 5 1
[Ans: 19.95; 7.1]

Standard Deviation

1. Calculate the standard deviation from the following data of income of 10 employees of a firm by the direct method, short-cut
method, and step-deviation method:
Income (₹): 600 620 640 620 680 670 680 640 700 650
[Ans: ₹30.33]

2. From the following discrete series, find out the standard deviation by all the possible methods:
Marks: 10 20 30 40 50 60
No. of students 8 12 20 10 7 3
[Ans: 13.45]

3. Calculate the standard deviation for the following data in different possible methods:
Class interval: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
No of students: 7 12 24 10 7
[Ans: 11.397]

4. Calculate the standard deviation from the following data:
Age in years: 10-19 20-29 30-39 40-49 50-59 60-69 70-79
Frequency: 3 61 233 137 53 79 4
[Ans: 12.4]

5. Calculate standard deviation and coefficient of standard deviation of the following series:
Wages (up to ₹): 10 20 30 40 50 60 70 80
No. of workers: 12 30 45 107 165 202 222 230
[Ans: 16.52; 41]

6. The following data relate to the profit/loss made by engineering companies in Odisha during the year 2012-13:
Wages in ` -10 – 0 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
Less than 10 19 24 49 87 31 27
Calculate the standard deviation, and its coefficients. Also, calculate the coefficient of variation. [Ans: 13.55; 0.6134; and
61.34%]

7. The following are the marks obtained by 40 students of a class. Calculate the coefficient of variation:
Marks: 80 – 84 75 – 79 70 – 74 65 – 69 60 – 64 55 – 59 50 – 54 45 – 49 40 – 44 35 – 39 30 – 34 25 – 29
Students: 1 1 1 4 4 7 6 6 6 3 0 1
[Ans: 21.8%]

8. A factory produces two types of lamps. In an experiment on the working life of these lamps, the following results were obtained:
Length of life (in hours): 500 – 700 700 – 900 900 – 1100 1100 – 1300 1300 – 1500
No. of lamps (Type A): 5 11 26 10 8
No. of lamps (Type B): 4 30 12 8 6
Compare the variability using the coefficient of variation. [Ans: 21.64; 23.40]

Skewness
If one tail is longer than another, the distribution is skewed. These distributions are sometimes called asymmetric or asymmetrical
distributions as they don’t show any kind of symmetry. Symmetry means that one half of the distribution is a mirror image of the
other half. For example, the normal distribution is a symmetric distribution with no skew. The tails are exactly the same.
A left-skewed distribution has a long left tail. Left-skewed distributions are also called negatively-skewed distributions. That’s
because there is a long tail in the negative direction on the number line. The mean is also to the left of the peak.
A right-skewed distribution has a long right tail. Right-skewed distributions are also called positive-skew distributions. That’s
because there is a long tail in the positive direction on the number line. The mean is also to the right of the peak.

Mean and Median in Skewed Distributions
In a normal distribution, the mean and the median are the same number, while the mean and median in a skewed distribution
become different numbers:
A left-skewed (negative) distribution will have the mean to the left of the median.
A right-skewed (positive) distribution will have the mean to the right of the median.
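A quick numerical check of this rule, using a small hypothetical data set with one large value and its mirror image:

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # hypothetical data with a long right tail
left_skewed = [-x for x in right_skewed]      # mirror image: long left tail

print(mean(right_skewed), median(right_skewed))   # 4.75 3.0
print(mean(left_skewed), median(left_skewed))     # -4.75 -3.0
```

The single extreme value pulls the mean towards the tail while the median stays with the bulk of the data.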

Effects on Statistics
The normal distribution is the easiest distribution to work with in order to gain an understanding about statistics. Real life
distributions are usually skewed. When the skewness is too great, many statistical techniques do not work. As a result, advanced
mathematical techniques, including logarithms and quantile regression, are used.
Skewed Left (Negative Skew): A left-skewed distribution is sometimes called a negatively skewed distribution because its
long tail is in the negative direction on a number line. A common misconception is that the position of the peak is what defines
skewness; in other words, that a distribution whose peak tends to the left is left-skewed. This is incorrect. There are two main things
that make a distribution skewed left:
1. The mean is to the left of the peak. This is the main idea behind “skewness”, which is
technically a measure of the distribution of values around the mean.
2. The tail is longer on the left.
In most cases, the mean is also to the left of the median. This isn’t a reliable test for skewness though, as some distributions (e.g. many multimodal distributions)
violate this rule. You should think of this as a “general idea” kind of rule, and not a set-in-stone one.

Skewed Right / Positive Skew :-A right skewed distribution is sometimes called a positive skew distribution. That’s because the
tail is longer on the positive direction of the number line.
Formula
Karl Pearson’s Coefficient of Skewness

1. Pearson’s Coefficient of Skewness #1 uses the mode. The formula is:

$Sk_1 = \frac{\bar{X} - Mo}{s}$

Where $\bar{X}$ = the mean, Mo = the mode and s = the standard deviation


2. Pearson’s Coefficient of Skewness #2 uses the median. The formula is:

$Sk_2 = \frac{3(\bar{X} - Md)}{s}$

Where $\bar{X}$ = the mean, Md = the median and s = the standard deviation
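The two Pearson coefficients can be sketched in a few lines. The formulas used are the standard ones, (mean − mode)/s and 3(mean − median)/s, assumed here because the formula images did not survive in the text; the figures are those of a previous-year question in this unit (mean 47, median 42, mode 32, S.D. 30).

```python
def pearson_skew_mode(mean, mode, sd):
    """First coefficient: (mean - mode) / s."""
    return (mean - mode) / sd

def pearson_skew_median(mean, median, sd):
    """Second coefficient: 3 * (mean - median) / s."""
    return 3 * (mean - median) / sd

print(pearson_skew_mode(47, 32, 30))      # 0.5
print(pearson_skew_median(47, 42, 30))    # 0.5
```

The two coefficients agree here because the figures satisfy the empirical relation Mode = 3 Median − 2 Mean; for real data they usually differ somewhat.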

Bowley’s coefficient of skewness

Absolute measure = (Q3 – M) – (M – Q1) = Q3 + Q1 – 2M
Relative measure = (Q3 + Q1 – 2M) / (Q3 – Q1)

Kelly’s coefficient of skewness
Based on percentiles: Sk = (P90 + P10 – 2P50) / (P90 – P10)
Based on deciles: Sk = (D9 + D1 – 2D5) / (D9 – D1)
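Bowley's and Kelly's measures use the same pattern with different partition values, as the sketch below shows. The quartile figures come from one of the exercises that follow (Q1 = 44.1, median = 55.35, Q3 = 56.6); the percentile figures are hypothetical.

```python
def bowley_skew(q1, median, q3):
    """(Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def kelly_skew(p10, p50, p90):
    """(P90 + P10 - 2*P50) / (P90 - P10); the same form holds with D1, D5, D9."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)

print(round(bowley_skew(44.1, 55.35, 56.6), 2))   # -0.8
print(round(kelly_skew(10, 40, 60), 2))           # -0.2 (hypothetical percentiles)
```

Both measures lie between −1 and +1, since the numerator can never exceed the denominator in absolute value.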

1. Calculate the Karl Pearson’s coefficient of Skewness from the following data:
Size: 1 2 3 4 5 6 7
Frequency: 10 18 30 25 12 3 2
[Ans: 0.184]

2. Calculate the coefficient of Skewness based on mean and median from the following distribution:
X: 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
F: 6 12 22 48 56 32 18 6
[Ans: 41.7; 42.14; –0.086]

3. Calculate Karl Pearson’s Coefficient of Skewness from the following data:
X: 10 – 15 15 – 20 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50
F: 8 16 30 45 62 32 15 6
[Ans: –0.22]
4. Calculate coefficient of variation and Karl Pearson’s coefficient of Skewness from the following data:
Sales (crores) less than 20 40 60 80 100
No of companies: 8 20 50 70 80
[Ans: 42.65; 0.0063]
5. From the following data find out Bowley’s coefficient of Skewness:
Marks in Maths 90 50 52 86 87 76 80 85 58 61 65
[Ans: –0.286]
6. Calculate the Quartile coefficient of Skewness for the following data:
Monthly Income (₹): 501 – 600 601 – 700 701 – 800 801 – 900 901 – 1000 1001 – 1100 1101 – 1200 1201 – 1300
No. of families: 5 17 80 186 208 134 68 18
[Ans: 0.025]

7. The measure of Skewness for a certain distribution is –0.8. If the lower and upper quartiles are 44.1 and 56.6 respectively,
find the median. [Ans: 55.35]

8. In a frequency distribution, the coefficient of Skewness based on quartiles is 0.6. If the sum of upper and lower quartiles is
100 and median is 38, find the value of the upper quartile. [Ans: 70]

9. Pearson’s coefficient of Skewness of a distribution is 0.64. Its mean is 82 and Mode 50. Find the standard deviation [Ans:
50]

10. When the mean is 86, the median is 80 and Karl Pearson’s coefficient of Skewness is 0.42, find the coefficient of variation. [Ans: 49.83]

Unit-4
CORRELATION


PREVIOUS YEAR PT R.S.S.U QUESTION PAPERS
2016

Q.1 Calculate Karl Pearson’s coefficient of correlation from the data given below:
X: 3 7 5 4 6 8 2 7
Y: 7 12 8 8 10 13 5 10
[Ans: 0.963]

Q.2 What is correlation? Explain the implications of positive and negative correlation. Show by means of scatter diagrams the presence
of perfect positive and perfect negative correlation.

2015

Q.1 Define correlation. Explain different types of correlation with suitable examples.
Q.2 Calculate Karl Pearson’s coefficient of correlation from the data given below:
X: 6 2 10 4 8
Y: 9 11 5 8 7
Ans: -0.92
Q.3 Define Karl Pearson’s coefficient of correlation. What is it intended to measure?

2014
Q.1 Define correlation. Explain different types of correlation with suitable examples.
Q.2 Calculate Spearman’s coefficient of rank correlation from the following data:
X: 57 16 24 65 16 16 9 40 33 48
Y: 19 6 9 20 4 15 6 24 13 13
Ans:0.7333
Q.3 Find out the coefficient of correlation between the age of husband and wife from the following data
Age Of Wife
Age of husband
          20-30  30-40  40-50  50-60  60-70  Total
15-25       4      9      4      –      –     17
25-35       –      8     24      5      –     37
35-45       –      2     11      2      –     15
45-55       –      –      6     14      5     25
55-65       –      –      –      4      2      6
Total       4     19     45     25      7    100
Ans: 0.73

2013
Q.1 Define Karl Pearson’s coefficient of correlation. What is it intended to measure? How would you interpret the sign of the
correlation coefficient?
Q.2 Explain the importance of correlation in statistical analysis in management decision situations, with examples.
Q.3 Calculate the coefficient of correlation from the data given below:
X: 1 2 3 4 5
Y: 3 3 7 9 12
[Ans: 0.97]
Correlation Analysis – Karl Pearson’s Coefficient of Correlation

CORRELATION
Correlation is a statistical measure for finding out the degree or strength of association between two (or more) variables. By
‘association’ we mean the tendency of the variables to move together. If two variables X and Y are so related that movements (or
variations) in one, say X, tend to be accompanied by corresponding movements (or variations) in the other variable Y, then X
and Y are said to be correlated. The movements may be in the same direction (i.e., both X and Y increase, or both decrease) or in
opposite directions (i.e., one, say X, increases and the other, Y, decreases). Correlation is said to be positive or negative according as
these movements are in the same or in opposite directions. If Y is unaffected by any change in X, then X and Y are said to be uncorrelated.
Definition
L . R . Conner: “If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by
corresponding movements in the other, then they are said to be correlated.”
Correlation may be linear or non-linear. If the amount of variation in X bears a constant ratio to the corresponding amount of
variation in Y, then the correlation between X and Y is said to be linear. Otherwise it is non-linear. The correlation coefficient or
coefficient of correlation [r] measures the degree of linear relationship (i.e., linear correlation) between two variables.


Utility
The utility of the study of correlation is immense in both the physical and the social sciences. However, we shall confine
ourselves to the utility of correlation studies in social sciences only.
1. The study of correlation reduces the range of uncertainty associated with decision making. In social sciences,
particularly in the business world, forecasting is an important phenomenon, and correlation studies help us to make
relatively more dependable forecasts.
2. Correlation analysis is very helpful in understanding economic behavior; it helps us in locating such variables on
which other variables depend. This is helpful in studying factors by which economic events are affected. For example,
we can find out the factors responsible for price rise or low productivity.
3. Correlation study helps us in identifying such factors which can stabilize a disturbed economic situation.
4. Correlation study helps us to estimate the likely change in a variable with a particular amount of change in related
variable. For example correlation study can help us in finding out the change in demand with a certain amount of
change in price.
5. Inter-relationship studies between different variables are very helpful tools in promoting research and opening new
frontiers of knowledge.

TYPES OF CORRELATION
Correlation can be: [1] Positive or Negative; [2] Simple, Multiple or Partial; [3] Linear or Non-linear.
1. Positive and Negative correlation: Correlation can be either positive or negative. When the values of two variables
move in the same direction, i.e., when an increase in the value of one variable is associated with an increase in the value
of the other variable and a decrease in the value of one variable is associated with a decrease in the value of the other
variable, correlation is said to be positive.
If, on the other hand, the values of two variables move in opposite directions, so that with an increase in the
values of one variable the value of the other variable decrease, and with a decrease in the values of one variable the
values of the other variable increase, correlation is said to be negative. There are some data in which correlation is
generally positive while in others it is negative.

2. Simple, Multiple and Partial correlation: In simple correlation we study only two variables, say price and demand.
In multiple correlation we study together the relationship between three or more factors like production, rainfall and
use of fertilizers. In partial correlation more than two factors are involved, but correlation is studied only
between two factors and the other factors are assumed to be constant.

$$ r = \frac{\text{Covariance of } X \text{ and } Y}{\sigma_x \times \sigma_y} $$

3. Linear and Non-linear: The correlation between two variables is said to be linear if corresponding to a unit change in
the value of one variable there is a constant change in the value of the other variable, i.e., in case of linear correlation
the relation between the variables x and y is of the type $y = a + bx$.
The correlation between two variables is said to be non-linear if corresponding to unit change in the value of
one variable the other variable does not change at a constant rate but at a fluctuating rate.
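
The distinction can be checked numerically. The sketch below is illustrative only: `pearson_r` is a helper written for this note (not a library function) and the data are made up. It suggests that r is +1 or -1 when y = a + bx holds exactly, and that r can be 0 for a perfectly regular but non-linear relation such as y = x².

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical data, chosen only for illustration.
x = [-3, -2, -1, 0, 1, 2, 3]
linear_up = [2 + 3 * v for v in x]    # y = a + bx with b > 0
linear_down = [2 - 3 * v for v in x]  # y = a + bx with b < 0
curved = [v * v for v in x]           # y = x^2, a non-linear relation

r_up = pearson_r(x, linear_up)
r_down = pearson_r(x, linear_down)
r_curved = pearson_r(x, curved)
print(r_up, r_down, r_curved)
```

A value of r near zero therefore rules out only a linear relation; the variables may still be related in a non-linear way.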

Degrees of Correlation:

1. Perfect Correlation: If two variables vary in the same proportion, then the correlation is said to be perfect correlation.
2. Positive Correlation: If an increase (or decrease) in one variable corresponds to an increase (or decrease) in the other, the
correlation is said to be positive correlation.
3. Negative Correlation: If an increase (or decrease) in one variable corresponds to a decrease (or increase) in the other, the
correlation is said to be negative correlation.
4. Zero or No Correlation: If a change in one variable is not accompanied by any change in the other, then there is no or zero correlation.

Uses of Correlation

1. It gives a precise quantitative value indicating the degree of relationship existing between the two variables.
2. It measures the direction as well as relationship between the two variables.
3. Further in regression analysis it is used for estimating the value of dependent variable from the known value of the
independent variable
4. The effect of correlation is to reduce the range of uncertainty in predictions.

Importance Of Correlation

1. Most of the variables show some kind of relationship. For instance, there is relationship between price and supply,
income and expenditure etc. With the help of correlation analysis we can measure in one figure the degree of
relationship.
2. Once we know that two variables are closely related, we can estimate the value of one variable given the value of
another. This is done with the help of regression.
3. Correlation analysis contributes to the understanding of economic behavior, aids in locating the critically important
variables on which others depend.
4. Progressive development in the methods of science and philosophy has been characterized by increase in the
knowledge of relationship. In nature also one finds multiplicity of interrelated forces.
5. The effect of correlation is to reduce the range of uncertainty. The prediction based on correlation analysis is likely to
be more valuable and nearer to reality.

Limitations of correlation:

1. Extreme items affect the value of the coefficient of correlation.
2. Its computational method is difficult as compared to other methods.
3. It assumes a linear relationship between the two variables, whether such a relationship exists or not.



Correlation table

Degree of correlation     Positive correlation   Negative correlation
Perfect correlation       1                      -1
Very high correlation     .99 to .90             -.99 to -.90
High correlation          .90 to .75             -.90 to -.75
Moderate correlation      .75 to .25             -.75 to -.25
Low correlation           .25 to 0               -.25 to 0
No correlation            0                      0
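
The table can be turned into a small lookup, sketched below. The function name `degree_of_correlation` is hypothetical, and because adjacent bands in the table share their end points (.90, .75, .25), the sketch assumes that a boundary value belongs to the higher band.

```python
def degree_of_correlation(r):
    """Describe r using the bands of the correlation table.

    Assumption: a value on a shared boundary (.90, .75, .25)
    is placed in the higher band.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "No correlation"
    if size == 1:
        label = "Perfect"
    elif size >= 0.90:
        label = "Very high"
    elif size >= 0.75:
        label = "High"
    elif size >= 0.25:
        label = "Moderate"
    else:
        label = "Low"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


# Answers quoted in the previous-year questions of this unit.
for value in (0.97, -0.92, 0.73):
    print(value, "->", degree_of_correlation(value))
```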

Methods of calculating correlation

1. Graphical method
2. Scatter diagram
3. Karl Pearson’s
4. spearman’s coefficient
5. Coefficient of concurrent deviations
6. Least Square Method


Correlation coefficient Graphical method

The degree of association is measured by a correlation coefficient, denoted by r. It is sometimes called Pearson's correlation
coefficient after its originator and is a measure of linear association. If a curved line is needed to express the relationship, other
and more complicated measures of the correlation must be used.

The correlation coefficient is measured on a scale that varies from + 1 through 0 to - 1. Complete correlation between two
variables is expressed by either + 1 or -1. When one variable increases as the other increases the correlation is positive; when one
decreases as the other increases it is negative. Complete absence of correlation is represented by 0. Figure gives some graphical
representations of correlation.

Scatter Diagram

The scatter diagram is known by many names, such as scatter plot, scatter graph, and correlation chart. This diagram is drawn
with two variables, usually the first variable is independent and the second variable is dependent on the first variable



The scatter diagram is used to find the correlation between these two variables. This diagram helps you determine how closely
the two variables are related. After determining the correlation between the variables, you can then predict the behavior of the
dependent variable based on the measure of the independent variable. This chart is very useful when one variable is easy to
measure and the other is not.


Type of Scatter Diagram

The scatter diagram can be categorized into several types; however, I will discuss the two types that will cover most scatter
diagrams used in project management. The first type is based on the type of correlation, and the second type is based on the slope
of trend.

1. Scatter Diagram with No Correlation
2. Scatter Diagram with Moderate Correlation
3. Scatter Diagram with Strong Correlation


Scatter Diagram with No Correlation

This type of diagram is also known as “Scatter Diagram with Zero Degree of Correlation”. In this type of scatter diagram,
data points are spread so randomly that you cannot draw any line through them. In this case you can say that there is no relation
between these two variables.

Scatter Diagram with Moderate Correlation

This type of diagram is also known as “Scatter Diagram with Low Degree of Correlation”. Here, the data points are a little closer
together and you can feel that some kind of relation exists between these two variables.

Scatter Diagram with Strong Correlation

This type of diagram is also known as “Scatter Diagram with High Degree of Correlation”. In this diagram, data points are
grouped very close to each other such that you can draw a line by following their pattern.




Limitations of a Scatter Diagram

1. Scatter diagrams are unable to give you the exact extent of correlation.
2. A scatter diagram does not show you the quantitative measure of the relationship between the variables. It only gives a
qualitative impression of how one variable changes with the other.
3. This chart does not show you the relationship for more than two variables.
Benefits of a Scatter Diagram

1. It shows the relationship between two variables.
2. It is the best method to show you a non-linear pattern.
3. The range of data flow, i.e. maximum and minimum value, can be easily determined.
4. Observation and reading is straightforward.
5. Plotting the diagram is relatively simple.

Karl Pearson’s Coefficient of Correlation

Definition: Karl Pearson’s Coefficient of Correlation is a widely used mathematical method wherein a numerical expression is
used to calculate the degree and direction of the relationship between linearly related variables.

Pearson’s method, popularly known as the Pearsonian Coefficient of Correlation, is one of the most extensively used quantitative
methods in practice. The coefficient of correlation is denoted by “r”.

Correlation between sets of data is a measure of how well they are related. The most common measure of correlation in stats is
the Pearson Correlation. The full name is the Pearson Product Moment Correlation (PPMC). It shows the linear relationship
between two sets of data. In simple terms, it answers the question, Can I draw a line graph to represent the data? Two letters are
used to represent the Pearson correlation: Greek letter rho (ρ) for a population and the letter “r” for a sample.

If the relationship between two variables X and Y is to be ascertained, then the following formula is used:

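Formulae of Correlation

Direct Method (based on deviation from Actual Mean)

$$ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}} $$

Direct Method (on values)

$$ r = \frac{N\sum XY - \sum X \times \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right] \times \left[N\sum Y^2 - (\sum Y)^2\right]}} $$

Short cut method (based on deviation from Assumed Mean)

$$ r = \frac{N\sum d_x d_y - \sum d_x \times \sum d_y}{\sqrt{\left[N\sum d_x^2 - (\sum d_x)^2\right] \times \left[N\sum d_y^2 - (\sum d_y)^2\right]}} $$

Probable error of measurement

$$ \mathrm{P.E.}_r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$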
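
The three ways of computing r can be compared in a short sketch. The function names below are illustrative, not from any library; the data are taken from the 2013 paper (X: 1 2 3 4 5, Y: 3 3 7 9 12, answer 0.97) and from exercises 11 and 12 of this unit. The assumed means 3 and 7 are arbitrary choices for the illustration.

```python
from math import sqrt


def r_actual_mean(xs, ys):
    """Direct method: deviations taken from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def r_from_sums(n, sum_a, sum_b, sum_ab, sum_a2, sum_b2):
    """Formula shared by the direct (on values) and short cut methods."""
    numerator = n * sum_ab - sum_a * sum_b
    denominator = sqrt((n * sum_a2 - sum_a ** 2) * (n * sum_b2 - sum_b ** 2))
    return numerator / denominator


def r_on_values(xs, ys):
    """Direct method applied to the raw values X and Y."""
    return r_from_sums(
        len(xs), sum(xs), sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs), sum(y * y for y in ys),
    )


def r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Short cut method: deviations taken from assumed means."""
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    return r_on_values(dx, dy)


X = [1, 2, 3, 4, 5]
Y = [3, 3, 7, 9, 12]
r1 = r_actual_mean(X, Y)
r2 = r_on_values(X, Y)
r3 = r_assumed_mean(X, Y, 3, 7)  # assumed means picked arbitrarily

# Exercises 11 and 12 give only the totals, so the sums go in directly.
r_ex11 = r_from_sums(5, 15, 40, 130, 55, 330)
r_ex12 = r_from_sums(10, -5, -10, 43, 109, 62)
print(round(r1, 2), round(r2, 2), round(r3, 2), r_ex11, round(r_ex12, 2))
```

All three routes give the same r because the coefficient does not change when a constant is subtracted from every X or every Y; the short cut method only makes the hand arithmetic lighter.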


Merits

1. This method not only indicates the presence, or absence of correlation between any two variables but also, determines
the exact extent, or degree to which they are correlated.
2. Under this method, we can also ascertain the direction of the correlation i.e. whether the correlation between the two
variables is positive, or negative.
3. This method enables us in estimating the value of a dependent variable with reference to a particular value of an
independent variable through regression equations.
4. This method has a lot of algebraic properties for which the calculation of co-efficient of correlation, and a host of other
related factors viz. co-efficient of determination, are made easy.
Demerits

1. It is comparatively difficult to calculate as its computation involves intricate algebraic methods of calculations.
2. It is very much affected by the values of the extreme items.
3. It is based on a large number of assumptions viz. linear relationship, cause and effect relationship etc. which may not
always hold good.
4. It is very much likely to be misinterpreted particularly in case of homogeneous data.
5. In comparison to the other methods, it takes much time to arrive at the results.
6. It is subject to probable error, which its propounder himself admits, and therefore it is always advisable to compute its
probable error while interpreting its results.

Probable Error of Correlation Coefficient

Definition:

The Probable Error of Correlation Coefficient helps in determining the accuracy and reliability of the value of the coefficient that
in so far depends on the random sampling.

In other words, the probable error (P.E.) is the value which is added or subtracted from the coefficient of correlation (r) to get the
upper limit and the lower limit respectively, within which the value of the correlation expectedly lies.

The probable error of correlation coefficient can be obtained by applying the following formula:

$$ \mathrm{P.E.}_r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$

where
r = coefficient of correlation
N = number of observations

There is no correlation between the variables if the value of ‘r’ is less than P.E. This shows that the coefficient of
correlation is not at all significant.

The correlation is said to be certain when the value of ‘r’ is six times more than the probable error; this shows that the
value of ‘r’ is significant.

By adding and subtracting the value of P.E. from the value of ‘r’, we get the upper limit and the lower limit, respectively,
within which the coefficient of correlation is expected to lie. Symbolically, it can be expressed as

$$ \rho = r \pm \mathrm{P.E.}_r $$

where rho ($\rho$) denotes the correlation in the population.
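
The rules above can be sketched in code. The helper names are hypothetical. Two points are assumptions of this sketch rather than statements from the notes: the absolute value of r is used so that negative coefficients are handled, and the case where r lies between P.E. and six times P.E. is labelled "inconclusive" because the notes give no rule for it. The inputs (r = 0.98 and r = 0.95 with N = 8) are the answers quoted for exercises 8 and 5.

```python
from math import sqrt


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def interpret(r, n):
    """Return (P.E., (lower limit, upper limit), verdict)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "not significant"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        verdict = "inconclusive"  # assumption: the notes give no rule here
    return pe, (r - pe, r + pe), verdict


pe_ex8 = probable_error(0.98, 8)
pe_ex5 = probable_error(0.95, 8)
result_ex8 = interpret(0.98, 8)
print(round(pe_ex8, 2), round(pe_ex5, 2), result_ex8[2])
```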

The probable error can be used only when the following three conditions are fulfilled:

1. The data must approximate to the bell-shaped curve, i.e. a normal frequency curve.
2. The statistical measure from which the probable error is computed must have been taken from a sample.
3. The sample items must be selected in an unbiased manner and must be independent of each other.

Thus, the probable error is calculated to check the reliability of the value of the coefficient calculated from random
sampling.


Question :

1. Calculate the coefficient of correlation between X and Y from the following data:
X: 1 2 3 4 5 6 7
Y: 2 4 5 3 8 6 7
[Ans: 0.78]

2. Calculate Karl Pearson’s coefficient of correlation from the data given below:
X: 2 4 6 8 10
Y: 12 14 16 18 20
[Ans: +1]


3. Calculate the coefficient of correlation between X and Y for the values given below:
X: 2 5 7 9 19 16
Y: 25 27 26 29 34 39
[Ans: 0.89]

4. Calculate the coefficient of correlation for the following:
X: 128 140 125 121 122 146
Y: 87 96 101 93 99 140
[Ans: 0.70]

5. Calculate the Pearson’s coefficient of correlation from the following data, taking 69 and 112 as the assumed average of X
and Y respectively. Also, find the probable error.
X: 78 89 96 69 59 79 68 61
Y: 125 137 156 112 107 136 123 108
[Ans: +0.95; 0.02]

6. Find out the coefficient of correlation in the following case, taking 67 and 68 as the assumed average of X and Y
respectively:
Height of fathers (X) 65 66 67 67 68 69 71 73
Height of sons (Y) 67 68 64 68 72 70 69 70
[Ans: 0.47]

7. Compute Karl Pearson’s coefficient of correlation in the following series relating to cost of living and wages:
Wages in ` 100 101 103 102 100 99 97 98 96 95
Cost of living ` 98 99 99 97 95 92 95 94 90 91
[Ans: 0.85]

8. Calculate the value of coefficient of correlation between the price and supply. What is the probable error?
Price: 8 10 15 17 20 22 24 25
Supply: 25 30 32 35 37 40 42 45
[Ans: 0.98; 0.01]

9. Calculate Pearson’s coefficient of correlation between advertisement cost and sales:
Advertisement` 39 65 62 90 82 75 25 98 36 78
Sales ` 47 53 58 86 62 68 60 91 51 84
[Ans: 0.78]

10. Find the correlation coefficient between the income and expenditure of a wage earner and comment thereon:
Income ` 46 54 56 56 58 60 62
Expenditure ` 36 40 44 54 42 58 54
[Ans: 0.77]

11. Determine Karl Pearson’s coefficient of correlation, when $\sum XY = 130$, $\sum X = 15$, $\sum Y = 40$, $\sum X^2 = 55$, $\sum Y^2 = 330$ and $N = 5$.
[Ans: 1]
12. From the following data, find out the coefficient of correlation as given by Karl Pearson: $\sum d_x = -5$, $\sum d_y = -10$, $\sum d_x^2 = 109$, $\sum d_y^2 = 62$, $\sum d_x d_y = 43$ and $N = 10$.
[Ans: 0.51]
13. Find out Karl Pearson’s coefficient of correlation between the two variables X and Y, when $\bar{X} = 74.5$, $\bar{Y} = 125.5$, $A_x = 69$, $A_y = 112$, $\sigma_x = 21.76$, $\sigma_y = 13.07$, $\sum d_x d_y = 2176$ and $N = 8$.
[Ans: 0.961]

COEFFICIENT OF CORRELATION FOR BIVARIATE GROUPED DATA

When the number of observations is very large, we need to arrange the data into different classes, which are either discrete or
continuous. Items having values falling in a particular class are placed together and those having values falling in another class
are placed together. Due to this the whole data is divided into horizontal rows and vertical columns, with one variable placed
horizontally and the other placed vertically. The table so obtained is a two-way frequency distribution table and is called the
correlation table or bivariate frequency distribution table. The formula for calculating r for a bivariate distribution is given by

$$ r = \frac{N\sum f d_x d_y - \sum f d_x \times \sum f d_y}{\sqrt{\left[N\sum f d_x^2 - (\sum f d_x)^2\right] \times \left[N\sum f d_y^2 - (\sum f d_y)^2\right]}} $$

where f is the frequency, $d_x$ and $d_y$ are the deviations of the class mid-values from their assumed means, and N is the total frequency.
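
For a two-way frequency table, r can be obtained by weighting every pair of class mid-values by its cell frequency. The sketch below uses the age and sum-assured table of Q.2 in the exercises that follow (quoted answer: -0.26). It assumes that the three frequencies printed in the 50-60 row belong to the first three columns, which is what the column totals imply, and it uses the mid-values themselves as deviations (an assumed mean of zero), which leaves r unchanged. The function name `grouped_r` is illustrative.

```python
from math import sqrt


def grouped_r(row_mid, col_mid, freq):
    """r for a bivariate frequency table.

    freq[i][j] is the frequency of the cell whose row class has
    mid-value row_mid[i] and whose column class has mid-value col_mid[j].
    """
    n = sum(sum(row) for row in freq)
    sum_fx = sum_fy = sum_fx2 = sum_fy2 = sum_fxy = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            x, y = row_mid[i], col_mid[j]
            sum_fx += f * x
            sum_fy += f * y
            sum_fx2 += f * x * x
            sum_fy2 += f * y * y
            sum_fxy += f * x * y
    numerator = n * sum_fxy - sum_fx * sum_fy
    denominator = sqrt((n * sum_fx2 - sum_fx ** 2) * (n * sum_fy2 - sum_fy ** 2))
    return numerator / denominator


age_mid = [25, 35, 45, 55]                      # classes 20-30 ... 50-60
sum_assured = [10000, 20000, 30000, 40000, 50000]
frequencies = [
    [4, 6, 3, 7, 1],
    [2, 8, 15, 7, 1],
    [3, 9, 12, 6, 2],
    [8, 4, 2, 0, 0],   # assumed placement of the three printed values
]
r_grouped = grouped_r(age_mid, sum_assured, frequencies)
print(round(r_grouped, 2))
```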
Q.1 From the table given below calculate the coefficient of correlation between the ages of husbands and wives:
Age of wife    Age of husband
               20-30   30-40   40-50   50-60   60-70
15-25            5       9       3       -       -
25-35            -      10      25       2       -
35-45            -       1      12       2       -
45-55            -       -       4      16       5
55-65            -       -       -       4       2
[Ans: r = 0.79]
Q.2 Find the coefficient of correlation between the age and the sum assured from the following table:
Age        Sum assured
           10000   20000   30000   40000   50000
20-30        4       6       3       7       1
30-40        2       8      15       7       1
40-50        3       9      12       6       2
50-60        8       4       2       -       -
Total       17      27      32      20       4
[Ans: r = -0.26]

Q.3 Find the coefficient of correlation between the marks obtained by sixty candidates at an examination in two
subjects, Economics and Statistics, from the data given below:
Stats      5-15   15-25   25-35   35-45
0-10         1      1       -       -
10-20        3      6       5       1
20-30        1      8       9       2
30-40        -      3       9       3
40-50        -      -       4       4
Total        5     18      27      10
[Ans: r = 0.5329]

Spearman’s Rank Correlation
This method is a development over Karl Pearson’s method of correlation on the point that-
i. It does not need the quantitative expression of the data and
ii. It does not assume that the population under study is normally distributed.
This method was introduced by the British psychologist Charles Edward Spearman in 1904. Under this method,
correlation is measured on the basis of the ranks rather than the original values of the variables. For this, the values of the two
variables are first converted into ranks in a particular order, i.e., the ranks may be assigned to the different values either in
ascending or in descending order.
Special Features of Rank Correlation
i. The value of such co-efficient of correlation lies between +1 and −1.
ii. The sum of the differences between the corresponding ranks i.e, ∑d = 0.
iii. It is independent of the nature of distribution from which the sample data are collected for calculation of the co-efficient.
iv. It is calculated on the basis of the ranks of the individual items rather than their actual values.
v. Its result equals Karl Pearson’s coefficient of correlation computed on the ranks, unless there is repetition of any rank. This is
because Spearman’s correlation is nothing more than Pearson’s coefficient of correlation between the ranks.
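$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

In case of tied ranks:

$$ R = 1 - \frac{6\left(\sum D^2 + \sum \frac{m^3 - m}{12}\right)}{N(N^2 - 1)} $$

where D is the difference between the two ranks of an item, N is the number of pairs, and m is the number of items sharing a tied rank.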
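
The rank method, including the adjustment for ties, can be sketched as follows. The function names are illustrative. Ranks are assigned in descending order and tied values receive the average of the positions they occupy, which is the usual convention and is assumed here. The data come from the 2014 paper (quoted answer: 0.7333), where 16 occurs three times in X, and 6 and 13 each occur twice in Y.

```python
from collections import Counter


def average_ranks(values):
    """Rank in descending order; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value, count in Counter(values).items():
        first_position = ordered.index(value) + 1
        rank_of[value] = first_position + (count - 1) / 2
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rank_correlation(xs, ys):
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))


x = [57, 16, 24, 65, 16, 16, 9, 40, 33, 48]
y = [19, 6, 9, 20, 4, 15, 6, 24, 13, 13]
rank_corr = spearman_rank_correlation(x, y)
sum_of_d = sum(a - b for a, b in zip(average_ranks(x), average_ranks(y)))
print(round(rank_corr, 4), sum_of_d)
```

The second printed value illustrates special feature ii: the rank differences add up to zero.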


Practical Problems:
1. 10 students were given tests in English and Mathematics. Their marks are given below:
No. 1 2 3 4 5 6 7 8 9 10
Eng. 78 40 50 55 52 49 60 54 59 58
Math 70 60 60 75 69 55 70 65 65 61

3. 10 students got the following percentage of marks in Mathematics and Statistics.
No. 1 2 3 4 5 6 7 8 9 10
Math 78 36 98 25 75 82 90 62 65 39
Stat 84 51 91 60 68 62 60 58 51 47

4. Find the rank correlation coefficient of the following data:
A 115 109 112 87 98 120 98 100 98 118
B 75 73 85 70 76 82 65 73 68 80
[

Unit-5
INDEX NUMBER, TIME SERIES ANALYSIS




PREVIOUS YEAR PT R.S.S.U QUESTION PAPERS
2016

Q.1 Explain The Following Test With Suitable Example. (A) Time Reversal Test (B)Factor Reversal Test ?
Q.2 Fit A Straight Line Trend Equation By The Method Of Least Squares And Estimate The Value Of 2019:
Year 2010 2011 2012 2013 2014 2015 2016 2017
Value 380 400 650 720 690 600 870 950
Ans :1060.87

2015

Q.1 What Is An Index Number ? Explain The Various Types Of Index Number ?
Q.2 What Is Meant By Time Series ? State The Different Components Of Time Series?
Q.3 From The Following Data Calculate Price Index Number By Fisher’s Method:
Commodity      2014                 2015
               Price   Quantity     Price   Quantity
A               10      100          12      150
B                8       80          10      100
C                5       60          10       72
D               24       30          18       33
Ans :118.60

2014:

Q.1 What Is Meant By Time Series ? State The Different Components Of Time Series ?
Q.2 From The Following Data Calculate Price Index Number?
Commodity      2014                 2015
               Price   Quantity     Price   Quantity
A                6       50         560       56
B                2      100          24      120
C                4       60         360       60
D               10       30         288       24
E                8       40         432       36
ANS 6934.473
Q.3 From The Following Data Calculate The Trend Values Using 4 Yearly Moving Average
Year 1989 1990 1991 1992 1993 1994 1995 1996 1997
Values 506 620 1036 673 588 696 1116 738 663






Index Number, Time Series Analysis

2013

Q.1 Write Short Notes On The Following :
(A) Base Year’s Index (B) Relative Years Index (C) Weighted Index
Q.2 Using The Data Calculate Price Index For The Year 1949 By Fisher Formula
Commodity      Price             Quantity
               1949    1958      1949    1958
Rice            9.3     4.5      100      90
Wheat           6.4     3.7       11      10
Pulse           5.1     2.7        5       3

Q.3 Fit A Trend By The Method Of Semi Average To The Data Given Below Estimate The Sales For The Year 1984?
Year 1975 1976 1977 1978 1979 1980 1981 1982 1983
Sales 18 24 26 28 33 36 40 44 48

INDEX NUMBER

Historically, the first index was constructed in 1764 to compare the Italian price index in 1750 with the price level in 1500.
Though originally developed for measuring the effect of change in prices, index numbers have become today one of the most
widely used statistical devices and there is hardly any field where they are not used. Newspapers headline the fact that prices are
going up or down, that industrial production is rising or falling, that imports are increasing or decreasing, that
crimes are rising in a particular period compared to the previous period, as disclosed by index numbers. They are used to feel the
pulse of the economy and they have come to be used as indicators of inflationary or deflationary tendencies. In fact, they are
described as ‘barometers’ of economic activity, i.e., if one wants to get an idea as to what is happening to an economy, he should
look to important indices like the index number of industrial production, agricultural production, business activity, etc.

Definition:

According to Croxton & Cowden, “Index numbers are devices for measuring differences in the magnitude of a group of related
variables.”

For a proper understanding of the term index number, the following points are worth considering:
• Index numbers are specialized averages;
• Index numbers measure the net change in a group of related variables;
• Index numbers measure the effect of changes over a period of time;

Uses of Index Numbers
Index numbers are indispensable tools of economic and business analysis. Their significance can be best appreciated by the
following points:
i. They help in framing suitable policies. Many of the economic and business policies are guided by index numbers. For
example, while deciding the increase in dearness allowance of the employees, the employers have to depend primarily
upon the cost of living index. If wages and salaries are not adjusted in accordance with the cost of living, very often it
leads to strikes and lock-outs which in turn cause considerable waste of resources. The index numbers provide some
guideposts that one can use in making decisions.

ii. They reveal trends and tendencies. Since index numbers are most widely used for measuring changes over a period of
time the time series so formed enable us to study the general trend of the phenomenon under study. For example, by
examining index number of imports for India for the last 10-12 years we can say that our imports are showing an upward
tendency, i.e., they are rising year after year. Similarly, by examining the index numbers of industrial production, business
activity, etc., for the last few years we can conclude about the trend of production and business activity. By examining the
trend of the phenomenon under study we can draw very important conclusions as to how much change is taking place due
to the effect of seasonality, cyclical force, irregular forces, etc. Thus index numbers are highly useful in studying the
general business conditions.

iii. They are important in forecasting future economic activity. Index numbers are often used in time series analysis, the
historical study of long-term trend, seasonal variations and business cycle development, so that business leaders may keep
pace with changing economic and business conditions and have better information available for decision making purposes.

iv. Index numbers are very useful in deflating, i.e., they are used to adjust the
original data for price changes, or to adjust wages for cost of living changes and thus transform nominal wages into real
wages. Moreover, nominal income can be transformed into real income and nominal sales into real sales through
appropriate index numbers.

Problems in the Construction of Index Numbers

Before constructing index numbers a careful thought must be given to the following problems:

1.The purpose of the index. At the very outset the purpose of constructing the index must be very clearly decided-what the
index is to measure and why? There is no all-purpose index. Every index is of limited and particular use. Thus, a price index that
is intended to measure consumers’ prices must not include wholesale prices. And if such an index is intended to measure the
cost of living of poor families, great care should be taken not to include goods ordinarily used by middle-class and upper-income
groups. Failure to decide clearly the purpose of the index would lead to confusion and wastage of time with no fruitful results.
All other problems, such as the base year, the number of commodities to be included, the prices of the commodities, etc., are
decided in the light of the purpose for which the index is being constructed.

2.Selection of a base period. The base period of an index number is the period against which comparisons are made. It may be a
year, a month or a day. The index for the base period is always taken as 100. Though the selection of the base period would primarily
depend upon the object of the index, the following points need careful consideration in selecting the base period:
• The base period should be a normal one. The period that is selected as base should be normal, i.e., it should be free
from abnormalities like wars, earthquakes, famines, booms, depressions, etc.
• The base period should not be too distant in the past. It is desirable to have an index based on a fairly recent period,
since comparisons with a familiar set of circumstances are more helpful than comparisons with vaguely remembered
conditions.
• Fixed base or chain base. While selecting the base a decision has to be made as to whether the base shall remain fixed
or not. i.e., whether we have a fixed base or chain base index. In the fixed base method, the year or the period of years
to which all other prices are related is constant for all times. On the other hand, in the chain base method the prices of
a year are linked with those of the preceding year and not with the fixed year.
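
The difference between the two bases can be seen in a short sketch. The price series below is hypothetical and covers a single commodity. In the fixed base method every price is related to the first year; in the chain base method each year is first related to the preceding year (a link relative) and the links are then chained together.

```python
# Hypothetical prices of one commodity, for illustration only.
prices = {2010: 50, 2011: 55, 2012: 66, 2013: 60}
years = sorted(prices)
base_year = years[0]

# Fixed base: every year is compared with the same base year.
fixed_base = {y: prices[y] / prices[base_year] * 100 for y in years}

# Chain base: each year is compared with the preceding year ...
link_relatives = {
    y: prices[y] / prices[prev] * 100 for prev, y in zip(years, years[1:])
}

# ... and the link relatives are multiplied together to form the chain index.
chain_index = {base_year: 100.0}
for prev, y in zip(years, years[1:]):
    chain_index[y] = chain_index[prev] * link_relatives[y] / 100

for y in years:
    print(y, round(fixed_base[y], 2), round(chain_index[y], 2))
```

For a single commodity the chained index reproduces the fixed base index; the two generally differ once several commodities are averaged or the weights change from year to year.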

3.Selection of number of items. The items included in an index should be determined by the purpose for which the index is
constructed. Every item cannot be included while constructing an index number and hence one has to select a sample. For
example, while constructing a price index it is impossible to include each and every commodity. Hence, it is necessary to decide
what commodities to include. The commodities should be selected in such a manner that they are representative of the tastes,
habits and customs of the people for whom the index is meant.

4.Price quotations. After the commodities have been selected, the next problem is to obtain price quotations for these
commodities. It is a well-known fact that prices of many commodities vary from place to place and even from shop to shop in the
same market. It is impracticable to obtain price quotations from all the places where a commodity is dealt in. A selection must be
made of representative places and persons. These places should be those which are well known for trading in that particular
commodity. After the places from where the price quotations are to be obtained are decided, the next thing is to appoint some
persons or institutions who can supply price quotations as and when required.

5.Choice of an average. Basically a choice has to be made between arithmetic mean and geometric mean. Theoretically
speaking, geometric mean is the best average in the construction of index numbers because of the following reasons:
• In the construction of index numbers we are concerned with ratios of relative changes, and the geometric mean gives
equal weight to equal ratios of change;
• geometric mean is less susceptible to major variations as a result of violent fluctuations in the values of the
individual items; and
• Index numbers calculated by using this average are reversible and therefore, base shifting is easily possible. The
geometric mean index always satisfies the time reversal test.
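
The reversibility claim can be checked with a small sketch. It compares an unweighted index of price relatives built with the arithmetic mean against one built with the geometric mean, using the four prices from the 2015 question above (base prices 10, 8, 5, 24 and current prices 12, 10, 10, 18). The function names are illustrative and the indices are kept as ratios rather than percentages.

```python
from math import prod  # available from Python 3.8


def am_index(base_prices, current_prices):
    """Arithmetic mean of the price relatives (as a ratio, not x 100)."""
    relatives = [c / b for b, c in zip(base_prices, current_prices)]
    return sum(relatives) / len(relatives)


def gm_index(base_prices, current_prices):
    """Geometric mean of the price relatives (as a ratio, not x 100)."""
    relatives = [c / b for b, c in zip(base_prices, current_prices)]
    return prod(relatives) ** (1 / len(relatives))


p0 = [10, 8, 5, 24]
p1 = [12, 10, 10, 18]

# Time reversal test: forward index x backward index should equal 1.
gm_product = gm_index(p0, p1) * gm_index(p1, p0)
am_product = am_index(p0, p1) * am_index(p1, p0)
print(round(gm_product, 6), round(am_product, 6))
```

The geometric mean passes the test, while the arithmetic mean gives a product above 1 for these prices, which is the upward bias the text alludes to.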

6.Selection of appropriate weights. The term ‘weight’ refers to the relative importance of the different items in the construction
of the index. All items are not of equal importance and hence it is
necessary to devise some suitable method whereby the varying importance of the different items is taken into account. This is
done by allocating weights. Thus, we have broadly two type of indices-- unweighted indices and weighted indices. In the former
case, no specific weights are assigned whereas in the latter case specific weights are assigned to various items.

7.Selection of an appropriate formula. The problem very often is that of selecting the most appropriate formula. The choice of
the formula would depend not only on the purpose of the index but also on the data available. Prof. Irving Fisher has suggested
that an appropriate index is that which satisfies Time Reversal Test and Factor Reversal Test.
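
As an illustration of the two tests, the sketch below computes Fisher's index for the data of the 2015 question above (quoted answer: 118.60). It relies on the standard definition of Fisher's index as the geometric mean of the Laspeyres index (base-year quantities as weights) and the Paasche index (current-year quantities as weights); these two names and the helper functions are not taken from these notes.

```python
from math import sqrt


def total(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher(p_base, q_base, p_current, q_current):
    """Fisher's index as a ratio (multiply by 100 for the usual form)."""
    laspeyres = total(p_current, q_base) / total(p_base, q_base)
    paasche = total(p_current, q_current) / total(p_base, q_current)
    return sqrt(laspeyres * paasche)


p0, q0 = [10, 8, 5, 24], [100, 80, 60, 30]    # 2014 prices and quantities
p1, q1 = [12, 10, 10, 18], [150, 100, 72, 33]  # 2015 prices and quantities

p01 = fisher(p0, q0, p1, q1)   # price index, 2015 on base 2014
p10 = fisher(p1, q1, p0, q0)   # price index, 2014 on base 2015
q01 = fisher(q0, p0, q1, p1)   # quantity index: prices and quantities swap roles
value_ratio = total(p1, q1) / total(p0, q0)

print(round(p01 * 100, 2))
print(round(p01 * p10, 6))                      # time reversal test
print(round(p01 * q01, 6), round(value_ratio, 6))  # factor reversal test
```

The computed value is about 118.61, matching the quoted answer up to rounding. The time reversal test requires the forward and backward indices to multiply to 1, and the factor reversal test requires the price index times the quantity index to equal the ratio of total values.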


Cost of Living Index Number
These index numbers are also called [1] Consumer Price Index Numbers, [2] Retail Price Index Numbers [3] Cost of Living Price
Index Numbers and [4] Price of Living Index Numbers. They tell us how much the consumers of a particular class have to pay to
get a basket of goods and services at a particular point of time in comparison to what they paid for this basket in the base year.
Utility
These indices are of very great importance as would be obvious from the following:
I. They determine the purchasing power of money. It has been pointed out earlier that

$$ \text{Purchasing power of money} = \frac{1}{\text{Cost of Living Index}} $$

II. They help in determining the real wages. As has been pointed out earlier,

$$ \text{Real Wage} = \frac{\text{Actual Wages}}{\text{Cost of Living Index}} \times 100 $$

III. They help Government and business houses to adjust the rates of Dearness Allowance on the basis of Cost of Living
Indices.
IV. These indices are very helpful in wage negotiations and wage Contracts. Automatic adjustment of wages is done on the
basis of a particular unit increase in Cost of Living Index.

V. These indices help in deflating income and value series in national accounts.
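
The first two uses can be illustrated with a short sketch. The wage and index figures are hypothetical, and the index is assumed to be expressed with the base year equal to 100, so the purchasing power is computed as 100 divided by the index (the same as 1 divided by the index written as a ratio).

```python
def real_wage(actual_wage, cost_of_living_index):
    """Real wage = actual wage / cost of living index x 100."""
    return actual_wage / cost_of_living_index * 100


def purchasing_power(cost_of_living_index):
    """Value of one unit of money relative to the base year.

    Assumes the index is quoted with the base year = 100.
    """
    return 100 / cost_of_living_index


# Hypothetical figures: a money wage of 12000 when the index stands at 150.
wage_in_base_prices = real_wage(12000, 150)
value_of_money = purchasing_power(150)
print(wage_in_base_prices, round(value_of_money, 2))
```

Under these assumed figures, a money wage of 12000 buys only what 8000 bought in the base year, and one unit of money is worth about two-thirds of its base-year value.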

Construction of Cost of Living Index

The following are the steps in the construction of Cost Living Index Numbers:

1. Determining the scope of the index. By this is meant the selection of the group or type of persons in a region for
whom the index number has to be constructed. The index may relate to industrial workers in a locality or government
employees or teachers or agricultural laborers, etc. The group has to be meticulously defined. When we talk of
government employees, then we have to decide about the low paid or high paid government employees as their
consumption pattern differs. The class for which the index has to be constructed must be as far as possible
homogeneous from the point of view of income and habits.

2. Conducting a family budget enquiry. Family budget enquiry is held with a view to find out how much an average
family of this group spends on different items of consumption. The quantity of the commodities consumed, as also the
prices at which they are purchased are noted down. The enquiry is done on a random sample basis. Some families are
selected from the total number by lottery method and their family budgets are scrutinized in detail. The items on which
money is spent are classified in certain groups. Generally these groups are: food, clothing, fuel and lighting, house rent,
and miscellaneous.

3. Obtaining price quotations. This is by far the most important and also the most difficult task. The reason is that retail
price varies from place to place, shop to shop and even customer to customer. The prices collected should: [a] be retail
prices, [b] relate to the specified quality, [c] take into account discount for cash payment and interest for
late payment, and [d] be controlled prices if there is price control or rationing.

4. Averaging price quotations. After the collection of price quotations an average price for each item included in the
index should be worked out. Generally the arithmetic average is used to find out a single figure of price for a particular
commodity.

5. Weighting of index numbers. To construct the index number the prices are weighted with the weights arrived at as a
result of the family budget enquiry, which has been discussed earlier. After this the index number can be constructed
by:
• Aggregate Expenditure Method or Aggregative Method.
• Family Budget Method or Weighted Relatives Method.

Base shifting:
Very often it becomes necessary to shift the base of an index number. Base shifting means changing of the given base
year of an index number and recasting it into a series based on some recent new base year. The reasons why base shifting is
necessary are:
1. When the base year is too old and is too far away from the current year, it is unsuitable for meaningful comparisons. Thus if
prices of the year 1983 are compared with the prices of a base year of 1914 the comparison is not very meaningful. The base year
should be such around which the prices of current year fluctuate. The comparison of a current year’s prices can be made only
with normal prices. In times of changing prices the base year of 1914 is highly inappropriate to compare the prices of 1980,
1981, 1982 or 1983.

2. If we want to compare series of index numbers with different base periods, a comparison would be meaningful only if the
two index numbers have a common base. Thus if we want to compare prices in India with prices in United States the two index
numbers should have a common base period.

Base shifting can be done by two methods. One is the method of reconstructing the entire series. Here the prices of the new
base year are taken as 100 and the prices of all preceding and succeeding years are converted into price relatives and all the
index numbers are constructed afresh. The other is to divide each index number of the old series by the old index number of
the new base year and multiply the quotient by 100:

$$\text{New index of any year} = \frac{\text{Old index of the year}}{\text{Old index of the new base}} \times 100$$
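
As a rough sketch of the shortcut, each old index number is divided by the old index number of the new base year and multiplied by 100. The series below is hypothetical and the function name `shift_base` is chosen only for this illustration.

```python
def shift_base(indices, new_base_position):
    """Recast an index series so that the year at new_base_position becomes 100."""
    base_value = indices[new_base_position]
    return [round(value / base_value * 100, 2) for value in indices]


# Hypothetical series with the first year as the original base
old_series = [100, 110, 125, 150, 180]

# Shift the base to the fourth year (position 3, counting from 0)
shifted = shift_base(old_series, 3)
print(shifted)
```

The relative movement between any two years is unchanged by the shift; only the reference level moves.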

Splicing: (index number on a common base)

By splicing of index numbers we mean combining two or more series of overlapping index numbers to obtain a single index
number on a common base. This is done by the same technique as used in base shifting. Splicing of index numbers can be done
only if the index numbers are constructed with the same items, and have an overlapping year.
Splicing is generally done when an old index number with an old base is being discontinued and a new index with a new base is
being started. To have continuity of comparison the new index number is spliced to the old index number in the overlapping
year. Splicing can be done the other way round also, when the old index is spliced to the new index. In splicing of index numbers
we find a common factor by which the spliced index number series is multiplied to give a common base.
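
A minimal sketch of the common factor, assuming the last entry of the old series and the first entry of the new series refer to the same overlapping year. The two series and the function names are hypothetical.

```python
def splice_forward(old_series, new_series):
    """Continue the old series forward: new values are scaled to the old base."""
    factor = old_series[-1] / new_series[0]
    return old_series + [value * factor for value in new_series[1:]]


def splice_backward(old_series, new_series):
    """Carry the new series backward: old values are scaled to the new base."""
    factor = new_series[0] / old_series[-1]
    return [value * factor for value in old_series[:-1]] + new_series


# Hypothetical series; the overlapping year is the last of old and the first of new
old = [100, 125, 200]
new = [100, 110, 130]

forward = splice_forward(old, new)
backward = splice_backward(old, new)
print(forward)
print(backward)
```

Forward splicing multiplies the new series by (old index of the overlapping year / new index of the overlapping year); backward splicing uses the reciprocal of that factor on the old series.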

Time Reversal Test:

In the words of Fisher: “ The test is that the formula for calculating an index number should be such that it will give the same
ratio between one point of comparison and the other no matter which of the two is taken as base.” This means that the index
number should work both backwards as well as forwards. Thus, if the index number of the current year is 400 then the index
number of the base year (based on the current year) should be 25. In other words, the two index numbers thus calculated (without
the figure 100) should be reciprocals of each other. The reciprocal of 4 is .25 and the reciprocal of .25 is 4. The product of these
two ratios would always be equal to one. Thus, if P01 represents the price change in the current year and P10 the price change of
the base year (based on the current year), the following equation should be satisfied: P01 × P10 = 1,
where P01 is the current year’s index based on the base year and P10 is the base year’s index based on the current year.
This test is not satisfied by Laspeyres index and by Paasche’s index. The methods which satisfy the Time Reversal Test are:
• Fisher’s ideal formula
• Simple geometric mean of price relatives
• Aggregative with fixed weights (Kelly’s formula)
• Marshall-Edgeworth method
• Weighted geometric mean of price relatives if fixed weights are used.
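
The test can be checked numerically by interchanging the base and current periods and multiplying the two ratios. The sketch below uses a hypothetical two-item basket and works with the indices as plain ratios (without the factor 100).

```python
from math import sqrt


def total(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p_base, q_base, p_cur, q_cur):
    return total(p_cur, q_base) / total(p_base, q_base)


def paasche(p_base, q_base, p_cur, q_cur):
    return total(p_cur, q_cur) / total(p_base, q_cur)


def fisher(p_base, q_base, p_cur, q_cur):
    return sqrt(laspeyres(p_base, q_base, p_cur, q_cur) * paasche(p_base, q_base, p_cur, q_cur))


# Hypothetical basket, for illustration only
p0, q0 = [2, 5], [10, 4]
p1, q1 = [3, 6], [8, 5]

# P01 x P10: the second factor swaps the roles of base and current year
laspeyres_product = laspeyres(p0, q0, p1, q1) * laspeyres(p1, q1, p0, q0)
fisher_product = fisher(p0, q0, p1, q1) * fisher(p1, q1, p0, q0)

print(round(laspeyres_product, 4))  # differs from 1
print(round(fisher_product, 4))     # equals 1
```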

Factor Reversal Test:

The changes in the price multiplied by the changes in quantity should be equal to the total change in value. Change in value is the
result of changes in price and changes in quantity and as such the product of these changes should represent the total change in
value. Thus, if the price of a commodity has doubled during a certain period and if in this period the quantity has trebled the total
change in the value should be six times the former level. In other words, if p1 and p0 represent the prices and q1 and q0 the
quantities in the current and the base years respectively, and if P01 represents the change in price in the current year and Q01 the
change in the quantity in the current year, then the following equation should be satisfied:

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

The factor reversal test is satisfied only by Fisher’s Ideal Index Number.
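
The following short derivation sketches why Fisher’s formula passes the test. It assumes the quantity index Q01 is obtained from the price index by interchanging the roles of p and q, and it writes the indices as ratios (without the factor 100).

```latex
\begin{aligned}
P_{01} &= \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}},
\qquad
Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \\[6pt]
P_{01} \times Q_{01}
&= \sqrt{\frac{\sum p_1 q_0 \,\sum p_1 q_1 \,\sum p_0 q_1 \,\sum p_1 q_1}
              {\sum p_0 q_0 \,\sum p_0 q_1 \,\sum p_0 q_0 \,\sum p_1 q_0}}
 = \sqrt{\frac{\left(\sum p_1 q_1\right)^2}{\left(\sum p_0 q_0\right)^2}}
 = \frac{\sum p_1 q_1}{\sum p_0 q_0}
\end{aligned}
```

The cross terms ∑p1q0 and ∑p0q1 appear once in the numerator and once in the denominator, so they cancel and only the value ratio remains.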

Circular Test:

It is a sort of extension of the time reversal test. Suppose an index number is constructed for the year 1982 with the base of 1981
and another index number for 1983 on the base of 1982; then it should be possible for us to directly get an index number for 1983
on the base of 1981. If the index number calculated directly does not give an inconsistent value, the circular test is said to be
satisfied. If P01 represents the price change of the current year on the base year, P12 the price change of a second year on the
current year and P20 the price change of the base year on that second year, then the following equation should be satisfied:
P01 × P12 × P20 = 1

This test is fulfilled by unweighted or fixed weighted aggregative or by index numbers which use simple geometric mean.
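
A small numerical check of the circular test for the unweighted (simple) aggregative index, using hypothetical prices for three periods and indices written as ratios:

```python
def simple_aggregative(p_base, p_cur):
    """Unweighted aggregative price index as a ratio (without the factor 100)."""
    return sum(p_cur) / sum(p_base)


# Hypothetical prices of two items in periods 0, 1 and 2
p0 = [2, 4]
p1 = [3, 5]
p2 = [4, 8]

# P01 x P12 x P20 should come back to 1
product = (simple_aggregative(p0, p1)
           * simple_aggregative(p1, p2)
           * simple_aggregative(p2, p0))
print(round(product, 6))
```

Each aggregate appears once as a numerator and once as a denominator around the circle, which is why the product returns to one.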

DIFFERENT METHODS OF COMPUTING AN INDEX NUMBER
1. Simple
i. Simple aggregate
ii. Simple relative
2. Weighted
i. Weighted aggregate
a. General
b. Laspeyre’s
c. Paasche’s
d. Marshall & Edgeworth’s
e. Kelly’s
f. Fisher’s
ii. Weighted relative


SIMPLE METHOD

Simple Aggregative Method :- Under this method, the price index for a given period is obtained by dividing the aggregate of
different prices of the current year by the aggregate of different prices of the base year and multiplying the quotient by 100.

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$

Simple relative method :- Under this method, the price index for a given year is calculated as the simple average of the price
relatives for the different items included in the index number. The simple average used here may be of any type, viz., arithmetic
mean, geometric mean, harmonic mean, median or mode, but the arithmetic mean is usually preferred for its simplicity in
calculation, and the geometric mean for its ability to measure relative changes, which is the inherent feature of an index
number.

Arithmetic mean:
$$P_{01} = \frac{\sum I}{N}$$

Geometric mean:
$$P_{01} = \text{A.L. of } \frac{\sum \log I}{N}$$

where I = (p1 / p0) × 100 is the price relative of each item, N is the number of items and A.L. stands for antilogarithm.
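
The simple methods can be compared on one small data set. The prices below are hypothetical, and the geometric mean is computed through logarithms in the same way as the antilog formula.

```python
from math import log10

# Hypothetical prices of three items
p0 = [10, 20, 40]   # base year
p1 = [12, 30, 40]   # current year

# Simple aggregative index
aggregative = sum(p1) / sum(p0) * 100

# Price relatives I = p1 / p0 x 100
relatives = [cur / base * 100 for base, cur in zip(p0, p1)]
n = len(relatives)

# Simple average of relatives: arithmetic mean and geometric mean
arithmetic = sum(relatives) / n
geometric = 10 ** (sum(log10(r) for r in relatives) / n)

print(round(aggregative, 2), round(arithmetic, 2), round(geometric, 2))
```

The geometric mean of the relatives is never larger than their arithmetic mean, which is why the two versions of the index usually differ slightly.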


WEIGHTED METHOD

Weighted aggregative method: Under this method, an index number of prices for any given year is calculated after assigning the
appropriate weights to the different items included in the index number. The weights to be assigned should be rational, and
relevant for the purpose. Such weights may be assigned on the basis of quantities, values, or sale price of the commodities
consumed during the base year, or in some typical years.

$$P_{01} = \frac{\sum p_1 w}{\sum p_0 w} \times 100$$

Laspeyre’s method: The method devised by the German Economist Etienne Laspeyre in 1871 for calculating the price indices
for a current period is known as Laspeyre’s method of index number. Under this method, we get the weighted index on the basis
of aggregative expenditure assuming that the quantities consumed in the base year are also the quantities consumed in the current
year.

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Paasche’s method: Paasche, a German mathematician, introduced this method in 1874 as an improvement over Laspeyre’s
method analyzed above. In this method Mr. Paasche has taken the quantities of the current year as the respective weights of the
items in a fixed manner.

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

Marshall and Edgeworth’s method: The formula enunciated by Marshall and Edgeworth for constructing an index number is
known as Marshall-Edgeworth’s method. In this method, they have suggested taking the arithmetic average of the quantities of
the base year and the current year as the weights of the items.

$$P_{01} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$


Fisher’s method (or Fisher’s Ideal Index): The method devised by Prof. Irving Fisher for construction of an index number is
known after his name as Fisher’s aggregative method. He has devised a number of methods for the purpose, among which the
following method is called the ideal one.

$$P_{01} = 100 \times \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$


Bowley’s Method: This method is a development over the methods of Laspeyre and Paasche discussed above. This method has
been devised by the famous mathematician Bowley. To do away with the defects of the above said two methods, he has
suggested taking the arithmetic average of the two indices of Laspeyre and Paasche.

$$P_{01} = \frac{1}{2} \times \left[ \frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1} \right] \times 100$$
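
The weighted aggregative formulas above differ only in the quantities used as weights, so they can all be built from the same four sums. The sketch below uses the figures of Problem 19 from the Practical Problems of this unit; the variable and function names are chosen only for this illustration.

```python
from math import sqrt

# Figures of Problem 19 (base year and current year)
p0 = [4, 3, 2]      # base year prices
q0 = [50, 10, 5]    # base year quantities
p1 = [10, 9, 4]     # current year prices
q1 = [40, 2, 2]     # current year quantities


def total(prices, quantities):
    """Aggregate expenditure: sum of price x quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = total(p1, q0) / total(p0, q0) * 100
paasche = total(p1, q1) / total(p0, q1) * 100
marshall_edgeworth = ((total(p1, q0) + total(p1, q1))
                      / (total(p0, q0) + total(p0, q1)) * 100)
fisher = sqrt(laspeyres * paasche)      # geometric mean of the two
bowley = (laspeyres + paasche) / 2      # arithmetic mean of the two

for name, value in [("Laspeyre", laspeyres), ("Paasche", paasche),
                    ("Marshall-Edgeworth", marshall_edgeworth),
                    ("Fisher", fisher), ("Bowley", bowley)]:
    print(name, round(value, 2))
```

Fisher’s index comes out at about 252.4, in line with the answer quoted for Problem 19, and both Fisher’s and Bowley’s values lie between those of Laspeyre and Paasche.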


Weighted Relative Method
This method is popularly known as the weighted price relative method or family budget method.

$$P_{01} = \frac{\sum PW}{\sum W} \quad \text{or} \quad \frac{\sum IV}{\sum V}$$
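
As a small illustration of the formula, the group indices and group weights of Problem 5 from the Practical Problems of this unit can be combined as follows (the variable names are chosen only for this sketch):

```python
# Group indices (I) and group weights (W) from Problem 5
group_indices = [450, 300, 520, 400, 650]
group_weights = [25, 30, 20, 15, 5]

weighted_sum = sum(i * w for i, w in zip(group_indices, group_weights))
index_number = weighted_sum / sum(group_weights)

print(index_number)
```

Note that the weights need not add up to 100, because the formula divides by their total.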

Chain base method

Under this method, the base year does not remain fixed but moves forward step by step: each year’s price is compared with the
price of the immediately preceding year. The formulae for computing the indices under this method remain the same as displayed
above, except that p0 and q0 represent the immediately preceding year’s price and quantity respectively.
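
A minimal sketch of the chain base calculation for a single commodity, using the prices of Problem 9 from the Practical Problems of this unit. It assumes the first year is given the index 100; each link relative compares a year with the preceding year, and the chain index multiplies the links together.

```python
# Prices for 2004 to 2009 from Problem 9
prices = [50, 60, 62, 65, 70, 78]

# Link relative = current year's price / preceding year's price x 100
link_relatives = [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]

# Chain index = preceding chain index x current link relative / 100
chain_indices = [100.0]
for link in link_relatives[1:]:
    chain_indices.append(chain_indices[-1] * link / 100)

print([round(link, 2) for link in link_relatives])
print([round(index, 2) for index in chain_indices])
```

The chain indices agree with the answer quoted for Problem 9. The same chaining step converts a series of chain base (link) indices into fixed base indices.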

Comparison between Laspeyre’s and Paasche’s index number

Laspeyre’s Index Number
i. Here, quantity of the base year is assumed to be the quantity of the current year.
ii. It has an upward bias, i.e., the numerator of the index number is increased due to the assignment of higher weights
fixed on the basis of the base year’s quantities even though there might have been a fall in the quantity consumed during
the current year due to rise or fall in price and change in tastes, habits and customs etc. in the current year.
iii. As the quantities of the base year are used as weights, the influence of price changes on quantities demanded does
not get reflected in the index number.
iv. It measures changes in a fixed market basket of goods and services as the same quantities are used in each period.
v. Here, weights remain constant.

Paasche’s Index Number
i. Here, quantity of the current year is assumed to be the quantity of the base year.
ii. It has a downward bias, i.e., the numerator of the index number is decreased due to the assignment of lower weights
fixed on the basis of the current year’s quantities even though the quantities in the current year might have fallen due to
rise or fall in price, or change in habits of consumption.
iii. As the quantities of the current year are used as weights, the influence of price change on quantities demanded gets
reflected in the index number.
iv. It continually updates the quantities to the level of current consumption.
v. Here, weights are determined every time an index number is constructed.

Difference between fixed base method and chain base method

Fixed base Index
1. It is easy to understand by a common man as each year’s price is expressed as percentage of a fixed base year’s price.
2. It is simple to calculate as the denominator remains fixed for all the cases.
3. Here, the base period remains fixed.
4. It does not permit frequent alterations of the weights of different items.
5. It does not facilitate comparison between two adjacent periods.
6. It is greatly affected by seasonal variations.
7. It is suitable for long period and not for short period.
8. It embraces the fresh data with those of the remote past, for which the comparison loses its significance, particularly in the
field of economics and commerce.

Chain base Index
1. It is difficult to understand by a common man as different years’ price is expressed as percentage of different base
year’s price.
2. It is difficult and tedious to calculate as the denominator changes every time.
3. Here, the base period changes from year to year.
4. It permits frequent adjustment of the weights of different items.
5. It facilitates comparison between two adjacent periods.
6. It is least affected by seasonal variations.
7. It is suitable for short period, and not for long period.
8. It comprises the data with those of the recent past, for which the comparison appears very significant, particularly
in the field of business and economics.

Practical Problems:

1. From the following data compute the price index for 2008 on the basis of 2005 prices:
Commodities   Unit      Price in 2005   Price in 2008
Rice          Quintal   500             600
Dal           Kg.       15              20
Vegetables    Kg.       6               8
Meat          Kg.       40              50
Fish          Kg.       30              40
Milk          Liter     4               11
Clothing      Meter     25              3
[Ans: 122]

2. From the data given below find out the index numbers for each of the years given using mean and Geometric mean.
Items   Prices
        2006   2007   2008   2009
A       2      3      4      5
B       8      10     12     15
C       4      5      8      10
D       3      6      7      8
E       1      4      6      9

3. From the data given below, compute the index number for 2009 using Laspeyre’s, Paasche’s, Bowley’s, Marshall-Edgeworth
and Fisher’s index No.
Articles   2006             2009
           Price   Value    Price   Value
Rice       5       50       4       48
Wheat      8       48       7       49
Dal        6       18       5       20
Fish       3       30       2       16
Milk       4       8        6       12

Cost of Living Index

4. An enquiry into the budgets of certain middle class families in a town gave the following information:
Heads of exps         Food   Rent   Clothing   Fuel   Misc.
Price in 2006         100    20     70         20     40
% of exps. in 2006    30     15     20         10     25
Price in 2009         90     20     60         15     55
% of exps. in 2009    25     20     30         15     10
Compute for 2009 weighted arithmetic mean of the price relatives, and weighted geometric mean of the price relatives taking p0q1
as the weights of the items. Also, show that Paasche’s index number is equal to the weighted arithmetic mean of the price
relatives.

5. From the following group indices of wholesale prices in India for the year 2009, and the group weights compute the index
number by the method of weighted price relatives:
Group                      Group Indices   Group Weights
Food                       450             25
Manufacturing              300             30
Industrial raw materials   520             20
Semi-manufacturing         400             15
Miscellaneous              650             5

6. From the following table of money wages, and cost of living index numbers, find real wage index numbers for all the seven
years. Also, determine the purchasing power of rupee:
Year:     2003   2004   2005   2006   2007   2008   2009
Wages     65     70     75     80     90     100    120
Indices   100    110    120    130    150    200    250
[Ans: 100, 97.91, 96.15, 94.68, 92.31, 76.92, 73.85]

7. From the following table compute the real income indices:
Year      01    02    03    04    05    06    07    08    09
Income    360   420   500   550   600   640   680   720   750
Indices   100   104   115   160   280   290   300   320   330
[Ans: 100, 112, 121, 95, 59, 61, 63, 62, 63]

8. From the following data relating to the annual wages and the price indices, determine by deflation:
i. The purchasing power of money
ii. Real wages and
iii. Real wage index
Year    2003   2004   2005   2006   2007   2008   2009
Wages   180    220    340    360    365    370    378
P01     100    170    300    320    330    340    350
[Ans: 1, .59, .33, .31, .30, .29, .29; 180, 129.41, 113.33, 112.5, 110.61, 108.82, 107.14; 100, 72, 63, 62.5, 61, 60, 59.5]

Chain base Index:

9. From the data given below construct an index number by chain base method:
Year 2004 2005 2006 2007 2008 2009
Price 50 60 62 65 70 78
[Ans: 100, 120, 124, 130, 140, 156]

10. Construct the chain indices from the link relatives given below:
Year 2005 2006 2007 2008 2009
Link Index 100 105 85 115 102

12. From the chain base index numbers given below find fixed base index numbers:
Year 2004 2005 2006 2007 2008 2009
CBI 80 110 120 90 140 150
[Ans: 80, 88, 105.6, 95.04, 133.06]

13. Compute chain index numbers with 2001 prices as base, from the following table giving wholesale prices of the commodities
A, B and C for the years 2001 to 2005:
Commodities   Average Wholesale Prices (in Rs.)
              2001   2002   2003   2004   2005
A             2      3      5      7      8
B             8      10     12     14     18
C             4      5      7      9      12
[Ans: 100, 133.33, 189.55, 243.38, 305.198]

Base shifting:

14 From the following data relating to average prices of commodity compute the index numbers with 2001 as base, and recast
the indices thus obtained by shifting the bases to 2005:
Year 01 02 03 04 05 06 07 08 09
Price 50 62 65 68 75 78 82 84 88
[Ans: 67, 80, 83, 87, 100, 104, 109, 112, 117]

15. From the index numbers given below, find out index numbers by shifting base from 2001 to 2006 and then 2008:
Year 01 02 03 04 05 06 07 08
P01 120 150 160 180 200 200 210 240
[Ans: Base 2006: 60, 75, 80, 90, 100, 100, 105, 120; Base 2008: 50, 62.5, 66.7, 75, 83.3, 83.3, 87.5, 100]

Splicing
16. Splice the following two index-number series, continuing series A forward and series B backwards:
Year       2004   2005   2006   2007   2008   2009
Series A   100    120    150
Series B                 100    110    120    150

[Ans: A-165, 180, 225; B- 66.66, 80, ]

17. Splice the following three index-number series A, B and C:
Year   2004   2005   2006   2007   2008   2009
A      100    150    200
B                    100    120
C                           100    150    170
Construct the continuous series of index numbers with base 2007 through backward splicing of the three series.
[Ans: 41.67, 62.50, 83.33, 100, 150, 170]

Time and Factor Reversal Test

18. Prove using the following data that time reversal test and factor reversal test are satisfied by Fisher’s ideal formula for index
number:
Commodity   Base year           Current year
            Price   Quantity    Price   Quantity
A           6       50          10      56
B           2       100         2       120
C           4       60          6       60
D           10      30          12      24
E           8       40          12      36

19. The following figures relate to the prices and quantities of certain commodities. Construct an appropriate index number and
show if it satisfies the time and factor reversal tests:
Commodity   Base year           Current year
            Price   Quantity    Price   Quantity
A           4       50          10      40
B           3       10          9       2
C           2       5           4       2
[Ans: Fisher’s ideal Index No. 252.4]

Quantity Index Number
20. Construct the quantity indices for the following data:
Commodity   Quantity produced       Price
            2007   2008   2009      2007
A           15     16     17        25
B           20     22     24        75
C           10     20     30        60
[Ans: 131.33; 162.4]

TIME SERIES

ANALYSIS OF TIME SERIES

When quantitative data are arranged in the order of their occurrence, the resulting statistical series is called a time series. The
quantitative values are usually recorded at equal time intervals: daily, weekly, monthly, quarterly, half-yearly, yearly, or any
other time measure. Monthly statistics of industrial production in India, annual birth-rate figures for the entire world, yield on
ordinary shares, weekly wholesale price of rice, daily records of tea sales and census data are some examples of time series.
Each has the common characteristic of recording magnitudes that vary with the passage of time.
Time series are influenced by a variety of forces. Some are continuously effective, others make themselves felt at recurring time
intervals, and still others are non-recurring or random in nature. Therefore, the first task is to break down the data and study each
of these influences in isolation. This is known as decomposition of the time series. It enables us to understand fully the nature of
the forces at work. We can then analyse their combined interactions. Such a study is known as time-series analysis.

Components of time series
1. Basic or Secular or Long-time trend;
2. Seasonal variations;
3. Business cycles or cyclical movement; and
4. Erratic or Irregular fluctuations.

These components provide a basis for the explanation of the past behaviour. They help us to predict the future behaviour. The
major tendency of each component or constituent is largely due to causal factors. Therefore a brief description of the components
and the causal factors associated with each component should be given before proceeding further.

1. Basic or secular or long-time trend : Basic trend underlines the tendency to grow or decline over a period of years. It is the
movement that the series would have taken, had there been no seasonal, cyclical or erratic factors. It is the effect of
such factors which are more or less constant for a long time or which change very gradually and slowly. Such factors
are gradual growth in population, tastes and habits or the effect on industrial output due to improved methods. Increase
in production of automobiles and a gradual decrease in production of foodgrains are examples of increasing and
decreasing secular trend.

2. Seasonal Variations : The two principal factors responsible for seasonal changes are the climate or weather and customs. Since
the growth of all vegetation depends upon temperature and moisture, agricultural activity is confined largely to warm
weather in the temperate zones and to the rainy or post-rainy season in the torrid zone (tropical or sub-tropical
countries like India). Winter and the dry season make farming a highly seasonal business. This high irregularity of
month to month agricultural production largely determines all harvesting, marketing, canning, preserving, storing,
financing, and pricing of farm products. Manufacturers, bankers and merchants who deal with farmers find their
business taking on the same seasonal pattern which characterises the agriculture of their area.

3. Business Cycle : Because of the persistent tendency for business to prosper, decline, stagnate, recover and prosper again,
the third characteristic movement in economic time series is called the business cycle. The business cycle does not
recur regularly like seasonal movement, but moves in response to causes which develop intermittently out of complex
combinations of economic and other considerations. When the business of a country or a community is above or below
normal, the excess or deficiency is usually attributed to the business cycle. Its measurement becomes a process of contrasting
occurrences with a normal estimate arrived at by combining the calculated trend and seasonal movements. The
measurement of the variations from normal may be made in terms of actual quantities or it may be made in such terms
as percentage deviations, which is generally the more satisfactory method as it places the measure of cyclical tendencies on
a comparable base throughout the entire period under analysis.

4. Erratic or Irregular Component : These movements are exceedingly difficult to dissociate quantitatively from the business
cycle. Their causes are irregular and unpredictable happenings such as wars, droughts, floods, fires, pestilence,
fads and fashions, which operate as spurs or deterrents upon the progress of the cycle.

Mathematical Statement of the Composition of Time Series

A time series may not be affected by all types of variations. Some of these types of variations may affect a few time series, while
other series may be affected by all of them. Hence, in analysing time series, these effects are isolated. In classical time series
analysis it is assumed that any given observation is made up of trend, seasonal, cyclical and irregular movements and these four
components have a multiplicative relationship.
Symbolically :

O = T × S × C × I
where O refers to original data,
• T refers to trend.
• S refers to seasonal variations,
• C refers to cyclical variations and
• I refers to irregular variations.

This is the most commonly used model in the decomposition of time series.
There is another model called Additive model in which a particular observation in a time series is the sum of these four
components.
O = T + S + C + I


Methods of Measuring Trend

(i) Freehand curve method
(ii) Moving averages method
(iii) Semi-averages method
(iv) Least-squares method


Freehand Curve Method

The term freehand is applied to any non-mathematical curve in statistical analysis even if it is drawn with the aid of drafting
instruments. This is the simplest method of studying the trend of a time series. The procedure for drawing a freehand curve is as
follows :
(i) The original data are first plotted on a graph paper.
(ii) The direction of the plotted data is carefully observed.
(iii) A smooth line is drawn through the plotted points.
While fitting a trend line by the freehand method, an attempt should be made that the fitted curve conforms to these conditions.

(i) The curve should be smooth either a straight line or a combination of long gradual curves.
(ii) The trend line or curve should be drawn through the graph of the data in such a way that the areas below and
above the trend line are equal to each other.
(iii) The sum of the vertical deviations of the data above the trend line must equal the sum of the deviations below the line.
(iv) Sum of the squares of the vertical deviations of the observations from the trend should be minimum.

Example: Draw a time series graph relating to the following data and fit the trend by freehand method :
Years                              1981   1982   1983   1984   1985   1986   1987   1988   1989
Production (Million Metric Tons)   6.6    6.9    5.6    6.3    8.4    7.2    7.2    8.5    8.5

We observe that the graph of the original data does not show any closeness to any type of curve. It looks like it increases very
slowly in a straight (linear) manner. Thus we draw a line AB as an approximation to the original graph. The line AB represents
the trend line, and from this we read the trend values for the given years.


Method of Moving Averages

Suppose that there are n time periods denoted by t1,t2,t3,…,tn and the corresponding values of the Y variable are
Y1,Y2,Y3,…,Yn. First of all we have to decide the period of the moving averages. For a short time series we use a period of 3 or
4 values, and for a long time series the period may be 7, 10 or more. For a quarterly time series we always calculate averages
taking 4-quarters at a time, and in a monthly time series, 12-monthly moving averages are calculated. Suppose the given time
series is in years and we have decided to calculate 3-year moving averages. The moving averages denoted by a1,a2,a3,…,an−2
are calculated as below:

The average of the first 3 values is (Y1+Y2+Y3) / 3 and is denoted by a1. It is written against the middle year t2. We leave the first
value Y1 and calculate the average for the next three values. The average is (Y2+Y3+Y4) / 3 = a2 and is written against the middle
year t3. The process is carried out to calculate the remaining moving averages.

4-year moving averages are calculated as follows. The first average is a1, which is calculated as

(Y1+Y2+Y3+Y4) / 4 = a1

It is written against the middle of t2 and t3. The next average, a2 = (Y2+Y3+Y4+Y5) / 4, is written against the middle of t3 and t4.
The two averages a1 and a2 are further averaged to get an average of (a1+a2) / 2 = A1, which refers to the center of t3 and is
written against t3. This is called centering the 4-year moving averages. The process continues until the end of the series to get
4-year moving averages centered. The moving averages of some proper period smooth out the short term fluctuations and the
trend is measured by the moving averages.
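
The procedure can be sketched as follows, assuming a short hypothetical series Y1 to Y6. The function names are chosen only for this illustration.

```python
def moving_average(values, period):
    """Simple moving averages of the given period."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


def centred_moving_average(values, period):
    """For an even period: average each adjacent pair of simple moving averages."""
    simple = moving_average(values, period)
    return [(a + b) / 2 for a, b in zip(simple, simple[1:])]


# Hypothetical yearly values Y1 ... Y6
y = [2, 4, 6, 8, 10, 12]

three_year = moving_average(y, 3)            # written against t2, t3, t4, t5
four_year = moving_average(y, 4)             # fall between t2-t3, t3-t4, t4-t5
four_year_centred = centred_moving_average(y, 4)  # written against t3, t4

print(three_year)
print(four_year)
print(four_year_centred)
```

A series of n values yields n − 2 three-year averages but only n − 4 centred four-year averages, so some trend values at both ends of the series are always lost.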

Method of Semi-Averages

This method is as simple and relatively objective as the free hand method. The data is divided in two equal halves and the
arithmetic mean of the two sets of values of Y is plotted against the center of the relative time span. If the number of observations
is even the division into halves will be straightforward; however, if the number of observations is odd, then the middle most item,
i.e.,(n+1) /2th, is dropped. The two points so obtained are joined through a straight line which shows the trend. The trend values
of Y, i.e., Yˆ, can then be read from the graph corresponding to each time period.

Since the arithmetic mean is greatly affected by extreme values, it is subject to misleading values, and hence the trend obtained
by plotting the means might be distorted. However, if extreme values are not apparent, this method may be successfully
employed. To understand the estimation of trend by this method, consider the following worked example.
Example:

Measure the trend by the method of semi-averages by using the table given below. Also write the equation of the trend line with
origin at 1984-85.

Years Value in Million
1984 – 85 18.6
1985 – 86 22.6
1986 – 87 38.1
1987 – 88 40.9
1988 – 89 41.4
1989 – 90 40.1
1990 – 91 46.6
1991 – 92 60.7
1992 – 93 57.2
1993 – 94 53.4



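Y1 = Trend for 1986 – 87 = (18.6 + 22.6 + 38.1 + 40.9 + 41.4) / 5 = 32.32     T1 = 1986 – 87
Y2 = Trend for 1991 – 92 = (40.1 + 46.6 + 60.7 + 57.2 + 53.4) / 5 = 51.60     T2 = 1991 – 92
Increase in trend in 5 years = 51.60 – 32.32 = 19.28
Increase in trend in 1 year = 19.28 / 5 = 3.856


Y = Y1 + ( (Y2 – Y1) / (T2 – T1) ) (t – T1)

Y = 32.32 + ( (51.60 – 32.32) / (1992 – 1987) ) (t – 1987)
Y = 32.32 + 3.856 * (t – 1987)

Here t denotes the calendar year in which each period ends, so that 1984 – 85 corresponds to t = 1985. The trend for one year is
3.856. This is called the slope of the trend line and is denoted by b. With origin at 1984 – 85 (x = 0 for 1984 – 85), the equation
of the trend line is Y = 24.608 + 3.856x.

Year   t      Y = 32.32 + 3.856 * (t – 1987)

1      1985   24.608
2      1986   28.464
3      1987   32.32
4      1988   36.176
5      1989   40.032
6      1990   43.888
7      1991   47.744
8      1992   51.60
9      1993   55.456
10     1994   59.312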
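
The method can also be sketched in code. The series below is hypothetical and has an odd number of observations, so the middle item is left out of both halves; the function name `semi_average_trend` is chosen only for this illustration.

```python
def semi_average_trend(years, values):
    """Return a function giving the semi-average trend value for any year."""
    n = len(values)
    half = n // 2                      # with odd n the middle item is dropped
    t1 = sum(years[:half]) / half      # centre of the first half
    y1 = sum(values[:half]) / half     # mean of the first half
    t2 = sum(years[n - half:]) / half  # centre of the second half
    y2 = sum(values[n - half:]) / half # mean of the second half
    slope = (y2 - y1) / (t2 - t1)
    return lambda t: y1 + slope * (t - t1)


# Hypothetical data, for illustration only
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007]
values = [10, 12, 14, 20, 22, 24, 26]

trend = semi_average_trend(years, values)
print(trend(2004))   # trend value for the dropped middle year
print(trend(2008))   # estimate for the year after the series ends
```

Because the trend is a straight line through the two semi-averages, it can be read for years inside the series as well as extended a little beyond it.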

LEAST SQUARES LONG METHOD :

It makes use of the two normal equations, ∑Y = na + b∑X and ∑XY = a∑X + b∑X2, without attempting to shift the time variable
to a convenient mid-year. This method is illustrated by the following example.

Fit a linear trend curve by the least-squares method to the following data :

Year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production
(Kg.) 3 5 6 6 8 10 11 12 13 15

Solution : The first year 2001 is assumed to be 0, 2002 would become 1, 2003 would be 2 and so on. The various steps are
outlined in the following table.
Year    Production (Kg.) Y   X    XY    X2
2001    3                    0    0     0
2002    5                    1    5     1
2003    6                    2    12    4
2004    6                    3    18    9
2005    8                    4    32    16
2006    10                   5    50    25
2007    11                   6    66    36
2008    12                   7    84    49
2009    13                   8    104   64
2010    15                   9    135   81
TOTAL   89                   45   506   285


The above table yields the following values for the various terms: n = 10, ∑X = 45, ∑X2 = 285, ∑Y = 89, and
∑XY = 506

Substituting these values in the two normal equations, we obtain
89 = 10a + 45b ...(i)
506 = 45a + 285b ...(ii)
Multiplying equation (i) by 9 and equation (ii) by 2, we obtain
801 = 90a + 405b ...(iii)
1012 = 90a + 570b ...(iv)
Subtracting equation (iii) from equation (iv), we obtain
211 = 165b or b = 211/165 = 1.28
Substituting the value of b in equation (i), we obtain
89 = 10a + 45 × 1.28
89 = 10a + 57.60
10a = 89 – 57.6
10a = 31.4
a = 31.4/10 = 3.14
Substituting these values of a and b in the linear equation, we obtain the following trend line
Yc = 3.14 + 1.28X
Inserting various values of X in this equation, we obtain the trend values as below

Year   X   Yc = 3.14 + 1.28 * X
2001   0   3.14
2002   1   4.42
2003   2   5.7
2004   3   6.98
2005   4   8.26
2006   5   9.54
2007   6   10.82
2008   7   12.1
2009   8   13.38
2010   9   14.66
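
The elimination carried out above is equivalent to solving the two normal equations directly. The following sketch does this for the production data of the example, with X = 0 for 2001; the function name `fit_linear_trend` is chosen only for this illustration.

```python
def fit_linear_trend(x, y):
    """Solve the normal equations  sum(Y) = n*a + b*sum(X)  and
    sum(XY) = a*sum(X) + b*sum(X^2)  for a and b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


production = [3, 5, 6, 6, 8, 10, 11, 12, 13, 15]   # 2001 to 2010
x = list(range(len(production)))                    # 2001 -> 0, 2002 -> 1, ...

a, b = fit_linear_trend(x, production)
trend_values = [a + b * xi for xi in x]

print(round(a, 4), round(b, 4))
print([round(value, 2) for value in trend_values])
```

The slope equals 211/165, as in the worked solution. The intercept computed without intermediate rounding is about 3.15 rather than 3.14, because the worked solution rounds b to 1.28 before substituting it into equation (i); the resulting trend values differ only in the second decimal place.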





Q.1 Fit A Straight Line Trend Equation By The Method Of Least Squares And Estimate The Value Of 2019:
Year 2010 2011 2012 2013 2014 2015 2016 2017
Value 380 400 650 720 690 600 870 950
Ans :1060.87

Q. 2 From The Following Data Calculate The Trend Values Using 4 Yearly Moving Average
Year 1989 1990 1991 1992 1993 1994 1995 1996 1997
Values 506 620 1036 673 588 696 1116 738 663

Q.3 Fit A Trend By The Method Of Semi Average To The Data Given Below And Estimate The Sales For
The Year 1984:
Year 1975 1976 1977 1978 1979 1980 1981 1982 1983
Sales 18 24 26 28 33 36 40 44 48
Q. 4 From The Following Data Calculate The Trend Values Using 3 Yearly Moving Average
Year 1989 1990 1991 1992 1993 1994 1995 1996
Values 56 60 106 63 88 96 116 73

