Types of Data and Its Representation: Graphs and Tables
Added: May 09, 2018
Slides: 55 pages
TYPES OF DATA AND GRAPHICAL / TABULAR REPRESENTATION DR. REENA TITORIA
INTRODUCTION
Statistics may be defined as the science which deals with the collection, presentation, analysis and interpretation of numerical data.
• DESCRIPTIVE STATISTICS
• INFERENTIAL STATISTICS
Collected data should be:
• Accurate (i.e. measures the true value of what is under study)
• Valid (i.e. measures only what it is supposed to measure)
• Precise (i.e. gives adequate detail of the measurement)
• Reliable (i.e. should be repeatable)
Types of DATA
• Qualitative / Quantitative
• Discrete / Continuous / Interval / Ratio
• Primary / Secondary
• Nominal / Ordinal
Quantitative data:
• Also called measurement data
• Can be expressed as a number, with or without a unit of measurement
• Eg: Height in cm, Hb in gm%, BP in mm of Hg, Weight in kg
Qualitative data:
• Represents a particular quality or attribute
• Expressed as numbers (counts in each category) without units of measurement
• Eg: Religion, Sex, Blood group etc.
Discrete data: Here we always get a whole number. Ex: Number of beds in a hospital, number of malaria cases
Continuous data: It can take any value that is possible to measure, including fractions. Ex: Hb level, Ht, Wt.
WHAT IS IMPORTANT???
Interval: Values are separated by equal intervals that are meaningful. For example, a thermometer might have intervals of ten degrees. Ex: Celsius temperature, IQ (intelligence scale)
Ratio: Exactly the same as the interval scale, except that zero on the scale means the quantity does not exist. Ex: Age, Weight, Height
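The practical difference between the two scales is that ratios are only meaningful when zero means "does not exist". A minimal sketch, using two illustrative temperatures that are not from the slides: differences agree between Celsius (interval) and Kelvin (ratio), but the ratio does not.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return c + 273.15

# Illustrative values only (assumed for this sketch).
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale: identical in both units.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are not: 20/10 = 2 in Celsius, but the Kelvin ratio is about 1.035,
# so 20 degrees C is not "twice as hot" as 10 degrees C.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

Age, weight and height have a true zero, so for them a statement such as "twice as heavy" is valid.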
Primary data: Data collected by the investigator himself/herself for a specific purpose
Ex: Data collected by a student for his/her thesis or research project
Advantages:
• The investigator collects data specific to the problem under study
• There is no doubt about the quality of the data collected (for the investigator)
• If required, it may be possible to obtain additional data during the study period
Secondary data: Data collected by someone else for some other purpose (but being utilized by the investigator for another purpose)
Ex: Census data being used to analyze the impact of education on career choice and earnings
Advantages of using secondary data:
• The data are already there, so there is no hassle of data collection
• It is less expensive
• The investigator is not personally responsible for the quality of the data (“I didn’t do it”)
Nominal data: Categories without order. Each observation fits into one of the categories, but the categories cannot be ordered. Ex: Colour of eyes, Race, Gender
Ordinal data: A rank or order. Here the categories can be ordered, but the space or class interval between two categories may not be the same. Ex: Ranking in the class or exam, socioeconomic status (SES)
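The distinction decides which summaries make sense. A minimal sketch with made-up observations (the eye colours and the three-level SES scale below are hypothetical): nominal data can only be counted, while ordinal data can also be sorted and given a median.

```python
from collections import Counter

# Nominal: categories have no order, so counting (and the mode) is meaningful.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]  # hypothetical
counts = Counter(eye_colours)
mode_colour = counts.most_common(1)[0][0]

# Ordinal: categories have an order, but the gaps between ranks are not equal.
SES_ORDER = ["lower", "middle", "upper"]  # hypothetical three-level scale
rank = {level: i for i, level in enumerate(SES_ORDER)}

ses = ["middle", "lower", "upper", "middle", "lower"]  # hypothetical
sorted_ses = sorted(ses, key=rank.__getitem__)
median_ses = sorted_ses[len(sorted_ses) // 2]  # middle item of an odd-length list

print(counts)
print(mode_colour, median_ses)
```

Sorting eye colours alphabetically would run, but the resulting order carries no meaning; that is the sense in which nominal categories "cannot be ordered".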
QUESTION
1. A person's highest educational level is which type of variable?
• Continuous
• Discrete
• Ordinal
• Nominal
2. The number of motor-vehicle accidents on a particular stretch of the national highway in a week is which type of variable?
• Continuous
• Discrete
• Nominal
• Ordinal
REPRESENTATION OF DATA
• Tabular
• Graphic
• Numeric
When to use Tables
• When you wish to show how a single category of information varies when measured at different points
• When the dataset contains relatively few numbers
• When the precise value is crucial to your argument and a graph would not convey the same level of precision. For example: when it is important that the reader knows that the result was 2.48 and not 2.45
• When you don’t wish the presence of one or two very high or low numbers to detract from the message contained in the rest of the dataset
Tabular Presentation
1. Each table must be numbered
2. A brief and self-explanatory title must be given to each table
3. The headings of columns and rows must be clear, sufficient, concise and fully defined
4. The data must be presented according to size or importance, chronologically, alphabetically or geographically
5. The table should not be too large
6. The classes should be fully defined and should not lead to any ambiguity
7. The classes should be exhaustive, i.e. should include all the given values
8. The classes should be mutually exclusive and non-overlapping
9. The classes should be of equal width, i.e. the class interval should be the same
10. The number of classes should be neither too large nor too small
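Rules 7 to 9 can be checked mechanically. The sketch below is one possible check, not a standard routine: `check_classes` is a hypothetical helper, the classes are assumed to be half-open intervals [lower, upper), and the example values are invented.

```python
def check_classes(classes, values):
    """Check a list of (lower, upper) half-open classes against rules 7 to 9."""
    ordered = sorted(classes)
    # Rule 8: mutually exclusive, i.e. each class ends at or before the next starts.
    exclusive = all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))
    # Rule 7: exhaustive, i.e. every observed value falls in some class.
    exhaustive = all(any(lo <= v < hi for lo, hi in ordered) for v in values)
    # Rule 9: equal width, i.e. only one distinct class width exists.
    equal_width = len({hi - lo for lo, hi in ordered}) == 1
    return {"exclusive": exclusive, "exhaustive": exhaustive, "equal_width": equal_width}

# Hypothetical examples.
good = check_classes([(120, 130), (130, 140), (140, 150)], [121, 135, 149])
bad = check_classes([(120, 130), (125, 140)], [121, 145])

print(good)
print(bad)
```

In the second example the classes overlap between 125 and 130, the value 145 falls in no class, and the widths differ (10 and 15), so all three checks fail.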
Normal Range 18.5 ≤ x < 25
Frequency distribution table with quantitative data:
Table 1: Fasting blood glucose level in diabetics at the time of diagnosis (n=78)
Fasting Glucose    n
120-129            12
130-139            8
140-149            10
150-159            10
160-169            15
170-179            18
180-189            5
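A frequency table is often extended with a percentage column. The sketch below copies the class frequencies from Table 1 and derives the total, the percentages and the modal class; the percentages are computed here and do not appear on the original slide.

```python
# Class frequencies copied from Table 1 (fasting glucose at diagnosis).
table1 = {
    "120-129": 12, "130-139": 8, "140-149": 10, "150-159": 10,
    "160-169": 15, "170-179": 18, "180-189": 5,
}

n = sum(table1.values())  # should match the n stated in the table title
percentages = {label: 100 * freq / n for label, freq in table1.items()}
modal_class = max(table1, key=table1.get)  # class with the highest frequency

for label, freq in table1.items():
    print(f"{label:>8} {freq:>3} {percentages[label]:5.1f}%")
print("Total", n, "| modal class:", modal_class)
```

Checking that the frequencies add up to the stated n is a quick test that the classes are exhaustive for the data at hand.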
Cross-Tabulation Table 2: Fasting blood glucose level in diabetics at the time of diagnosis (n=78)
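A cross-tabulation counts subjects by two variables at once and usually adds row and column totals. Since the body of Table 2 is not available here, the sketch below uses invented records with a hypothetical second variable (sex); it illustrates only the counting technique.

```python
from collections import Counter

# Hypothetical records: (sex, glucose class). Not the data behind Table 2.
records = [
    ("M", "120-139"), ("F", "120-139"), ("M", "140-159"),
    ("M", "140-159"), ("F", "160-179"), ("F", "140-159"),
]

cells = Counter(records)  # one count per (row, column) combination
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Marginal totals; a Counter returns 0 for combinations that never occur.
row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}

print("Sex  " + "  ".join(cols) + "  Total")
for r in rows:
    print(f"{r:<5}" + "  ".join(f"{cells[(r, c)]:>7}" for c in cols) + f"  {row_totals[r]:>5}")
```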
Frequency distribution table with qualitative data: Table 1: Cases of malaria in adults and children in the months of June and July 2010 in Nair Hospital (n=389)
EXAMPLE
This is a poor example because:
• The table lacks a title
• The source of the information is not provided
• Row titles run over two lines
• The alphabetical listing of regions results in a non-numerical ordering of data down the columns
EXAMPLE
This is a better example because:
• The table has a title
• The source of the information is provided
• Row titles fit on one line
• The alphabetical listing of regions results in a numerical ordering of data down the columns
• Numbers are aligned
Graphical Presentation
• A graphical representation is a visual display of data and statistical results. It is often more effective than presenting data in tabular form
• Graphical representation helps to quantify, sort and present data in a way that is understandable to a wide variety of audiences
• Graphs also help in studying both time series and frequency distributions, as they give a clear account and precise picture of the problem
• Graphs are also easy to understand and eye-catching
General Principles of Graphic Presentation
• A graph has two lines called coordinate axes: the vertical one is known as the Y axis and the horizontal one as the X axis
• These two lines are perpendicular to each other. The point where they intersect is called ‘0’ or the Origin
• On the X axis, distances to the right of the origin have a positive value and distances to the left of the origin have a negative value
• On the Y axis, distances above the origin have a positive value and distances below the origin have a negative value
• A graph should have a title, legend and labelling
VARIOUS CHARTS AND DIAGRAMS
• Bar Diagram
• Histogram
• Frequency polygon
• Cumulative frequency curve / Ogive
• Scatter diagram
• Line diagram
• Pie diagram
• Pictogram
• Stem and Leaf Plot
BAR DIAGRAM
• Bar charts are used for qualitative variables. The categories of the variable are placed along the X-axis (horizontal), one bar per category, and the height of each bar equals the frequency or percentage, which is plotted along the Y-axis (vertical)
• The width of the bars is kept constant for all the categories
• The space between the bars also remains constant throughout
• The number of subjects, with the percentage in brackets, is written on top of each bar
Types: Simple, Compound, Component
SIMPLE BAR CHART When we draw a bar chart with only one variable or a single group, it is called a simple bar chart
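The bar chart rules (bar length proportional to frequency, count and percentage labelled on each bar) can be sketched in plain text. This is an illustration only: `text_bar_chart` is a hypothetical helper, the blood group counts are invented, and the bars are drawn horizontally rather than vertically to suit text output.

```python
def text_bar_chart(freqs, width=20):
    """Return one text line per category; bar length is proportional to frequency."""
    total = sum(freqs.values())
    biggest = max(freqs.values())
    lines = []
    for category, n in freqs.items():
        bar = "#" * round(width * n / biggest)  # longest bar gets `width` characters
        lines.append(f"{category:<4}{bar} {n} ({100 * n / total:.0f}%)")
    return lines

# Hypothetical counts of subjects by blood group (a qualitative variable).
blood_groups = {"A": 10, "B": 15, "AB": 5, "O": 20}
lines = text_bar_chart(blood_groups)
print("\n".join(lines))
```

Every bar uses the same character height and the same spacing, which mirrors the rule that bar width and the gaps between bars stay constant.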
COMPOUND BAR CHART When two variables or two groups are considered, it is called a multiple/compound bar chart. In a multiple bar chart the two bars representing the two variables are drawn adjacent to each other, and the bars are kept at equal width
COMPONENT BAR CHART A bar chart in which two qualitative variables are shown, with each bar further divided into categories or components, is called a component bar chart. Here the total height of the bar corresponding to one variable is sub-divided into the different components or categories of the other variable
HISTOGRAM
• A histogram is used for quantitative continuous data: the class intervals are plotted on the X-axis and the frequencies on the Y-axis
• It is very similar to the bar chart, with the difference that the rectangles or bars are adjacent (without gaps)
• It is used for presenting a class frequency table (continuous data)
• It is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval
EXERCISE
Distribution of the subjects by Cholesterol level
Serum Cholesterol (mg/dl)    No. of Subjects    Percentage (%)
175-200                      3                  30
200-225                      3                  30
225-250                      2                  20
250-275                      1                  10
275-300                      1                  10
Total                        10                 100
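For a histogram, the area of each rectangle is proportional to the frequency, so the bar height is the frequency divided by the class width. A minimal sketch using the cholesterol table above; treating each class as including its lower limit and excluding its upper limit is an assumption, since the slide does not say where a value of exactly 200 belongs.

```python
# (lower, upper, frequency) copied from the cholesterol table.
classes = [(175, 200, 3), (200, 225, 3), (225, 250, 2), (250, 275, 1), (275, 300, 1)]

total = sum(f for _, _, f in classes)
rows = []
for lo, hi, f in classes:
    width = hi - lo
    height = f / width  # frequency density, so that height * width == frequency
    rows.append({
        "class": f"{lo}-{hi}",
        "width": width,
        "height": height,
        "area": height * width,
        "percent": 100 * f / total,
    })

for row in rows:
    print(row["class"], row["width"], round(row["height"], 2), row["percent"])
```

All classes here have the same width (25 mg/dl), so heights are proportional to frequencies and plotting the frequency directly gives the same shape.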
EXERCISE
FREQUENCY POLYGON AND CURVE
• Plot the variable along the X-axis and the frequencies along the Y-axis
• A frequency polygon is derived from a histogram by connecting the mid-points of the tops of the rectangles
• The line connecting these mid-points is called the frequency polygon
• If we draw a smooth freehand curve passing through these points, the curve is known as a frequency curve
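The points of a frequency polygon are simply (class mid-point, frequency) pairs. The sketch below computes them for the serum cholesterol classes from the exercise table; `polygon_points` is a hypothetical helper, and closing the polygon with zero-frequency points one class width beyond each end is a common convention that the slides do not state.

```python
# (lower, upper, frequency) from the serum cholesterol exercise table.
classes = [(175, 200, 3), (200, 225, 3), (225, 250, 2), (250, 275, 1), (275, 300, 1)]

def polygon_points(classes, close=True):
    """Return (mid-point, frequency) pairs, optionally closed on the X-axis."""
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]
    if close:
        first_lo, first_hi, _ = classes[0]
        last_lo, last_hi, _ = classes[-1]
        # Assumed convention: mid-points of the empty neighbouring classes.
        points.insert(0, (first_lo - (first_hi - first_lo) / 2, 0))
        points.append((last_hi + (last_hi - last_lo) / 2, 0))
    return points

points = polygon_points(classes)
print(points)
```

Joining these points with straight lines gives the polygon; smoothing the same points freehand gives the frequency curve.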
(n=37)
CUMULATIVE FREQUENCY DIAGRAM
One can tell the number of patients that lie above or below a certain level
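As a rough illustration, the cumulative ("less than") frequencies behind such a diagram can be computed from the fasting glucose frequencies of Table 1. The helper `count_at_or_below` is hypothetical and only gives exact answers at class upper limits; values inside a class would need interpolation on the curve.

```python
from itertools import accumulate

# Class upper limits and frequencies from Table 1 (fasting glucose).
upper_limits = [129, 139, 149, 159, 169, 179, 189]
frequencies = [12, 8, 10, 10, 15, 18, 5]

# Running total of the frequencies: the heights of the ogive at each upper limit.
cumulative = list(accumulate(frequencies))

def count_at_or_below(limit):
    """Number of patients in classes whose upper limit is <= limit."""
    return sum(f for u, f in zip(upper_limits, frequencies) if u <= limit)

below = count_at_or_below(149)
above = cumulative[-1] - below

print(list(zip(upper_limits, cumulative)))
print(below, above)
```

The last cumulative value always equals the total number of subjects, which is a useful check on the table.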