Data Organizarion and presentation (1).pptx

MuhammadAsif297069 37 views 50 slides Oct 14, 2024


DATA ORGANIZATION & PRESENTATION

Data
The raw material of statistics is called data. We may define data as figures that result from the process of counting or from taking a measurement. For example:
- When a hospital administrator counts the number of patients (counting).
- When a nurse weighs a patient (measurement).

OBJECTIVES OF DATA COLLECTION:
- To obtain data from the normal population to set benchmarks or standards.
- To obtain data from the sick population to describe the disease or vital events.
- To compare the characteristics of the normal population in various localities, countries and regions.
- To compare the normal with the abnormal.

Classification of Data
- Primary data. Examples: observations, questionnaires, interviews, surveys.
- Secondary data. Examples: census, medical records.
- Ungrouped data: data are presented or observed individually. Example: list of weights (in pounds) for six men: 140, 150, 160, 150, 150, 160.
- Grouped data: data are presented in various classes or categories.

METHODS OF DATA COLLECTION
- Observation
- Face-to-face interview
- Telephone interview
- E-mail interview
- Focus group discussion
- Written questionnaire
- Existing records

Data can be presented in 3 forms:
- Tabulation
- Diagrams
- Graphs

1. Tabulation
A table is a systematic arrangement of data into columns and rows. The process of arranging data into columns and rows is called tabulation.
Tables can be simple, 2 x 2 or complex.

Principles to be followed while designing tables
- Tables should be numbered.
- A title must be given to each table; it must be brief and self-explanatory.
- Headings of the columns and rows should be clear and concise.
- Data should be presented according to size or importance, or chronologically, alphabetically or geographically.
- If percentages or averages are to be compared, they should be placed as close together as possible.
- No table should be too large.
- Footnotes should be used where additional information is to be provided.

Table 1: Population of Pakistan

Year   Population
1991   115 million
1995   122 million
1998   130 million
2002   145 million (estimated)

* Census of Pakistan 1998

Frequency Distribution Table
Data are first split into class intervals, and the number of items (frequency) occurring in each class is shown alongside it in a table.
Guidelines for class intervals:
- The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data. Usually the number of classes should be between 5 and 20.
- Each piece of data must belong to exactly one class.
- All classes should have the same width.

Terms used in Data grouping
- Classes: categories for grouping data.
- Frequency (f): the number of pieces of data in a class.
- Frequency distribution: a listing of classes and their frequencies.
- Relative frequency (rf): the ratio of the frequency of the class to the total number of pieces of data, rf = f / Σf.

Terms used (continued)
- Relative frequency distribution: a listing of classes and their relative frequencies.
- Lower class limit: the smallest value that can go into a class.
- Upper class limit: the largest value that can go into a class.
- Class mark (midpoint): the midpoint of a class.
- Class width: the difference between the lower class limit of the given class and the lower class limit of the next higher class.

Advantages of Frequency Distribution Tables:
- A table shows at a glance how many individual observations are in a group and where the main concentration lies.
- It also shows the range and the shape of the distribution.
- These tables can be extended to relative frequency distribution tables and cumulative frequency distribution tables. A cumulative frequency is obtained by summing the frequencies of all classes representing values less than a specified class limit. Cumulative relative frequency is expressed as a percentage.

Cholesterol levels of the 20 patients
210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
210, 199 (L), 215, 221 (H), 213, 218, 202, 218, 200, 214
(L = lowest value, H = highest value)

Cholesterol level   Tally method   Frequency
195-199             |              1
200-204             |||            3
205-209             ||||           4
210-214             ||||| ||       7
215-219             ||||           4
220-224             |              1
Total number of frequencies        20

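Classes/Level   Frequency   Relative frequency   Class mark
195-199         1           0.05
200-204         3           0.15
205-209         4
210-214         7           0.35                 212
215-219         4
220-224         1
Total           20

Relative frequency = frequency of the particular class / total frequency, e.g. 7/20 = 0.35.
Class mark = (L + U)/2, where L and U are the lower and upper class limits, e.g. (210 + 214)/2 = 212.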
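The tally and relative-frequency tables above can be reproduced with a short script. This is a minimal sketch in Python, not part of the original slides; the function name `frequency_table` is an illustrative assumption. It also fills in the relative frequencies and class marks that the slide leaves blank.

```python
# Cholesterol levels of the 20 patients, as listed in the slides.
levels = [210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
          210, 199, 215, 221, 213, 218, 202, 218, 200, 214]

# Class limits (lower, upper) taken from the tally table; class width is 5.
classes = [(195, 199), (200, 204), (205, 209),
           (210, 214), (215, 219), (220, 224)]


def frequency_table(values, class_limits):
    """Return one row per class: (lower, upper, f, rf, class_mark)."""
    total = len(values)
    rows = []
    for lower, upper in class_limits:
        # Frequency: number of values falling within the class limits.
        f = sum(1 for v in values if lower <= v <= upper)
        # Relative frequency rf = f / total; class mark = (L + U) / 2.
        rows.append((lower, upper, f, f / total, (lower + upper) / 2))
    return rows


table = frequency_table(levels, classes)
for lower, upper, f, rf, mark in table:
    print(f"{lower}-{upper}  f={f}  rf={rf:.2f}  class mark={mark:.0f}")
```

Because every value must belong to exactly one class, the frequencies should sum to the number of observations, which is a quick check on the chosen class limits.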

Cumulative frequency distribution of total blood cholesterol levels

Total blood cholesterol   Frequency   Relative        Cumulative   Cumulative relative
level (mg/dl)                         frequency (%)   frequency    frequency (%)
100-119                   2           0.95            2            0.95
120-139                   2           0.95            4            1.9
140-159                   6           2.9             10           4.8
160-179                   33          15.8            43           20.6
180-199                   36          17.2            79           37.8
200-219                   40          19.1            119          56.9
220-239                   29          13.9            148          70.8
240-259                   27          12.9            175          83.7
260-279                   13          6.2             188          89.9
280-299                   9           4.3             197          94.2
300-319                   11          5.3             208          99.5
320-339                   0           0.0             208          99.5
340-359                   0           0.0             208          99.5
360-379                   1           0.5             209          100.0
Total                     209         100.0
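The cumulative columns of this table are running totals of the frequency column. The following is a small sketch, assuming Python and its standard `itertools.accumulate`; the two classes shown without a frequency in the slide are taken as 0 here, an assumption based on the cumulative count staying at 208 across them.

```python
from itertools import accumulate

# Class frequencies from the total blood cholesterol table (n = 209).
# The two zeros are an assumption: the slide leaves those cells empty.
frequencies = [2, 2, 6, 33, 36, 40, 29, 27, 13, 9, 11, 0, 0, 1]

total = sum(frequencies)

# Cumulative frequency: running sum of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency, expressed as a percentage of the total.
cumulative_pct = [100 * c / total for c in cumulative]

for f, c, pct in zip(frequencies, cumulative, cumulative_pct):
    print(f"f={f:3d}  cumulative={c:3d}  cumulative %={pct:5.1f}")
```

Percentages computed from the exact counts may differ from the slide's figures in the last digit, possibly because the slide adds up already rounded values.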

The simplest way to present nominal data (or ordinal data, if there are not too many points on the scale) is tabulation: list the categories in one column of the table and the number or percentage of observations in another column.

Table: Method of Delivery of 600 Babies Born in Hospital

Method of delivery   No. of births   Percentage
Normal               478             79.7
Forceps              65              10.8
Caesarean section    57              9.5
Total                600             100

Contingency Tables:
Data obtained from observing values of two variables are called bivariate data. They can be grouped using tables called contingency tables.
In its general form, the 'r' by 'c' contingency table contains counts of observations arranged in r rows and c columns representing the various levels of exposure in discrete data. When the levels are, for example, diseased/non-diseased and exposed/non-exposed, there are two rows and two columns, and the table is referred to as a 2 x 2 table.

Example: Bivariate data on age (in years) and sex were obtained from the students attending the Medical class.

Age & Sex   Age & Sex   Age & Sex
21 M        29 F        22 M
20 M        20 M        23 M
32 F        18 F        19 F
21 M        21 M        21 M
19 F        26 M        21 F

          Under 21   21-25   Over 25   Total
Male      2          6       1         9
Female    3          1       2         6
Total     5          7       3         15
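The grouping of the 15 student records into the contingency table can be sketched in a few lines. This is an illustrative Python example, not part of the slides; the helper `age_group` is a hypothetical name, and the age bands are those used in the table's column headings.

```python
from collections import Counter

# (age in years, sex) pairs for the 15 students, as listed in the slides.
students = [(21, "M"), (29, "F"), (22, "M"), (20, "M"), (20, "M"),
            (23, "M"), (32, "F"), (18, "F"), (19, "F"), (21, "M"),
            (21, "M"), (21, "M"), (19, "F"), (26, "M"), (21, "F")]


def age_group(age):
    """Map an age to the column labels used in the slide's table."""
    if age < 21:
        return "Under 21"
    if age <= 25:
        return "21-25"
    return "Over 25"


# Count how many students fall into each (sex, age group) cell.
counts = Counter((sex, age_group(age)) for age, sex in students)

groups = ["Under 21", "21-25", "Over 25"]
for sex in ("M", "F"):
    row = [counts[(sex, g)] for g in groups]
    print(sex, row, "total:", sum(row))
```

Each student contributes to exactly one cell, so the row and column totals both add up to 15.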

Graphical Representation

Many people have no taste for figures and would prefer a form of presentation in which figures are avoided. This is achieved by presenting statistical data visually, which is called graphical representation. It can be divided into graphs and diagrams.
The basic difference between a graph and a diagram is that a graph is a representation of data by a continuous curve, while a diagram is any other visual representation.

Advantages & Disadvantage:
Advantages:
- Powerful impact on the imagination of people; a popular method in newspapers and magazines.
- Better retained in the memory than statistical tables.
- Data must be simple.
- Comparison is easier with diagrams.
Disadvantage:
- A lot of the detail of the original data may be lost in charts and diagrams.

DIAGRAMS OR CHARTS
They are divided into:
- Simple bar chart
- Multiple bar chart
- Histogram
- Pie diagram
- Pictogram

BAR CHARTS:
- A popular medium for presenting statistical data, usually nominal or ordinal data.
- Easy to prepare, and they enable values to be compared visually.
- Counts or percentages of the characteristics of interest are shown as bars.
- The length of each bar is proportional to the magnitude to be represented.
- The bars do not abut each other.

Simple Bar Chart: Showing the mean depth of water sources by district.

Multiple Bar Charts: Two or more bars can be grouped together.

Component bar chart: The bars may be divided into two or more parts, each part representing a certain item and proportional to the magnitude of that item.
Example: Component bar chart showing household water sources by district.

HISTOGRAM
- It consists of a set of adjacent bars.
- The width of the bars corresponds to the class intervals along the horizontal X-axis, and the frequencies or percentages of observations for each class are on the vertical Y-axis (ordinate).
- The height of each bar is equal to the frequency of the class it represents.
- It displays the distribution of a quantitative continuous variable.
- Only one set of data is shown.
- If the class intervals are not of equal width, adjustments are necessary so that the total area remains in proper proportion and direct comparisons can be made.

- For visually comparing the distributions of two data sets, it is better to use relative frequency histograms than frequency histograms, because the same vertical scale (i.e. 0%-100%) is used for all relative frequency histograms.
- In public health, data on cases, deaths, males and females are usually shown by histograms.
- One method of depicting the classes on the horizontal axis of a histogram is to use class boundaries. Make the class boundaries by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit.
Example:
Class limits   Class boundaries
50-59          49.5-59.5
60-69          59.5-69.5
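The conversion from class limits to class boundaries is a one-line calculation. The sketch below assumes Python; the function name `class_boundaries` and its `unit` parameter are illustrative assumptions (the slides only cover whole-number data, where half a unit is 0.5).

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary) for one class.

    `unit` is the precision of the data (1 for whole numbers, an assumed
    generalisation); half a unit is subtracted from the lower class limit
    and added to the upper class limit.
    """
    half = unit / 2
    return lower_limit - half, upper_limit + half


for limits in [(50, 59), (60, 69)]:
    print(limits, "->", class_boundaries(*limits))
```

With boundaries, the upper boundary of one class equals the lower boundary of the next, so the bars of the histogram touch.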

PIE CHART
- The areas of the segments of the circle are compared.
- Suitable for expressing proportions or percentages.
- The total area of the pie is 100%.
- The total angle of the circle is 360°.
- The area of each segment depends upon its angle.
To find the angle of a segment of the circle, the formula is X/360 = P/T, where X is the angle, P is the part and T is the total. Hence X = (P/T) x 360.

Example: An office has 188 workers; of these, 25 are professionals, 43 are skilled and 120 are unskilled.

Occupation      Frequency   Sectoral angle
Professionals   25          A
Skilled         43          B
Unskilled       120         C

For the professionals, P = 25 and T = 188, so X = (25/188) x 360 ≈ 48°.
Another method to find the angle of a segment is to use percentages: 100% = 360°, so 1% = 3.6°.
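The remaining sectoral angles in the office example follow from the same formula, X = (P/T) x 360. A minimal sketch, assuming Python and a hypothetical helper named `sector_angles`:

```python
# Number of workers in each occupation, from the office example.
workers = {"Professionals": 25, "Skilled": 43, "Unskilled": 120}


def sector_angles(parts):
    """Return the angle in degrees of each pie segment: X = P / T * 360."""
    total = sum(parts.values())
    return {name: part / total * 360 for name, part in parts.items()}


angles = sector_angles(workers)
for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is a useful check before drawing the chart.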

PICTOGRAM
- A popular method of presenting data to lay persons who cannot understand charts.
- It is a form of bar chart.
- Small pictures or symbols are used to present the data.
Example: A picture of a doctor to represent the population per physician.

Scatter Diagram:
- It shows the relationship between two variables.
- The scatter diagram plots each pair of bivariate observations (x, y) at the point of intersection of the vertical line through the x value on the abscissa and the horizontal line through the y value on the ordinate.
- If the dots cluster around a straight line, this is evidence of a relationship of a linear nature. If there is no such cluster, it is probable that there is no relationship between the variables.
- The line drawn through the points is called the regression line, or sometimes the line of best fit.

GRAPHS
Frequency Polygon (Line graph): It is a graphical form of a frequency distribution. It is constructed by plotting the frequencies against the class midpoints and connecting the points by straight lines. It can also be constructed by joining the midpoints of the tops of successive rectangles in the histogram by straight lines.

- Similarly, a relative frequency (or percentage) polygon can be constructed.
- Frequency polygons are used to compare two distributions on the same graph.
- The end points of the resulting line are joined to the horizontal axis at the midpoints immediately below and above the lowest and highest classes with non-zero frequencies, respectively.
- Frequency polygons may take on a number of different shapes:

Symmetrical (Bell-shaped curve)
Here there are high frequencies in the centre of the distribution and low frequencies at the two extremes, which are called the upper and lower tails of the distribution. An example is height.

Skewed Distributions:
A frequency distribution curve is said to be skewed when it departs from symmetry. Here the frequencies tend to pile up at one end or the other of the distribution.
- Positively skewed distribution: the upper tail of the distribution is longer than the lower tail.
- Negatively skewed distribution: the lower tail of the distribution is longer than the upper tail.

All three distributions (symmetrical, positively skewed and negatively skewed) are unimodal, that is, they have just one peak. Occasionally a bimodal frequency distribution is seen; this usually indicates that the data are a mixture of two separate distributions, e.g. hormone levels of males and females.

Cumulative Frequency Polygon (Ogive)
- The horizontal scale is the same as that used for a histogram; the vertical scale indicates cumulative frequency or cumulative relative frequency.
- To construct the ogive, we place a point at the upper class boundary of each class interval. Each point represents the cumulative frequency for that class. Note that not until the upper class boundary has been reached have all the data of a class interval been accumulated.
- The ogive is completed by connecting the points.
- It is useful in comparing two sets of data.
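The points of an ogive can be listed before any plotting is done. The sketch below assumes Python and reuses the class limits and frequencies of the 20-patient cholesterol example from earlier in the deck; it pairs each upper class boundary with the cumulative frequency reached there.

```python
from itertools import accumulate

# Upper class limits and frequencies from the 20-patient cholesterol table.
upper_limits = [199, 204, 209, 214, 219, 224]
frequencies = [1, 3, 4, 7, 4, 1]

# Upper class boundary = upper class limit + 0.5 for whole-number data.
upper_boundaries = [u + 0.5 for u in upper_limits]

# Each ogive point is (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, accumulate(frequencies)))

for boundary, cum_f in ogive_points:
    print(f"x = {boundary}, cumulative frequency = {cum_f}")
```

Connecting these points in order with straight lines gives the ogive; dividing each cumulative frequency by the total (20 here) would give the cumulative relative frequency version.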

BOX PLOT (BOX AND WHISKER DIAGRAM)
- Is used to describe a large continuous data set.
- Is based on the five-number summary and can be used to provide a graphical display of the centre and variation of a data set.
- The five-number summary of the data consists of, in increasing order: Min, Q1, Q2, Q3, Max.
- Is suited for comparing two or more data sets.

Steps of constructing a box plot
1. Determine the quartiles of the data.
2. Determine the minimum and maximum of the data.
3. Draw a horizontal axis on which the values obtained in steps 1 and 2 can be located. Above this axis, mark the quartiles and the minimum and maximum with vertical lines.
4. Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines.
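Steps 1 and 2 amount to computing the five-number summary. The sketch below assumes Python 3.8+ and its standard `statistics.quantiles` function, applied to the 20 cholesterol values from earlier in the deck; the function name `five_number_summary` is illustrative. Several conventions exist for computing quartiles, and the slides do not say which one is intended, so Q1 and Q3 may differ slightly from a hand calculation.

```python
import statistics

# Cholesterol levels of the 20 patients, as listed in the slides.
levels = [210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
          210, 199, 215, 221, 213, 218, 202, 218, 200, 214]


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3,
    # using the library's default ("exclusive") interpolation method.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


summary = five_number_summary(levels)
print("Min, Q1, Q2, Q3, Max:", summary)
```

The box of the plot spans Q1 to Q3 with a line at Q2 (the median), and the whiskers run from the box to the minimum and maximum.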


Outlier: For a set of numerical data, any value that is markedly smaller or larger than the other values is called an outlier. A common rule also treats as an outlier any value that lies more than 1.5 times the interquartile range (IQR = Q3 - Q1) below Q1 or above Q3.
An outlier requires special attention: it may be the result of a measurement or recording error, a member of a different population from the rest of the sample, or simply an unusually extreme value.

Note that an extreme value need not be an outlier; it may instead be an indication of skewness. When an outlier is found, its cause should be determined. If it is due to a measurement or recording error, or if for some other reason it clearly does not belong to the data set, it should be removed.
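A 1.5 x IQR screen for outliers can be sketched as follows. This example assumes Python 3.8+ and the widely used convention of fences at Q1 - 1.5 x IQR and Q3 + 1.5 x IQR; the function name `find_outliers` and the added value 260 are hypothetical and not taken from the slides.

```python
import statistics


def find_outliers(values):
    """Return the values lying outside the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                      # interquartile range, Q3 - Q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Cholesterol levels of the 20 patients, as listed in the slides.
levels = [210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
          210, 199, 215, 221, 213, 218, 202, 218, 200, 214]
print(find_outliers(levels))

# Hypothetical recording error appended to the data (not from the slides).
with_error = levels + [260]
print(find_outliers(with_error))
```

A flagged value is only a candidate: as noted above, its cause should be determined before deciding whether to remove it.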