Coding, Editing, Tabulation and Validation of Data
AEC 509 RESEARCH METHODOLOGY FOR SOCIAL SCIENCES (1+1)
GAYATHRY M 2023504003
INTRODUCTION
Data collected for a study cannot reveal everything in raw form. Raw data need to be processed and analysed to yield results. Data processing is an intermediate stage between the collection of data and their analysis and interpretation; it includes checking, editing, coding and tabulation.
EDITING
Editing means to rectify, to order, or to establish sequence. It is the process of examining the data collected in a questionnaire or interview schedule to detect errors and omissions and to correct them where possible. After the data are collected, a final check is made before data processing. Editing is done to ensure that the collected data are accurate, consistent with other facts gathered, uniformly entered, and as complete as possible.
Types
Editing is performed in two stages, and accordingly it is of two types: field editing and central (centralised) editing.
Field editing
Field editing is the process of completing information that was recorded in abbreviated or illegible form at the time of recording. It should be carried out as soon as possible after the interview. The completeness of the forms should be checked by a person, since the investigator may have forgotten to record some information. If the recorded information is incomplete, it should be completed.
Central/Centralised editing
Central editing is done on returning to the office, by a single editor or by a team of editors. Editors are free to correct any mistake, and at the central level they must correct the various mistakes of the investigator. Sometimes a gap in the answers must be filled by reviewing the other questions in the questionnaire. If an answer cannot be corrected, the wrong answer must be dropped.
Significance of editing
Editing is a prerequisite for accuracy. It is useful in eliminating incorrect replies. It ensures the consistency of the collected data and avoids contradiction. It is useful in converting answers into uniform units of measurement before coding.
Coding
Coding is the process of organising the data or responses into classes or categories and assigning numerical or other symbols to the responses according to the class or category under which they fall. Hence it is considered a classification process. It is necessary for efficient analysis. It is used to compartmentalise several replies into a smaller number of classes that contain the information required for analysis.
In the process of coding, the first step is to study the answers and the last step is to transfer the information from the schedule to a separate sheet called the transcription sheet. The transcription sheet is a large summary sheet that contains the answers of all the respondents. Transcription may not be necessary if only simple tables are required and the respondents are few.
Coding is done with the help of a set of rules. The classes or categories should be reasonable and appropriate to the research problem under study. Coding must be exhaustive: there should be a class for every item of data. Each answer should be assigned a separate number. Coding must be mutually exclusive: a specific answer can be placed in only one category. Coding must observe the rule of single dimension: every category set is defined in terms of only one concept. Coding forms the basis for analysis. A standard method should be used in the case of hand coding.
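The rules of exhaustiveness and mutual exclusivity can be illustrated with a small sketch. The question, the category labels and the code numbers below are hypothetical and chosen only for illustration; a residual "other" code and a "no answer" code are assumed so that every response receives exactly one code.

```python
# Hypothetical coding frame for one closed question ("What is your main occupation?").
# The category labels and code numbers are illustrative, not taken from any real survey.
CODING_FRAME = {
    "farming": 1,
    "wage labour": 2,
    "business": 3,
    "salaried job": 4,
}
OTHER_CODE = 8       # assumed residual class, keeps the frame exhaustive
NO_ANSWER_CODE = 9   # assumed code for a missing answer


def code_response(answer):
    """Return the single numeric code for one raw answer."""
    if answer is None or not answer.strip():
        return NO_ANSWER_CODE
    key = answer.strip().lower()
    # A dictionary lookup returns exactly one code, so categories cannot overlap.
    return CODING_FRAME.get(key, OTHER_CODE)


raw_answers = ["Farming", "business ", "Fishing", "", "Salaried job"]
codes = [code_response(a) for a in raw_answers]
print(codes)  # [1, 3, 8, 9, 4]
```

The list of codes plays the role of one column of a transcription sheet: one entry per respondent, in the order the schedules were read.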
Significance of coding
Coding is useful in classifying the responses into meaningful categories. It simplifies the difficult task of processing qualitative information. One code is specific to only one kind of information, so that a given response falls in only one category.
Rules for coding
Give code numbers to each respondent and to each response. Give code numbers to qualitative responses also. Prepare the coding frame. While editing is being performed, special coding actions are carried out.
Classification
Classification is the process of reducing a large body of data into homogeneous, meaningful groups. It arranges data into groups based on common characteristics. Classification can be done either according to attributes or according to class intervals.
Classification as per attributes
In classification as per attributes, the attributes may be either descriptive or numerical in nature. Data collected on the basis of attributes can be classified only on the basis of attributes. In this classification, when only one attribute is considered, the universe is divided into two classes: one having the attribute and the other not having it. If more attributes are considered, the data are divided into a larger number of classes.
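A minimal sketch of classification by attributes follows. The respondents and the two attributes ("literate" and "employed") are made up for illustration; the point is only that one attribute yields two classes and two attributes yield up to four.

```python
from collections import Counter

# Hypothetical respondents; "literate" and "employed" are illustrative attributes.
respondents = [
    {"id": 1, "literate": True, "employed": True},
    {"id": 2, "literate": False, "employed": True},
    {"id": 3, "literate": True, "employed": False},
    {"id": 4, "literate": True, "employed": True},
    {"id": 5, "literate": False, "employed": False},
]

# One attribute: two classes (has the attribute / does not have it).
by_literacy = Counter(r["literate"] for r in respondents)

# Two attributes: up to 2 x 2 = 4 classes.
by_both = Counter((r["literate"], r["employed"]) for r in respondents)

print(by_literacy[True], by_literacy[False])  # 3 2
print(len(by_both))                           # 4
```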
Classification based on interval
Numerical data are quantitative and can be measured in some statistical unit. Data on production, income and age come under this category. This type of data is classified on the basis of class intervals.
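Classification by class interval can be sketched as below. The ages, the class width of 10 and the lower limit of 20 are assumptions made for the example; each interval includes its lower limit and excludes its upper limit, so the classes do not overlap.

```python
from collections import Counter

# Hypothetical ages of respondents (years); the values are made up for illustration.
ages = [23, 35, 41, 29, 52, 38, 60, 45, 31, 27]

WIDTH = 10   # assumed class width
LOWER = 20   # assumed lower limit of the first class


def class_interval(value, lower=LOWER, width=WIDTH):
    """Return the (start, end) interval containing value; start inclusive, end exclusive."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)


# Frequency of observations in each class interval.
frequency = Counter(class_interval(a) for a in ages)
for (start, end), count in sorted(frequency.items()):
    print(f"{start}-{end}: {count}")
```

Because every value maps to exactly one interval, the frequencies add up to the number of observations.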
Rules for classification
Classification must be exhaustive, with no room for confusion about the placement of observations in the given classes. Classes must not overlap. Classification must be in accordance with the objectives of the inquiry.
Significance of classification
Classification is helpful in tabulation. It leads to valid results. It makes interpretation clear and meaningful.
Tabulation
Tabulation is the process of summarising raw data and displaying them in the compact form of rows and columns for further analysis. It can be done manually, mechanically or electronically. It is important because it conserves space, it is self-explanatory, it makes data computation easier, it makes data comparison simple, and it makes the adequacy or inadequacy of the data clearly visible.
A table contains rows and columns, which form small boxes called cells. Tables are classified as one-way, two-way and multi-way tables. Tabulation can be classified as simple tabulation and complex tabulation.
Simple tabulation
Simple tabulation gives information about one or more groups of independent questions. It results in a one-way table, which provides information on one characteristic of the data.
Complex tabulation
In complex tabulation the data are divided into two or more categories, which gives information on sets of inter-related questions. It results in two-way or three-way tables, which give information about several inter-related characteristics of the data. It is also known as cross tabulation.
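The difference between a one-way table and a cross tabulation can be sketched with a few coded records. The two characteristics (gender and adoption of a new variety) and the records themselves are hypothetical.

```python
from collections import Counter

# Hypothetical coded responses: (gender, adopted_new_variety). Illustrative only.
records = [
    ("male", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "no"), ("male", "yes"), ("female", "yes"),
]

# Simple tabulation: one-way table on a single characteristic.
one_way = Counter(gender for gender, _ in records)

# Complex (cross) tabulation: two-way table on two inter-related characteristics.
two_way = Counter(records)

# Print the two-way table with gender as rows (stub) and adoption as columns (caption).
rows = sorted({g for g, _ in records})
cols = sorted({a for _, a in records})
print("gender".ljust(8) + "".join(c.rjust(5) for c in cols))
for g in rows:
    print(g.ljust(8) + "".join(str(two_way[(g, c)]).rjust(5) for c in cols))
```

Each cell of the two-way table is the count of respondents who share both characteristics, and each row total equals the corresponding entry of the one-way table.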
Components of Table
Each table should have a clear number for the purpose of reference. Every table should also have a suitable title, which should be self-explanatory. There should be a proper heading for each column and row of the table. The body of the table contains the numerical information, and the data presented in the body are arranged as per the description. The unit of measurement is frequently written as a headnote, such as '000 (in thousands), million (i.e. 10 lakhs) or Cr. (i.e. crores).
Rules for tabulation
Captions and stubs should be clearly arranged in a systematic order. Units of measurement should be clearly defined. Avoid overloading the table with details. The table should be logically arranged. Avoid the use of abbreviations. Expressions like "etc." should not be used in the table. Dittoes should not be used in the table. Information that is not available should be marked "N.A." (not available) or with a dash. Miscellaneous items should be placed in the last row of the table. The table must suit the requirements of the study.
Validation
Data validation is the process of ensuring the accuracy and quality of data. It is implemented by building several checks into a system or report to ensure the logical consistency of input and stored data. In automated systems, data are entered with minimal or no human supervision, so the entered data must be checked for correctness and for the desired quality standards. Unstructured data, even if entered correctly, will incur related costs for cleaning, transforming and storage.
Types of data validation
The types of data validation are the data type check, code check, range check, format check, consistency check and uniqueness check.
Data type check
A data type check confirms that the data entered have the correct data type. For example, a field might accept only numerical data. In that case, any data containing other characters, such as letters or special symbols, should be rejected by the system.
Code check
A code check ensures that a field is selected from a valid list of values or follows certain formatting rules. For example, a postal code can be verified by checking it against a list of valid codes.
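The two checks above can be sketched as small functions. The function names and the set of postal codes are hypothetical; a real system would load an official list of valid codes.

```python
# Hypothetical list of valid postal codes, used only for illustration.
VALID_POSTAL_CODES = {"641003", "600001", "625104"}


def is_numeric(value):
    """Data type check: accept only strings made up entirely of ASCII digits."""
    return value.isascii() and value.isdigit()


def is_valid_postal_code(value):
    """Code check: accept only values found in the list of valid codes."""
    return value in VALID_POSTAL_CODES


print(is_numeric("42"), is_numeric("4a"))   # True False
print(is_valid_postal_code("641003"))        # True
print(is_valid_postal_code("000000"))        # False
```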
Range check
A range check verifies whether input data fall within a predefined range. For example, latitude and longitude are commonly used in geographic data. A latitude value should be between -90 and 90, while a longitude value must be between -180 and 180. Any value outside these ranges is invalid.
Format check
Many data types follow a certain predefined format. A common case is dates stored in a format like "yyyy-mm-dd" or "dd-mm-yyyy". A data validation procedure that ensures dates are in the proper format helps maintain consistency across data and through time.
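A minimal sketch of the range check and the format check follows, using only the Python standard library. The function names are hypothetical, and the default date format "yyyy-mm-dd" is an assumption; the other format mentioned above can be passed explicitly.

```python
from datetime import datetime


def valid_coordinates(lat, lon):
    """Range check: latitude within [-90, 90] and longitude within [-180, 180]."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


def valid_date(text, fmt="%Y-%m-%d"):
    """Format check: True if text parses as a date in the given format.

    strptime also rejects impossible dates such as 29 February 2023.
    """
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


print(valid_coordinates(11.0, 76.9))         # True
print(valid_coordinates(95.0, 76.9))         # False
print(valid_date("2024-02-29"))              # True
print(valid_date("29-02-2024"))              # False
print(valid_date("29-02-2024", "%d-%m-%Y"))  # True
```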
Consistency check
A consistency check is a type of logical check that confirms the data have been entered in a logically consistent way. An example is checking that the delivery date of a parcel is after its shipping date.
Uniqueness check
Some data, such as IDs or e-mail addresses, are unique by nature. A database should have unique entries in those fields. The uniqueness check ensures that an item is not entered multiple times into a database.
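The consistency and uniqueness checks can be sketched on a few parcel records. The records and field names are hypothetical, and the sketch assumes that same-day delivery is acceptable, so only a delivery date earlier than the shipping date is flagged.

```python
from collections import Counter
from datetime import date

# Hypothetical parcel records; the IDs and dates are made up for illustration.
parcels = [
    {"id": "P1", "shipped": date(2024, 3, 1), "delivered": date(2024, 3, 4)},
    {"id": "P2", "shipped": date(2024, 3, 5), "delivered": date(2024, 3, 2)},
    {"id": "P1", "shipped": date(2024, 3, 6), "delivered": date(2024, 3, 9)},
]

# Consistency check: flag parcels delivered before they were shipped
# (assumption: delivery on the shipping day itself is allowed).
inconsistent = [p["id"] for p in parcels if p["delivered"] < p["shipped"]]

# Uniqueness check: flag IDs that occur more than once.
id_counts = Counter(p["id"] for p in parcels)
duplicates = sorted(i for i, n in id_counts.items() if n > 1)

print(inconsistent)  # ['P2']
print(duplicates)    # ['P1']
```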