This section describes how to consolidate and process quantitative data prior to analysis. An example is given that clarifies the application of each step outlined.
Why is Data Consolidation
As a critical first step, following data collection and prior to data analysis, raw quantitative data from questionnaires (or other data collection instruments) must
be processed and consolidated in order to be usable. This will require some form of data cleaning, organising, and coding to so that the data is ready to be entered
into a database or spreadsheet, analysed and compared.
Quantitative data is usually collected using a data collection instrument such as a questionnaire. The number of questionnaires or cases, is usually fairly large,
especially where probability sampling strategies are used. Due to the nature of quantitative inquiry, most of the questions are closed ended and solicit short
‘responses’ from respondents that are easy to process and code. It is almost always necessary to use computer software to analyse the data due to this re-latively
large number of cases (in comparison to qualitative data) as well as variables (e.g. questions on the questionnaire). Microsoft excel and access provide basic
spreadsheet and database functions, whereas more specialised statistical software such as SPSS and Epilnfo can be used where available and where expertise exists.
Ideally consolidation and processing is conducted by the team of interviewers who completed the data collection (WFP or implementing partner staff or consultants),
however, in many cases additional staff are specifically tasked with the work of entering data into pre-formatted spread¬sheets or databases. Data processing and
consolidation needs to be well supervised and con¬ducted as it can significantly affect the quality of subsequent analysis.
Steps to follow for consolidating and processing Quantitative Data
The following 6 steps outline the main tasks related to consolidating and processing quantitative data, prior to analysis.
Step 1: Nominate a Person and set a Procedure to ensure the Quality of Data Entry
When entering quantitative data into the database or spreadsheet, set up a quality check pro¬cedure such as having someone who is not entering data check every 10th
case to make sure it was entered correctly.
Step 2: Entering Numeric Variables on Spreadsheets
Numeric variables should be entered into the spreadsheet or database with each variable on the questionnaire making up a column and each case or questionnaire making
up a row. The type of ‘case’ will depend on the unit of study (e.g. individual, households, school, or other).
Step 3: Entering Continuous Variable Data on Spreadsheets
Enter raw numeric values for continuous variables (e.g. age, weight, height, anthropometric Z¬scores, income). A new categorical variable can be created from the
continuous variable later to assist in analysis. For 2 or more variables that will be combined to make a third variable, be sure and enter each separately. (For
example, the number of children born and the number of children died should be entered as separate variables and the proportion of children who have
died could be created as a third variable). The intent is to ensure that the detail is not lost during data entry so that categories and variable calculations can be
adjusted later if need be.
Step 4: Coding and Labelling Variables
Code categorical nominal variables numerically (e.g. give each option in the variable a number). Where the variable is ordinal (e.g. defining a thing’s position in a
series), be sure to order the codes in a logical sequence (e.g. 1 equals lowest and 5 equals the highest). In SPSS and some other software applications it is possible
to give each numeric variable a value label (e.g. the nominal label that corresponds with the numeric code). For excel and other software that do not have this
function, create a key for each nominal variable that lists the numeric codes and the corresponding nominal label.
Step 5: Dealing with a Missing Value
8e sure to enter a for cases in which the answer given is 0, do not leave the cell blank. A blank cell indicates a missing value (e.g. the respondent did not answer
the question, the interviewer skipped the question by mistake, the question was not applicable to the respondent, or the an-swer was illegible). It is best practice to
code missing values as 99,999, or 9999. Make sure the number of 9?s make the value an impossible value for the variable (e.g. for a variable that is ‘number of
cattle’, use 9999 since 99 cattle may be a plausible number in some areas). It is im-portant to code missing values so that they can be excluded during analysis on a
case by case basis (e.g. by setting the missing value outside the range of plausible values you can selectively exclude it from analysis in any of the computer
software packages described above).
Step 6: Data Cleaning Methods
Even with quality controls it will be necessary to ‘clean the data’, especially for large data sets with many variables and cases. This allows for obvious errors in
data entry to be corrected as well as for excluding responses that simply do not make sense. (Note that the majority of these should be caught in data collection, but
even the best quality control procedures miss some mistakes.) To clean the data run simple tests on each variable in the dataset. For example a variable denoting the
sex or gender of the respondent (1 = male, 2 = female) should only take values 1 or 2. If a value such as 3 exists, then you know a data entry mistake has occurred.
Also look for impossible values (outside the range of plausibility) such as a child weighing 100 kg, a mother being 10 years old, a mother being a male, etc.
An Example of Quantitative Data Consolidation and Processing through the Application of the 6 Steps outlined above
In this example, each household is the unit of study for the survey and is considered a case.
Step 1: Nominate a Person and set a Procedure to ensure the Quality of Data Entry
Every 4th case will be checked by a non-data entry person (ex field editor) to ensure qual¬ity in data entry.
Step 2: Entering Numeric Variables on Spreadsheets and
Step 3: Entering Continuous Variable Data on Spreadsheets Q1: The estimated expenditure on food in the last 6 months Responses on Questionnaires:
Case1 $30
Case 2 $23
Case 3 $112
Case 4 $40
Q2: The estimated total expenditure in the last 3 months Responses on Questionnaires:
Case 1 $50
Case 2 $35
Case 3 $140
Case 4 $35
Enter into database or spreadsheet and create a third variable that is food expenditure as a per-centage of total expenditure.
Food Expenditure in Last 3 Total Expenditure in Last 3 Food Exp as a % of total
Months months Exp
Case 1 $30.00 $50.00 60.00%
Case 2 $23.00 $35.00 65.71%
Case 3 $112.00 $140.00 80.00%
Case 4 $40.00 $35.00 114.29%
Step 4: Coding and Labelling Variables
Code the nominal variables using numeric values. For ordinal variables make sure the order or sequence of numeric values makes sense.
Q3: Name of Village (with corresponding numeric code added) Case 1 Hagadera = 1
Case 2 Hagadera = 1
Case 3 Kulan = 2
Case 4 Bardera = 3
Q4: Highest level of education completion of the head of household (with corresponding ordinal numeric codes that reflect least education to most)
Case 1 some primary, did not complete = 2
Case 2 no formal schooling = 1
Case 3 completed primary, some secondary = 4 Case 4 completed primary 3
Enter into database:
Food Expenditure Total Expenditure Food Exp as a % Village Highest ed. level
in Last 3 Months in Last 3 months of total Exp completion by
HofHH
Case 1 $30.00 $50.00 60.00% 1 2
Case 2 $23.00 $35.00 65.71% 1 1
Case 3 $112.00 $140.00 80.00% 2 4
Case 4 $40.00 $35.00 114.29% 3 3
Step 5: Dealing with a Missing Value Coding missing values
Q5: Number of children under 5 in household Case 1 = 2
Case 2 = 0
Case 3 = no answer given (missing value) Case 4 = 3
Enter into database giving missing value a value of 99 (we use 99 because with multiple wives 9 children under 5 within a household is a possibility, even though it is
a remote 1 for this area).
Food Expendit- Total Expendit- Food Exp as a Village Highest ed. Number of chil-
ure in Last 3 ure in Last 3 % of total Exp level com ple- dren US in HH
Months months tion by HofHH
Case 1 $30.00 $50.00 60.00% 1 2 2
Case 2 $23.00 $35.00 65.71% 1 1 0
Case 3 $112.00 $140.00 8Q.00% 2 4 99
Case 4 $40.00 $35.00 114.29% 3 3 3
Step 6: Data Cleaning Methods
Run data validity checks to ‘clean the data’. Try to find impossible values for each variable. If they are found and reverting to the questionnaire does not clarify
the mistake, then set the value to missing (step 5).
In this case the third variable in case 4 (refer to the table under step 5) suggests either an entry error or a mistake on the questionnaire. Food cannot be 114% of
total expenditure since food is a portion of expenditure and the maximum value it could take is 100% (food expenditure repres¬ents all expenditure).
After reverting to the questionnaire, it is confirmed that data was entered correctly and that the error lies in the respondent’s understanding of the question or in
the interviewer’s recording of the response. It is decided that the best course of actual is to set variables 1,2, and 3 for Case 4 to ‘missing’ so that the analysis
is not misleading.
Food Expendit- Total Expendit- Food Exp as a Village Highest ed. Number of chil-
ure in Last 3 ure in Last 3 % of total Exp level com ple- dren US in HH
Months months tion by HofHH
Case 1 $30.00 $50.00 60.00% 1 2 2
Case 2 $23.00 $35.00 65.71% 1 1 0
Case 3 $112.00 $140.00 80.00% 2 4 99
Case 4 9999 9999 999 3 3 3