- What is a codebook? Who might create one, what would she or he include in it, and what purpose or purposes would it serve?
- Explain the distinction between variable labels and value labels in an electronic dataset.
- True or False: The linear regression model cannot handle curvilinear relationships between independent and dependent variables.
- True or False: By convention, if we conduct a statistical hypothesis test and obtain a p-value of .3, we would reject the null hypothesis.
- Suppose you have two SPSS datasets. The first contains the variables ID, X1, X2, and X3 for participants 1 through 100; the second contains the variables ID, X4, X5, and X6 for the same 100 participants. Suppose that the datasets are named EvalPre.sav and EvalPost.sav, and are saved on your computer in the following file location:
C:\Documents and Settings\Evaluation\EvalData
And suppose, finally, that you want to combine these datasets to create a new dataset, to be named EvalPrePost.sav, containing ID and X1 through X6 for all 100 participants. What SPSS syntax would you use to accomplish this?
- A research team is studying cognitive decline in old age. They collect data on 300 people between the ages of 75 and 95 years. One of the key variables is a measure of one particular aspect of cognitive functioning: Executive function (named EXFUNC in the dataset). For this study it is measured using a test that produces values ranging from 0 to 100, with higher values representing better executive function. The investigators fit a linear regression model to their data and obtain the following estimated model:
$\text{EXFUNC}_i = 161.73 - 1.05\,\text{AGE}_i + e_i$
According to this model, by how many points does the typical score on the executive function scale decline between age 80 and 90?
- Suppose your boss gives you a dataset and asks you to run frequencies on the variables X1 and X4, and descriptive statistics on the variables X2, X3, X5, and X6. What SPSS syntax would you use to accomplish this task? (Please present only the command(s) that generate the frequencies and descriptive statistics.)
- Suppose your dataset has a variable, X1, that was derived from a questionnaire item with response options ranging from Strongly Disagree (coded 1) to Strongly Agree (coded 5). Because the wording of this item runs in the opposite direction of the wording of several related items, you want to create a reverse-coded version of this variable on which Strongly Disagree will be coded 5 while Strongly Agree will be coded 1. What SPSS syntax would you use to accomplish this task?
- An investigator interested in regional differences in breastfeeding attitudes and practices conducts a national survey. The survey includes a multi-item instrument measuring breastfeeding attitudes. The resulting breastfeeding attitudes scale takes values ranging from 1 to 5, with higher numbers representing more favorable attitudes toward breastfeeding. This scale score is named BRATT in the dataset. The dataset also includes a variable named region which takes the following values: 1 = Northeast, 2 = Southeast, 3 = Midwest, 4 = Southwest, 5 = Rocky Mountains, and 6 = West Coast. The investigator creates a set of dummy variables and runs a linear regression model with the breastfeeding attitudes scale as the dependent variable. The estimated model is
$\text{BRATT}_i = 3.68 + 0.57\,\text{NORTHEAST}_i - 0.13\,\text{SOUTHEAST}_i - 0.41\,\text{SOUTHWEST}_i - 0.04\,\text{ROCKIES}_i + 0.77\,\text{WESTCOAST}_i + e_i$
According to this model, what is the mean score on the breastfeeding attitudes scale for respondents residing in the Midwest?
- Suppose you have a dataset with items X7, X8, X9, and X10 and you want to sum these items to create a scale score with variable name SumX. What SPSS syntax would you use to accomplish this task?
- Suppose you have a dataset that contains a variable, named BMI, that gives measured BMI values for 100 adults. And suppose that you want to create a new variable, BMI3CAT, that categorizes participants as normal, overweight, or obese according to the following scheme.
| BMI | BMI3CAT |
|---|---|
| < 25 | 1 |
| ≥ 25 and < 30 | 2 |
| ≥ 30 | 3 |
What SPSS syntax would you use to accomplish this?
- Some researchers claim that exclusive breastfeeding of infants from birth to six months of age can boost a child's intelligence. Others are skeptical and believe that previous findings to this effect may be attributable to confounding by maternal socioeconomic status. That is, higher socioeconomic status mothers may be more likely to practice exclusive breastfeeding through six months of age; and high maternal socioeconomic status may contribute to the development of intelligence in the child through mechanisms other than breastfeeding. An investigator studying these issues has a dataset on 1422 mother-child dyads. The dataset contains the following key variables: CHILDIQ, the intelligence of the child as measured via the Stanford-Binet IQ test at age 6 years, with higher scores indicating greater intelligence; BRSTFD, the mother's self-report of whether or not she breastfed the child exclusively through six months of age (0 = no, 1 = yes); and MOMSES, an index of maternal socioeconomic status derived from information about educational attainment, income, and her own parents' occupations. The investigator finds that the mean score on the Stanford-Binet IQ test was 107.32 among the 454 children who were exclusively breastfed for six months, and 102.55 for the 968 children who were not exclusively breastfed through six months of age. Thus, the average breastfed child had an IQ 4.77 points higher than the average non-breastfed child. To determine the extent to which this difference could be attributable to confounding by maternal socioeconomic status rather than to an actual effect of breastfeeding on intelligence, the investigator next runs the following linear regression model:
$\text{CHILDIQ}_i = b_0 + b_1\,\text{BRSTFD}_i + b_2\,\text{MOMSES}_i + e_i$
If the difference is due in part to confounding by maternal SES, how would the value of the coefficient b1 in this model likely compare to the raw difference of 4.77?
- True or False: A boxplot is a useful way of examining the distribution of a dichotomous variable.
- Suppose you have a dataset that includes the variable BMI3CAT as described in the earlier question on categorizing BMI, and you wish to create a bar chart that shows how many people fall into the three categories: normal, overweight, and obese. What SPSS syntax would you use to obtain that bar chart?
- Suppose you are trying to use linear regression analysis to determine whether the effect of one variable, X1, on another variable, Y, depends upon the value taken by a third variable, X2. What type of term should you include in your regression model?
  - A curvilinear term
  - A logistic term
  - An interaction term
  - An orthogonal term
  - None of the above
- True or False: A cross-tabulation is a useful way of examining how two categorical variables are related.
- Suppose you have obtained from SPSS the correlation matrix in Appendix 1. According to the information in this matrix, which two variables exhibit the strongest linear relationship?
- In your own words, what is confounding and why is it sometimes a problem in observational studies?
- Suppose that your agency has been evaluating an intervention using a posttest-only control group design. There were 155 people in the treatment group and 140 in the control group. The outcome variable is continuous and, according to boxplots, appears to follow a bell-shaped distribution with similar variances in the treatment and control groups. What statistical test would be most appropriate for testing the null hypothesis of no intervention effect?
- Suppose your agency has been evaluating an intervention using a one-group pretest-posttest design with 20 participants. The focal variable is continuous but, in looking at the boxplots, you see that it is skewed heavily to the left both before and after the intervention. What statistical test would be most appropriate for testing the null hypothesis of no intervention effect (i.e., no change from before to after the intervention)?
- Suppose you wish to include a nominal variable that takes four different values as an independent variable in a logistic regression model. How many dummy variables should you include in your logistic regression model in order to accomplish this?
- True or False: ANCOVA is often used to test the null hypothesis of no intervention effect in the context of an impact evaluation using a pretest-posttest control-group design with a continuous dependent variable.
- Suppose your agency has been evaluating an intervention using a one-group pretest-posttest design with 200 participants. The focal variable is dichotomous. What statistical test would be most appropriate for testing the null hypothesis of no intervention effect (i.e., no change from before to after the intervention)?
- A linear regression model with a single dummy variable predicting a continuous dependent variable is equivalent to which of the following statistical tests?
  - Fisher's Exact Test
  - Independent samples t-test (unequal variances version)
  - Paired t-test
  - Mann-Whitney U test
  - Independent samples t-test (equal variances version)
- Suppose that your agency has been evaluating an intervention using a posttest-only control group design. There were 95 people in the treatment group and 111 in the control group. The outcome variable takes the following three ordinal values: normal, overweight, and obese. What statistical test would be most appropriate for testing the null hypothesis of no intervention effect?
- True or False: When running a two-sample t-test, if Levene's test gives a p-value of .023, you should look at the equal variances rather than the unequal variances version of the t-test.
- True or False: Before you run a logistic regression model, you must first use a COMPUTE command to enact the log-odds or logit transformation on the dichotomous dependent variable.
- Suppose you want to conduct a chi-square test of independence and Fisher's exact test for the relationship between two dichotomous variables. The first variable, named TREAT, is coded 0 for control group members and 1 for treatment group members. The second variable, named POSTSMOKE, indicates smoking status assessed one month after the intervention being evaluated; it is coded 0 for participants who were not smoking, and 1 for participants who were smoking, at that time. What SPSS syntax would you use to obtain these hypothesis tests?
- Suppose you want to conduct a paired t-test to see if post-intervention knowledge scores, KNOWPOST, differ significantly on average from pre-intervention knowledge scores, KNOWPRE, in a dataset containing information on people exposed to the intervention. What SPSS syntax would you use to obtain the paired t-test?
- What do you get when you exponentiate a logistic regression coefficient?
  - A relative risk
  - A hazard ratio
  - A quadratic term
  - An odds ratio
  - An intercept
- Suppose you want to conduct a Mann-Whitney U test to test for an effect of an intervention on a continuous but highly skewed outcome in a small-scale experiment using a posttest-only control group design. The variable TREAT is coded 0 for control group members and 1 for treatment group members. The continuous but skewed outcome variable is a measure of blood glucose; in the dataset it is named GLUCOSE. What SPSS syntax would you use to obtain the Mann-Whitney U test?
- Suppose that the analysis of data from an impact evaluation using a posttest-only control group design with a dichotomous outcome variable, SICKPOST, resulted in the SPSS output appearing in Appendix 2. How would you quantify the estimated effect of the intervention in terms of a relative risk, and do the statistical tests provide support for the effectiveness of this intervention?
- Suppose that you are part of a team that has been analyzing data from an impact evaluation that used a one-group pretest-posttest design. Measures of social support were obtained before and after the intervention using the same instrument. They are continuous. The pre-intervention social support variable appears in the dataset as SSPRE, and the post-intervention variable appears as SSPOST. A paired t-test was conducted and you have been presented with the output appearing in Appendix 3. How would you quantify the estimated effect of the intervention in terms of a mean difference, and do the statistical tests provide support for the effectiveness of this intervention?
- Suppose you have a dataset containing survey data on 783 adolescents. For each adolescent, you have a variable EVERSEX that indicates whether she or he has ever had sexual intercourse (0 for no, 1 for yes). You're interested in how the likelihood of sexual activity varies in relation to three variables: perceived parental disapproval of teen sexual activity, perceived peer norms valuing sexual activity, and age. Perceived parental disapproval is a scale score derived from multiple Likert-type questionnaire items, and is named PARDIS in your dataset. Perceived peer norms is also a scale score, and is named PEERNORM. Age is a continuous variable computed as the difference between the date of data collection and each respondent's date of birth, and is named AGE in your dataset. What SPSS syntax would you use to run a logistic regression model with these three independent variables predicting EVERSEX?
- Suppose that your team has been analyzing data from an impact evaluation that used a posttest-only control group design. There were 227 participants in the treatment group, and 236 in the control group. The key outcome is a scale score measuring social support. In the dataset, the experimental group variable is named TREAT, and takes the value 0 for control group members and 1 for treatment group members; while the outcome variable is named SOCSUPP. A member of your team used SPSS to conduct a two-sample t-test, and you have been presented with the output appearing in Appendix 4. How would you quantify the estimated effect of the intervention in terms of a mean difference, and does the statistical test provide support for the effectiveness of the intervention?
- One more question. Suppose your team has been analyzing data from an impact evaluation that used a one-group pretest-posttest design. In the evaluation 50 participants were classified as either sick or not sick both before and after the intervention. The pre-intervention measure appears in the dataset as a variable named SICKPRE, taking values 0 for not sick and 1 for sick. The post-intervention measure uses the same coding scheme and is named SICKPOST. A member of your team has used McNemar's test to see whether the proportion of participants who were sick declined significantly over the course of the intervention, and you have been presented with the output appearing in Appendix 5. Looking at this output, did the proportion classified as sick increase or decrease over the course of the intervention, and was this change statistically significant?
Appendix 1. SPSS Output for Question 17.
| | | X1 | X2 | X3 |
|---|---|---|---|---|
| X1 | Pearson Correlation | 1 | .303** | -.441** |
| | Sig. (2-tailed) | | .000 | .000 |
| | N | 500 | 500 | 500 |
| X2 | Pearson Correlation | .303** | 1 | .183* |
| | Sig. (2-tailed) | .000 | | .032 |
| | N | 500 | 500 | 500 |
| X3 | Pearson Correlation | -.441** | .183* | 1 |
| | Sig. (2-tailed) | .000 | .032 | |
| | N | 500 | 500 | 500 |
** Correlation is significant at the .01 level (2-tailed).
* Correlation is significant at the .05 level (2-tailed).
Appendix 2. SPSS Output for Question 32.
TREAT*SICKPOST Crosstabulation
| | | | SICKPOST: 0.00 Not Sick | SICKPOST: 1.00 Sick | Total |
|---|---|---|---|---|---|
| TREAT | 0.00 Control | Count | 10 | 90 | 100 |
| | | % within TREAT | 10.0% | 90.0% | 100.0% |
| | 1.00 Treated | Count | 30 | 70 | 100 |
| | | % within TREAT | 30.0% | 70.0% | 100.0% |
| Total | | Count | 40 | 160 | 200 |
| | | % within TREAT | 20.0% | 80.0% | 100.0% |
Chi-Square Tests
| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 12.500 (a) | 1 | .000 | | |
| Continuity Correction (b) | 11.281 | 1 | .001 | | |
| Likelihood Ratio | 12.972 | 1 | .000 | | |
| Fisher's Exact Test | | | | .001 | .000 |
| Linear-by-Linear Association | 12.438 | 1 | .000 | | |
| N of Valid Cases | 200 | | | | |

a. 0 cells (.0%) have expected counts less than 5. The minimum expected count is 10.00.
b. Computed only for a 2×2 table.
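For readers who want to see how the Pearson chi-square value in this output relates to the cell counts, the following is a minimal Python sketch, not part of the original SPSS output. It assumes only the four observed counts from the TREAT*SICKPOST crosstabulation; the function name `pearson_chi_square` and the variable names are illustrative.

```python
# Observed counts copied from the TREAT*SICKPOST crosstabulation.
# Rows: control, treated. Columns: not sick, sick.
observed = [[10, 90], [30, 70]]


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic


chi_square = pearson_chi_square(observed)

# Percentage sick within each TREAT group (control first, then treated).
pct_sick = [100 * row[1] / sum(row) for row in observed]

print(round(chi_square, 3))  # 12.5
print(pct_sick)              # [90.0, 70.0]
```

The statistic agrees with the Pearson Chi-Square row of the table, and the percentages agree with the "% within TREAT" rows.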
Appendix 3. SPSS Output for Question 33
Paired Samples Statistics
| | | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | SSPOST | 18.5150 | 200 | 3.51876 | .24881 |
| | SSPRE | 18.0200 | 200 | 3.45678 | .24443 |
Paired Samples Correlations
| | | N | Correlation | Sig. |
|---|---|---|---|---|
| Pair 1 | SSPOST & SSPRE | 200 | .655 | .000 |
Paired Samples Test
| | | Paired Differences: Mean | Paired Differences: Std. Deviation | Paired Differences: Std. Error Mean | Paired Differences: 95% C.I. Lower | Paired Differences: 95% C.I. Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Pair 1 | SSPOST-SSPRE | .49500 | 2.89688 | .20484 | .09106 | .89894 | 2.417 | 199 | .017 |
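
The quantities in the Paired Samples Test output are related by simple arithmetic. The following Python sketch, which is not part of the original SPSS output, assumes only the summary values printed in the table (mean difference, standard deviation of the differences, N, and the confidence limits) and recomputes the standard error, t statistic, and degrees of freedom from them. Variable names are illustrative.

```python
import math

# Summary values copied from the Paired Samples Test table.
mean_diff = 0.49500   # mean of SSPOST - SSPRE
sd_diff = 2.89688     # standard deviation of the paired differences
n = 200               # number of pairs
ci_lower, ci_upper = 0.09106, 0.89894

# Standard error of the mean difference.
se_diff = sd_diff / math.sqrt(n)

# Paired t statistic and its degrees of freedom.
t_stat = mean_diff / se_diff
df = n - 1

# The confidence interval is centered on the mean difference.
ci_midpoint = (ci_lower + ci_upper) / 2

print(round(se_diff, 5))      # 0.20484
print(round(t_stat, 3))       # 2.417
print(df)                     # 199
print(round(ci_midpoint, 3))  # 0.495
```

The recomputed values match the Std. Error Mean, t, and df columns, which is a quick way to confirm that a table has been transcribed correctly.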