Published by admin at August 3, 2017

1) (15pts) Change to the correct working directory (use the command “cd”), open
the data2.dta file, and describe the variables it contains using the commands:
   a. describe
   b. codebook
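
A minimal sketch of this first step in Stata. The directory path is a placeholder (an assumption, not given in the assignment); replace it with the folder that actually holds data2.dta.

```stata
* Placeholder path (assumption): point this at the folder containing data2.dta
cd "C:/path/to/your/folder"

* Load the dataset, discarding any data currently in memory
use "data2.dta", clear

* Overview of variable names, storage types, and labels
describe

* Per-variable report: type, range, unique values, missing values
codebook
```

Stata command names are case-sensitive, so the commands are typed in lowercase.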

2) (15pts) Choose two variables that caught your attention and that can be
related to each other (in other words, that have a meaningful economic
relationship), and motivate your choice. One of these two variables MUST be
categorical (e.g. male or female; marital status; east or west location; etc.)
and one MUST be continuous (e.g. income, earnings). Summarize these two
variables using the appropriate command(s). Please use a combination of two
variables that has not been analysed during the tutorials; in other words, a
replication of any exercise performed during the tutorials will not be
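
For the summary step in question 2, one possible sketch is shown here. The variable names wage and sex are hypothetical stand-ins (they are not taken from data2.dta); substitute your own pair.

```stata
* Hypothetical variable names: wage (continuous), sex (categorical)

* Continuous variable: mean, standard deviation, percentiles,
* skewness and kurtosis
summarize wage, detail

* Categorical variable: frequencies and percentages per category,
* counting missing values as their own category
tabulate sex, missing
```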

3) (40pts) For the continuous variable:
   a. Show the histogram and the kernel density.
   b. Perform a normality test on the variable as well as on the logarithm of
      the variable. Are they normally distributed? Both? If not, are they
      left-skewed or right-skewed?
   c. Transform the continuous variable into a categorical variable with a
      maximum of 10 categories (you may also choose fewer than 10). Show the
      histogram and the table of the new categorical variable (each row
      shows the mean of the corresponding category). For example, if your
      continuous variable is Wage, you could call the new categorical one
      “Wage_Cat”.
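
The three parts of question 3 could be approached along the following lines. This is a sketch under assumptions: wage is a hypothetical continuous variable, the skewness-kurtosis and Shapiro-Wilk tests are only two of the normality tests Stata offers, and cutting at the deciles is one of several reasonable ways to form the categories.

```stata
* Hypothetical variable name: wage (continuous)

* (a) Histogram and kernel density, each with a normal curve overlaid
histogram wage, normal name(hist_wage, replace)
kdensity wage, normal name(kd_wage, replace)

* (b) Normality tests on the level and on the natural logarithm.
*     ln() returns missing for zero or negative values.
generate ln_wage = ln(wage)
sktest wage ln_wage
swilk wage ln_wage

*     The skewness statistic is reported by summarize with detail:
*     a positive value indicates a longer right tail (right-skewed),
*     a negative value a longer left tail (left-skewed)
summarize wage ln_wage, detail

* (c) Ten roughly equal-sized categories based on the deciles of wage
xtile Wage_Cat = wage, nquantiles(10)
histogram Wage_Cat, discrete frequency name(hist_cat, replace)

*     One row per category, showing the mean of wage in that category
tabulate Wage_Cat, summarize(wage)
```

Lowering the number in nquantiles() gives fewer categories. If categories of equal width are preferred over equal size, egen with the cut() function is an alternative.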

4) (4pts) Cross-tabulate the frequencies (with row, column and cell
percentages) of the categorical variable you have just created (e.g. Wage_Cat)
against the initial categorical variable you selected at the beginning.
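
A sketch of the two-way frequency table, assuming the hypothetical names Wage_Cat (the new categorical variable) and sex (the initial categorical variable):

```stata
* Hypothetical variable names: Wage_Cat (rows), sex (columns)
* Each cell shows the frequency, then the row, column and cell percentages
tabulate Wage_Cat sex, row column cell
```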

5) (4pts) Cross-tabulate the mean value of each category of the variable you
have just created (e.g. Wage_Cat) against the initial categorical variable you
selected at the beginning.
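
One way to obtain a table of means is the summarize() option of tabulate. The names wage, Wage_Cat and sex are again hypothetical.

```stata
* Hypothetical variable names: Wage_Cat (rows), sex (columns), wage (continuous)
* Each cell shows the mean of wage for that combination of categories
tabulate Wage_Cat sex, summarize(wage) means
```

Specifying means suppresses the standard deviations and frequencies that summarize() would otherwise also display.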

6) (12pts) How would you find the conditional probabilities and marginal
probabilities of the two categorical variables from your results in point (4)?
Explain in detail, referring to the concepts of joint probability and
conditional probability, with numerical examples from the table.
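
The relationships involved can be written out as follows. The notation is an assumption introduced here for illustration: A is the row variable and B the column variable of the table in point (4), n_ij is the frequency in row i and column j, n_i. and n_.j are the row and column totals, and N is the total number of observations.

```latex
\begin{align*}
P(A=i,\,B=j) &= \frac{n_{ij}}{N}
  && \text{(joint: cell percentage)}\\
P(A=i) &= \sum_{j} P(A=i,\,B=j) = \frac{n_{i\cdot}}{N}
  && \text{(marginal: row total over } N\text{)}\\
P(B=j) &= \sum_{i} P(A=i,\,B=j) = \frac{n_{\cdot j}}{N}
  && \text{(marginal: column total over } N\text{)}\\
P(B=j \mid A=i) &= \frac{P(A=i,\,B=j)}{P(A=i)} = \frac{n_{ij}}{n_{i\cdot}}
  && \text{(conditional: row percentage)}\\
P(A=i \mid B=j) &= \frac{P(A=i,\,B=j)}{P(B=j)} = \frac{n_{ij}}{n_{\cdot j}}
  && \text{(conditional: column percentage)}
\end{align*}
```

Read this way, the cell percentages are the joint probabilities, the percentages in the Total row and Total column are the marginal probabilities, and the row and column percentages are the two sets of conditional probabilities.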

7) (10pts) Now take two new continuous variables from the dataset and compute
their covariance and their correlation. Show their scatter plot: is the
correlation positive, zero or negative? Is it high or low? Why?
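
A sketch for this last question, assuming two hypothetical continuous variables named income and educ_years. The correlation is the covariance divided by the product of the two standard deviations, so it always lies between -1 and 1.

```stata
* Hypothetical variable names: income and educ_years (both continuous)

* Variance-covariance matrix
correlate income educ_years, covariance

* Correlation coefficient
correlate income educ_years

* Correlation with its significance level and number of observations
pwcorr income educ_years, sig obs

* Scatter plot with a fitted regression line to show the direction
twoway (scatter income educ_years) (lfit income educ_years)
```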