Instructions: Use SPSS. Save and submit the output file (*.spv), following the same naming convention as the Word file. Submissions without code/output files will not be graded. For the data description and variable names, see documentation.pdf.

Before attempting the questions, run descriptive statistics and check that the dataset has 3000 observations and 12 variables.


1) A researcher is interested in the effect of a mother's smoking during pregnancy on her baby's birthweight. Regress birthweight on smoker. What is the estimated slope on smoker? [1]
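Because smoker is a 0/1 dummy, the OLS slope in a simple regression equals the difference between the mean outcome for smokers and for nonsmokers, and the intercept equals the nonsmoker mean. The short Python sketch below checks this identity with NumPy on made-up toy numbers; it is an illustration only, not a substitute for the SPSS output, and the helper name `ols_fit` is hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients b for y = X @ b (X must already include any constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Made-up toy data: x is a 0/1 dummy (like smoker), y is the outcome.
x = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
y = np.array([3400.0, 3300.0, 3500.0, 3100.0, 3200.0])

X = np.column_stack([np.ones_like(x), x])  # constant + dummy
intercept, slope = ols_fit(X, y)

# Difference in group means: mean(y | x = 1) - mean(y | x = 0)
mean_diff = y[x == 1].mean() - y[x == 0].mean()
print(intercept, slope, mean_diff)
```

Here the nonsmoker mean is 3400 and the smoker mean is 3150, so both the fitted slope and the mean difference come out at -250 (up to floating-point rounding).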

2-a) To the model in Q1, add the mother's characteristics (age, educ and unmarried) and run the regression again. [1]

2-b) What happens to the estimated slope on smoker now? Does it change? Is there evidence of omitted variable bias in the Q1 estimate? [1]
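One way to see what omitted variable bias means in a given sample: with one omitted regressor, the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one. The NumPy sketch below verifies this algebraic identity on simulated data; x1 (standing in for smoker), x2 (standing in for a single mother characteristic) and all parameter values are assumptions for illustration. With several added controls, as in Q2, the same identity holds with one such term per added variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Simulated data (assumed values): x2 is correlated with x1, and both affect y.
x1 = rng.binomial(1, 0.2, size=n).astype(float)   # 0/1 dummy, like smoker
x2 = 25 + 3 * x1 + rng.normal(0, 5, size=n)       # control that the short model omits
y = 3400 - 200 * x1 + 10 * x2 + rng.normal(0, 500, size=n)

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones(n)
b_short = ols(np.column_stack([ones, x1]), y)      # y on x1 only
b_long = ols(np.column_stack([ones, x1, x2]), y)   # y on x1 and x2
d = ols(np.column_stack([ones, x1]), x2)           # auxiliary regression: x2 on x1

# Identity: short slope = long slope + (coefficient on x2) * (slope of x2 on x1)
implied_short = b_long[1] + b_long[2] * d[1]
print(b_short[1], implied_short)
```

The gap between the short and long slopes, b_long[2] * d[1], is nonzero only when the omitted variable both affects y and is correlated with x1.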

3-a) To the model in Q2, add the following variables: alcohol, tripre0, tripre2 and tripre3. [1]

3-b) What happens to the R² and adjusted R²? Compare them with the values from Q1 and Q2. [1]
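In an OLS model with an intercept, R² never falls when regressors are added. Adjusted R², computed as 1 - (1 - R²)(n - 1)/(n - k - 1) with k regressors excluding the constant, can fall when an added regressor explains little. The NumPy sketch below illustrates this by adding a pure-noise regressor to a simulated model; the data and the helper name `r2_and_adj_r2` are assumptions for illustration.

```python
import numpy as np

def r2_and_adj_r2(X, y):
    """R-squared and adjusted R-squared for OLS of y on X; X's first column is the constant."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = resid @ resid                     # sum of squared residuals
    tss = ((y - y.mean()) ** 2).sum()       # total sum of squares
    r2 = 1 - ssr / tss
    n, p = X.shape                          # p counts the constant column
    k = p - 1                               # regressors excluding the constant
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2 + 0.5 * x + rng.normal(size=n)
noise = rng.normal(size=n)                  # unrelated to y by construction

ones = np.ones(n)
r2_small, adj_small = r2_and_adj_r2(np.column_stack([ones, x]), y)
r2_big, adj_big = r2_and_adj_r2(np.column_stack([ones, x, noise]), y)
print(r2_small, r2_big)    # R-squared cannot decrease
print(adj_small, adj_big)  # adjusted R-squared may rise or fall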

3-c) Interpret the coefficient on age. [1]

4) Run the model in Q3 without the intercept (constant), and add tripre1 as well.
[Hint: In R, add “+0” to the lm formula. In SPSS, go to the Options dialog box in Linear Regression and turn off the constant option.] [1]
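Why the intercept has to go once tripre1 is added: assuming the four tripre dummies are mutually exclusive and exhaustive (every mother falls in exactly one category; check documentation.pdf), they sum to 1 for every observation, which duplicates the constant column. With the constant, the regressor matrix is rank-deficient (perfect multicollinearity, the "dummy variable trap"); without it, the matrix has full column rank. The NumPy sketch below checks this with `np.linalg.matrix_rank` on simulated categories; the category coding and the age values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
category = np.arange(n) % 4              # hypothetical category 0..3 for each observation
dummies = np.eye(4)[category]            # one 0/1 column per category (tripre0..tripre3 analogues)
age = rng.uniform(18, 40, size=n)        # a continuous regressor for illustration

with_const = np.column_stack([np.ones(n), age, dummies])  # constant + all four dummies
no_const = np.column_stack([age, dummies])                # all four dummies, no constant

# The four dummies sum to the constant column, so with a constant one column is redundant.
print(with_const.shape[1], np.linalg.matrix_rank(with_const))
print(no_const.shape[1], np.linalg.matrix_rank(no_const))
```

In the no-intercept model, each tripre coefficient plays the role of a category-specific intercept rather than a difference from an omitted base category.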

5) All questions here refer to the model in Q1:

5-a) In the regression estimated in Q1, is the slope coefficient on smoker significantly different from 0? Test the hypothesis at the 5% significance level. [1]
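The test in 5-a uses t = (estimated slope) / SE(slope) and rejects H0: slope = 0 at the 5% level when |t| exceeds the two-sided critical value, which is about 1.96 when there are roughly 3000 observations. The NumPy sketch below assumes homoskedasticity-only (conventional OLS) standard errors; heteroskedasticity-robust standard errors would use a different formula. The helper name `slope_t_test` and the toy numbers are hypothetical, chosen so the arithmetic can be checked by hand.

```python
import numpy as np

def slope_t_test(x, y, crit=1.96):
    """Simple regression of y on x: slope, homoskedasticity-only SE, t-stat, and whether |t| > crit."""
    n = len(x)
    sxx = ((x - x.mean()) ** 2).sum()
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s2 = (resid @ resid) / (n - 2)       # residual variance with n - 2 degrees of freedom
    se = np.sqrt(s2 / sxx)
    t = b1 / se
    return b1, se, t, bool(abs(t) > crit)

# Toy data: slope 3, SE sqrt(2), t about 2.12.
x = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([1.0, 3.0, 4.0, 6.0])
# With only 2 degrees of freedom the 5% two-sided critical value is about 4.303, not 1.96.
b1, se, t, reject = slope_t_test(x, y, crit=4.303)
print(b1, se, t, reject)
```

With about 3000 observations the t distribution is close to the standard normal, so the default crit=1.96 is a reasonable approximation there.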

5-b) The variable smoker is a dummy variable. Write out the null and alternative hypotheses in words. [1]

5-c) Use the confidence interval approach to test the hypothesis in 5-a. Use a 95% confidence level.
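The 95% confidence interval approach gives the same decision as the two-sided 5% t-test: reject H0: slope = 0 exactly when 0 lies outside (estimated slope) ± c × SE, where c is the two-sided 5% critical value (about 1.96 in large samples). The small Python helper below is a sketch; its name `ci_test` is hypothetical, and the coefficient and standard error passed to it are placeholder numbers, not results from the birthweight data.

```python
def ci_test(b1, se, crit=1.96, h0=0.0):
    """Return the interval b1 +/- crit*se and whether h0 falls outside it (i.e. reject H0)."""
    lower, upper = b1 - crit * se, b1 + crit * se
    reject = not (lower <= h0 <= upper)
    return (lower, upper), reject

# Placeholder numbers for illustration only.
interval, reject = ci_test(b1=-2.0, se=0.5)
print(interval, reject)  # the interval excludes 0, so H0 is rejected
```

Because the interval excludes 0 exactly when |b1 / se| > crit, the two approaches in 5-a and 5-c cannot disagree when they use the same critical value.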