# PART ONE: MULTIPLE CHOICE

1. The probability density function for a discrete random variable X
   a. Is usually normal
   b. Is usually uniform
   c. Specifies the probability of observing each possible value of X
   d. Is a continuous function
2. The expected value of a random variable is
   a. The sample mean of the observed data in a given sample
   b. The average we would expect to see if we could observe an infinite amount of data for that variable
   c. The weighted sum of all possible values, weighted by the probability of observing them
   d. Both (b) and (c)
   e. All of the above
3. The standard deviation of a variable is
   a. The same as its variance
   b. The square of its variance
   c. The square root of its variance
   d. Unrelated to its variance
4. Consider the following joint probability density function for X = whether it is raining, and Y = whether there is a traffic jam on Massachusetts Ave. What is the conditional probability of getting in a traffic jam, given that it is raining?

   |         | No Traffic Jam | Traffic Jam |
   |---------|----------------|-------------|
   | No Rain | 0.40           | 0.40        |
   | Rain    | 0.05           | 0.15        |

   a. 5%
   b. 15%
   c. 33%
   d. 75%
5. If X and Y are independent random variables then
   a. They have zero covariance
   b. They have a correlation coefficient of 1
   c. They are linearly related
   d. None of the above
6. Which of the following is NOT one of the basic assumptions of the simple linear model:
   a. Cov(X,Y)=0
   b. E(u|X)=0
   c. The $u_i$ are independently and identically distributed
   d. We have a random sample: $\{(Y_i, X_i):\ i = 1, \dots, n\}$
7. The vertical distance between the true population regression line and a given data point is called
   a. The error term
   b. The residual
   c. The standard error
   d. The standard deviation
8. The difference between the observed value of y and the predicted or fitted value for that point is called
   a. The error term
   b. The residual
   c. The standard error
   d. The standard deviation

9. The estimated regression line
   a. Is identical to the true regression line
   b. Is upward sloping
   c. Maximizes the sum of squared residuals
   d. Passes through the point $(\bar{x}, \bar{y})$

10. Suppose we estimate a regression of Y = family income against X = a binary variable that takes the value of 1 if the family owns a house and 0 if not. Then the coefficient on the X variable tells us:
    a. The difference in homeownership rates between high and low income families
    b. The difference between the average family incomes of homeowners and non-homeowners
    c. The effect of an extra dollar of family income on the probability of owning a home
    d. None of the above

11. When we say the OLS estimator is unbiased, we mean that
    a. It always gets the right answer
    b. It is the most precise linear estimator
    c. $E(\hat{\beta}_1) = \beta_1$
    d. It is normally distributed in large samples

12. Heteroskedasticity implies that:
    a. The error terms are correlated with X
    b. The residuals do not have mean zero
    c. The OLS estimator will be biased
    d. The variances of the error terms are not constant across observations
13. Which of the following does NOT lead to smaller standard errors of our OLS estimates?
    a. Smaller variance of the error term
    b. Smaller samples
    c. Less correlation between different X-variables
    d. X-variables are widely spread out around their means

14. The difference between $\beta$ and $\hat{\beta}$ is
    a. $\beta$ is an estimate of the unknown $\hat{\beta}$
    b. $\hat{\beta}$ is an estimate of the unknown $\beta$
    c. the expected value of $\hat{\beta}$ is always $\beta$
    d. no difference
15. What proportion of the area under the standard normal curve lies in the right-hand tail above the value of z = 1.96?
    a. 10%
    b. 5%
    c. 2.5%
    d. 1%

16. Loosely speaking, “statistical significance” corresponds to
    a. Large t’s and p’s
    b. Small t’s and p’s
    c. Small t’s, large p’s
    d. Large t’s, small p’s

17. The $R^2$ of a simple regression of y on x is equal to
    a. The sample correlation coefficient between x and y
    b. The square of the sample correlation coefficient between x and y
    c. 1 - SSE/SST
    d. The sum of squared residuals divided by the total sum of squares

18. When will the omission of $X_2$ lead to bias in the estimates of the coefficient associated with $X_1$?
    a. If $X_2$ has no effect on Y and is uncorrelated with $X_1$
    b. If $X_2$ has no effect on Y and is correlated with $X_1$
    c. If $X_2$ has an effect on Y and is uncorrelated with $X_1$
    d. If $X_2$ has an effect on Y and is correlated with $X_1$

19. Which of the following is true?
    a. An equation with a higher $R^2$ explains a greater proportion of the variance of Y
    b. An equation with a higher $R^2$ is always preferred to one with a lower $R^2$
    c. A high value of $R^2$ means you have eliminated omitted variables bias
    d. $R^2 = SST_x/SST$

20. If a two-sided t-test results in a p-value of 0.06, then the one-sided test will result in a p-value of:
    a. 0.05
    b. 0.12
    c. 0.03
    d. not enough information to answer question

# PART TWO: SHORT ANSWERS AND CALCULATIONS

1) Consider the following dataset with 4 observations for X and Y.  Answer the questions below.  Use the blank rows and columns in the table for the required intermediate calculations and label your added rows & columns so I know what you are doing.

| X  | Y  |
|----|----|
| 1  | 15 |
| 5  | 13 |
| 8  | 5  |
| 10 | 3  |

1. Write the formula for the mean of X. Calculate the MEANS OF X AND Y.
2. Write the formula for the variance of X. Calculate the SAMPLE VARIANCE OF X AND THE SAMPLE VARIANCE OF Y.
3. Write the formula for the SAMPLE COVARIANCE of X and Y and calculate it.
4. Write the formula for the slope of the regression of Y against X and calculate it.
5. Find the intercept of this regression.
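
Hand calculations for this question can be cross-checked with a short script. The following is a minimal Python sketch, assuming the four observations pair up as (1, 15), (5, 13), (8, 5), (10, 3) and that the sample variance and covariance use the n - 1 divisor. The variable names are illustrative only.

```python
# Data from the four-observation table in this question.
x = [1, 5, 8, 10]
y = [15, 13, 5, 3]
n = len(x)

# Sample means.
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample variances and covariance (divide by n - 1, not n).
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# OLS slope and intercept for the regression of Y on X.
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

print(f"means: {mean_x}, {mean_y}")
print(f"variances: {var_x:.4f}, {var_y:.4f}")
print(f"covariance: {cov_xy:.4f}")
print(f"slope: {slope:.4f}, intercept: {intercept:.4f}")
```

The slope is computed as the sample covariance over the sample variance of X, so the n - 1 divisors cancel and the same slope results from the raw sums of deviations.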

2) Consider the following regression predicting birth weight using the variables listed:

| Variable name | Variable label                |
|---------------|-------------------------------|
| bwght         | birth weight, ounces          |
| cigs          | cigs smked per day while preg |
| faminc        | 1988 family income, \$1000s   |
| motheduc      | mother’s yrs of educ          |
| male          | =1 if male child              |

Number of obs = 674

| Source   | SS         | df  | MS         |
|----------|------------|-----|------------|
| Model    | 12066.6671 | 4   | 3016.66677 |
| Residual | 305549.726 | 669 | 456.726048 |
| Total    | 317616.393 | 673 | 471.941149 |

| bwght    | Coef.     | Std. Err. | t | P>\|t\| | [95% Conf. Interval] |
|----------|-----------|-----------|---|---------|----------------------|
| cigs     | -.4370689 | .1358532  |   |         |                      |
| faminc   | .0812292  | .050534   |   |         |                      |
| motheduc | .3334542  | .3955639  |   |         |                      |
| male     | 4.538161  | 1.661007  |   |         |                      |
| _cons    | 110.1121  | 4.849262  |   |         |                      |

1. Explain carefully what the intercept and each of the four slope parameter estimates mean, including all relevant units.
2. Write the formula for $R^2$ and calculate it for this regression. Explain what $R^2$ tells us.
3. Write the formula for a 95% confidence interval and calculate it for male.
4. Write the formula for a t-statistic that tests the null hypothesis $H_0: \beta_j = 0$. Calculate the t-statistics for cigs, faminc, motheduc, and male. Which of these are statistically significant at the 1% level (two-tailed), and how do you know?
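
The arithmetic in parts 2 to 4 can be checked numerically. This is a minimal Python sketch using the sums of squares, coefficients, and standard errors from the birth weight regression output. It assumes that, with 669 residual degrees of freedom, the standard normal critical values (1.96 for a 95% interval, 2.576 for a two-tailed 1% test) are an adequate approximation to the t distribution. The names used are illustrative.

```python
# Sums of squares from the ANOVA part of the output.
ss_model = 12066.6671
ss_total = 317616.393
r_squared = ss_model / ss_total  # explained share of total variation

# (coefficient, standard error) pairs from the coefficient table.
estimates = {
    "cigs": (-0.4370689, 0.1358532),
    "faminc": (0.0812292, 0.050534),
    "motheduc": (0.3334542, 0.3955639),
    "male": (4.538161, 1.661007),
}

# t-statistic for H0: beta_j = 0 is the coefficient over its standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# Assumption: large-sample normal critical values stand in for t(669).
CRIT_95 = 1.96
CRIT_99 = 2.576

coef_male, se_male = estimates["male"]
ci_male = (coef_male - CRIT_95 * se_male, coef_male + CRIT_95 * se_male)

significant_1pct = [name for name, t in t_stats.items() if abs(t) > CRIT_99]

print(f"R-squared: {r_squared:.4f}")
for name, t in t_stats.items():
    print(f"t({name}) = {t:.3f}")
print(f"95% CI for male: ({ci_male[0]:.3f}, {ci_male[1]:.3f})")
print("significant at 1%:", significant_1pct)
```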


3) Consider the following formula:

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}$$

1. Explain what each term, $\operatorname{Var}(\hat{\beta}_j)$, $\sigma^2$, $SST_j$, and $R_j^2$, means; formulas are optional if your explanation is good.
2. What is the difference (in words, not equations) between $\sigma^2$ and $\hat{\sigma}^2$?
3. Suppose the variable $x_j$ is highly correlated with many of the other variables in the regression. Which term on the right hand side of the equation will this affect, and what will be the effect on $\operatorname{Var}(\hat{\beta}_j)$?
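
To see how the terms in the variance formula interact, the sketch below evaluates it for a few values of $R_j^2$. The function name `var_beta_j` and all input values are hypothetical, chosen only for illustration; they do not come from any regression in this exam.

```python
def var_beta_j(sigma2: float, sst_j: float, r2_j: float) -> float:
    """Sampling variance of the OLS slope on x_j: sigma^2 / (SST_j * (1 - R_j^2))."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return sigma2 / (sst_j * (1.0 - r2_j))


# Hypothetical inputs: error variance 1.0 and SST_j of 100.0.
for r2_j in (0.0, 0.5, 0.9, 0.99):
    print(f"R_j^2 = {r2_j:<4}  Var = {var_beta_j(1.0, 100.0, r2_j):.4f}")
```

Holding $\sigma^2$ and $SST_j$ fixed, only the $(1 - R_j^2)$ factor changes across the rows of output.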

4) Consider the following three regression results. (I did not print the intercepts):

| Variable name | Variable label                     |
|---------------|------------------------------------|
| lwage         | natural log of wage                |
| tenure        | years with current employer        |
| KWW           | knowledge of world work test score |

(i) Regression of log of monthly wage earnings against years of job tenure and KWW test score

| lwage  | Coef.    | Std. Err. | t    | P>\|t\| | [95% Conf. Interval] |
|--------|----------|-----------|------|---------|----------------------|
| tenure | .0120699 | .0025837  | 4.67 | 0.000   | [.0069994, .0171404] |
| KWW    | .0157515 | .0017166  | 9.18 | 0.000   | [.0123827, .0191204] |

(ii) Regression of log of monthly wage earnings against years of job tenure only

| lwage  | Coef.    | Std. Err. | t    | P>\|t\| | [95% Conf. Interval] |
|--------|----------|-----------|------|---------|----------------------|
| tenure | .0154222 | .0026693  | 5.78 | 0.000   | [.0101836, .0206608] |

(iii) Regression of KWW score against years of job tenure

| KWW    | Coef.    | Std. Err. | t    | P>\|t\| | [95% Conf. Interval] |
|--------|----------|-----------|------|---------|----------------------|
| tenure | .2128208 | .0487803  | 4.36 | 0.000   | [.117089, .3085526]  |

1. Suppose regression (i) represents the TRUE POPULATION model. What is the true effect of an extra year of job tenure on the wage?
2. Explain carefully why the effect of job tenure appears to be larger in regression (ii) than it is in regression (i). Which basic assumption of the OLS model is violated in regression (ii) and why? (Tell me what that assumption says, not just its number!) Which equation do you think is more likely to be correct?
3. Write the formula that relates the job tenure parameter estimate in regression (ii) to the job tenure parameter in regression (i), and do the calculation to confirm that the formula holds, up to rounding error.
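
The last part can be confirmed numerically. The sketch below assumes the intended relationship is the standard omitted-variable identity: the tenure coefficient in the short regression (ii) equals the tenure coefficient in the long regression (i) plus the KWW coefficient from (i) times the slope from the regression of KWW on tenure. The variable names are illustrative.

```python
# Coefficients copied from the three regression outputs.
b_tenure_long = 0.0120699   # tenure, regression (i): lwage on tenure and KWW
b_kww_long = 0.0157515      # KWW, regression (i)
b_tenure_short = 0.0154222  # tenure, regression (ii): lwage on tenure only
d_tenure = 0.2128208        # tenure, regression of KWW on tenure

# Omitted-variable identity: short = long + (coefficient on omitted) * (auxiliary slope).
implied_short = b_tenure_long + b_kww_long * d_tenure
gap = implied_short - b_tenure_short

print(f"implied short-regression coefficient: {implied_short:.7f}")
print(f"reported short-regression coefficient: {b_tenure_short:.7f}")
print(f"difference (rounding error): {gap:.2e}")
```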

5) Consider a regression of the log(infant mortality rate) in each state against the variables listed:

| Variable | Meaning                             |
|----------|-------------------------------------|
| linfmort | Log(Infant mortality rate)          |
| lpcinc   | Log(Per capita income)              |
| lphysic  | Log(Doctors per 100,000 population) |
| lpopul   | Log(population in 1000s)            |
| DC       | =1 for Washington DC                |

| linfmort | Coef.     | Std. Err. | t     | P>\|t\| | [95% Conf. Interval]   |
|----------|-----------|-----------|-------|---------|------------------------|
| lpcinc   | -.2472401 | .0928288  | -2.66 | 0.009   | [-.4314795, -.0630006] |
| lphysic  | -.1756327 | .0809773  | -2.17 | 0.033   | [-.3363502, -.0149152] |
| lpopul   | .0628281  | .0142942  | 4.40  | 0.000   | [.034458, .0911981]    |
| DC       | 1.136104  | .1305253  | 8.70  | 0.000   | [.8770479, 1.395161]   |
| _cons    | 5.047641  | .704294   | 7.17  | 0.000   | [3.649812, 6.445469]   |

Here is the variance covariance matrix from this regression:

| e(V)    | lpcinc     | lphysic    | lpopul     | DC         | _cons     |
|---------|------------|------------|------------|------------|-----------|
| lpcinc  | .00861719  |            |            |            |           |
| lphysic | -.00476487 | .00655732  |            |            |           |
| lpopul  | .00008304  | -.0004003  | .00020432  |            |           |
| DC      | .00303542  | -.00681149 | .00076755  | .01703684  |           |
| _cons   | -.05907059 | .01496     | -.00034546 | -.00001664 | .49602999 |

Calculate the t-statistic that you would use to test the hypothesis that the elasticity of infant mortality with respect to per capita income is the same as its elasticity with respect to the number of physicians per population.  Interpret the results.
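
The calculation can be sketched as follows, assuming the intended test is of $H_0: \beta_{lpcinc} = \beta_{lphysic}$, with the standard error of the difference built from the two variances and the covariance in the matrix above. The variable names are illustrative.

```python
import math

# Coefficient estimates from the regression output.
b_lpcinc = -0.2472401
b_lphysic = -0.1756327

# Entries of the variance-covariance matrix e(V).
var_lpcinc = 0.00861719
var_lphysic = 0.00655732
cov_lpcinc_lphysic = -0.00476487

# Var(b1 - b2) = Var(b1) + Var(b2) - 2 * Cov(b1, b2).
diff = b_lpcinc - b_lphysic
se_diff = math.sqrt(var_lpcinc + var_lphysic - 2 * cov_lpcinc_lphysic)
t_stat = diff / se_diff

print(f"difference: {diff:.7f}")
print(f"standard error of difference: {se_diff:.6f}")
print(f"t-statistic: {t_stat:.4f}")
```

Because the covariance is negative, subtracting twice the covariance makes the standard error of the difference larger than it would be if the two estimates were uncorrelated.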

6) Use the following formula and the regression results below to test the joint null hypothesis that neither mother’s nor father’s education has an effect on wages:

What is the F-statistic, and what is the 1% critical value for that F-stat?  What do you conclude? How can your conclusion be correct, given that neither of the relevant coefficients is significant at even 5% by itself?
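
A sketch of the computation, assuming the formula intended is the usual sum-of-squared-residuals form of the F-statistic for q exclusion restrictions, F = [(SSR_r - SSR_ur)/q] / [SSR_ur/df_ur]. It treats Regression 1 (with meduc and feduc) as the unrestricted model and Regression 2 as the restricted model, and takes the residual sums of squares and degrees of freedom from the two outputs below. The 1% critical value of about 4.61 is the large-sample value for 2 numerator degrees of freedom and is likewise an assumption about which table value is wanted.

```python
# Residual sums of squares from the two regression outputs.
ssr_unrestricted = 99.1696272  # Regression 1: includes meduc and feduc
ssr_restricted = 100.704905    # Regression 2: excludes meduc and feduc

q = 2                   # number of restrictions (meduc = 0 and feduc = 0)
df_unrestricted = 713   # residual degrees of freedom in Regression 1

# F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_unrestricted)

# Assumption: large-sample 1% critical value for F with 2 numerator df.
CRIT_1PCT = 4.61

print(f"F-statistic: {f_stat:.3f}")
print("reject H0 at 1%:", f_stat > CRIT_1PCT)
```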

Regression 1)

Number of obs = 722, F(8, 713) = 24.84, Prob > F = 0.0000, R-squared = 0.2180, Root MSE = .37294

| Source   | SS         | df  | MS         |
|----------|------------|-----|------------|
| Model    | 27.6422885 | 8   | 3.45528607 |
| Residual | 99.1696272 | 713 | .139087836 |
| Total    | 126.811916 | 721 | .175883378 |

| lwage  | Coef.     | Std. Err. | t     | P>\|t\| | [95% Conf. Interval]   |
|--------|-----------|-----------|-------|---------|------------------------|
| educ   | .0562629  | .007967   | 7.06  | 0.000   | [.0406214, .0719045]   |
| exper  | .014224   | .0044905  | 3.17  | 0.002   | [.0054077, .0230402]   |
| tenure | .009331   | .0029518  | 3.16  | 0.002   | [.0035356, .0151264]   |
| age    | .0096873  | .0055133  | 1.76  | 0.079   | [-.001137, .0205117]   |
| south  | -.0807515 | .0305296  | -2.65 | 0.008   | [-.1406902, -.0208128] |
| urban  | .1672097  | .0313994  | 5.33  | 0.000   | [.1055634, .228856]    |
| meduc  | .0108041  | .0061195  | 1.77  | 0.078   | [-.0012103, .0228185]  |
| feduc  | .0086483  | .0054328  | 1.59  | 0.112   | [-.0020179, .0193145]  |
| _cons  | 5.184015  | .1749834  | 29.63 | 0.000   | [4.84047, 5.527559]    |

Regression 2)

Number of obs = 722, F(6, 715) = 30.89, Prob > F = 0.0000, R-squared = 0.2059

| Source   | SS         | df  | MS         |
|----------|------------|-----|------------|
| Model    | 26.1070112 | 6   | 4.35116853 |
| Residual | 100.704905 | 715 | .14084602  |