Regression analysis using EXCEL
In this section we will illustrate how EXCEL’s Regression tool can be used to perform the regression analysis computations for the Armand’s Pizza Parlours problem. Refer to Figure 14.25 as we describe the steps involved. The labels Restaurant, Population and Sales are entered into cells A1:C1 of the worksheet. To identify each of the ten observations, we entered the numbers 1 through 10 into cells A2:A11. The sample data are entered into cells B2:C11. The steps involved in using the Regression tool for regression analysis follow.
|Step 1||Data > Data Analysis > Regression||[Main menu bar]|
|Step 2||Enter C1:C11 in the Input Y Range box||[Regression panel]|
|Enter B1:B11 in the Input X Range box|
|Select Confidence Level. Enter 99 in the Confidence Level box|
|Select Output Range|
|Enter A13 in the Output Range box (to identify the upper left corner of the section of the worksheet where the output will appear)|
The first section of the output, titled Regression Statistics, contains summary statistics such as the coefficient of determination (R Square). The second section of the output, titled ANOVA, contains the analysis of variance table. The last section of the output, which is not titled, contains the estimated regression coefficients and related information. We will begin our discussion of the interpretation of the regression output with the information contained in cells A28:I30.
Interpretation of estimated regression equation output
The y intercept of the estimated regression line, b0 = 60, is shown in cell B29, and the slope of the estimated regression line, b1 = 5, is shown in cell B30. The label Intercept in cell A29 and the label Population in cell A30 are used to identify these two values. In Section 14.5 we showed that the estimated standard deviation of b1 is sb1 = 0.5803.
Note that the value in cell C30 is the standard error, or standard deviation, sb1 of b1. Recall that the t test for a significant relationship required the computation of the t statistic, t = b1/sb1. For the Armand’s data, the value of t that we computed was t = 5/0.5803 = 8.62. The label in cell D28, t Stat, reminds us that cell D30 contains the value of the t test statistic.
The value in cell E30 is the p-value associated with the t test for significance. EXCEL has displayed the p-value in cell E30 using scientific notation. To obtain the decimal value, we move the decimal point five places to the left, obtaining a value of 0.0000255. Because the p-value = 0.0000255 < α = 0.01, we can reject H0 and conclude that we have a significant relationship between student population and quarterly sales.
Cells F28:I30 refer to confidence interval estimates of the y intercept and slope of the estimated regression equation. EXCEL always provides the lower and upper limits for a 95 per cent confidence interval. Recall that in step 4 we selected Confidence Level and entered 99 in the Confidence Level box. As a result, EXCEL’s Regression tool also provides the lower and upper limits for a 99 per cent confidence interval. The value in cell H30 is the lower limit for the 99 per cent confidence interval estimate of β1 and the value in cell I30 is the upper limit. Thus, after rounding, the 99 per cent confidence interval estimate of β1 is 3.05 to 6.95. The values in cells F30 and G30 provide the lower and upper limits for the 95 per cent confidence interval. Thus, the 95 per cent confidence interval is 3.66 to 6.34.
Interpretation of ANOVA output
The information in cells A22:F26 is a summary of the analysis of variance computations. The three sources of variation are labelled Regression, Residual and Total. The label df in cell B23 stands for degrees of freedom, the label SS in cell C23 stands for sum of squares, and the label MS in cell D23 stands for mean square.
In Section 14.5 we stated that the mean square error, obtained by dividing the error or residual sum of squares by its degrees of freedom, provides an estimate of σ2. The value s2 in cell D25, 191.25, is the mean square error for the Armand’s regression output. In Section 14.5 we showed that an F test could also be used to test for significance in regression.
The value in cell F24, 0.0000, is the p-value associated with the F test for significance. Because the p-value = 0.0000 < α = 0.01, we can reject and conclude that we have a significant relationship between student population and quarterly sales. The label EXCEL uses to identify the p-value for the F test for significance, shown in cell F23, is Significance F.
Interpretation of regression statistics output
The coefficient of determination, 0.9027, appears in cell B17; the corresponding label, R Square, is shown in cell A17. The square root of the coefficient of determination provides the sample correlation coefficient (though EXCEL always shows the positive square root of R2) of 0.9501 shown in cell B16. Note that EXCEL uses the label Multiple R (cell A16) to identify this value. In cell A19, the label Standard Error is used to identify the value of the standard error of the estimate shown in cell B19. Thus, the standard error of the estimate is 13.8293. We caution the reader to keep in mind that in the EXCEL output, the label Standard Error appears in two different places. In the Regression Statistics section of the output, the label Standard Error refers to the estimate of σ. In the Estimated Regression Equation section of the output, the label Standard Error refers to sb1, the standard deviation of the sampling distribution of b1.