Q4 Take home exam

a) Compute the OLS regression of the number of crimes on population and population density for all 51 observations. Test (using both the Goldfeld-Quandt test and the test used by Microfit) whether the null hypothesis that the residuals of the estimated equation are homoscedastic can be accepted. Why might the two tests give different results?

Dependent variable is CRIM93
51 observations used for estimation from 1 to 51
*******************************************************************************
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                 -37411.8            15369.2            -2.4342[.019]
POP93                     62.3154             2.0370            30.5915[.000]
*******************************************************************************
R-Squared                     .95025   R-Bar-Squared                   .94923
S.E. of Regression           81486.7   F-stat.    F(  1,  49)  935.8409[.000]
Mean of Dependent Variable  277567.9   S.D. of Dependent Variable    361646.9
Residual Sum of Squares     3.25E+11   Equation Log-likelihood      -648.0637
Akaike Info. Criterion     -650.0637   Schwarz Bayesian Criterion   -651.9955
*******************************************************************************
* A:Serial Correlation*CHSQ(   1)=   3.2907[.070]*F(   1,  48)=   3.3108[.075]*
* B:Functional Form   *CHSQ(   1)=   5.6735[.017]*F(   1,  48)=   6.0081[.018]*
* C:Normality         *CHSQ(   2)= 139.6596[.000]*       Not applicable       *
* D:Heteroscedasticity*CHSQ(   1)=   2.7690[.096]*F(   1,  49)=   2.8131[.100]*

Microfit test (diagnostic D):
H0: errors have the same variance
H1: errors have an increasing variance
For the chi-squared version the p-value is 0.096; for the F version it is 0.100.
At the 5% level H0 is not rejected, so there is no evidence of heteroscedasticity.

Goldfeld-Quandt test:
H0: errors have the same variance
H1: errors have an increasing variance
I set c = 11 (the number of central observations dropped) and order the observations by population.

For the first 20:
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                 970.1958             9217.7             .10525[.917]
POP93                     46.2374             6.8773             6.7232[.000]
R-Squared 0.71519

For the last 20:
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                -103894.2            49308.7            -2.1070[.049]
POP93                     66.8360             4.2202            15.8370[.000]
R-Squared 0.93304

lambda = RSS2/RSS1 = (1 - R2(2))/(1 - R2(1)) = (1 - 0.93304)/(1 - 0.71519) = 0.23510
(Strictly, this R-squared shortcut equals RSS2/RSS1 only when the two subsamples have the same total sum of squares; if they differ, the RSS values from the two regressions should be used directly.)
df = (51 - 11 - 4)/2 = 18
Fcrit = 2.2
Since lambda < Fcrit, H0 is not rejected; thus the errors are likely homoscedastic.

Why the two tests might differ: the second test (Goldfeld-Quandt) is much more restrictive. Its result depends on the value of c used, etc. The first test (Microfit) uses much more complicated techniques to spot heteroscedasticity, not just assuming that the error term variance depends on the square of

ii b) Plot scatter graphs of the squared residuals from the estimated equation against population and population squared. Do these plots provide additional help to enable you to decide whether heteroscedasticity is present in your estimated equation?
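For part (a), the Goldfeld-Quandt arithmetic can be reproduced in a few lines. This is only a sketch of the calculation as the answer performs it: the R-squared values, n, c and the critical value 2.2 are copied from the text, while the function names are made up for illustration.

```python
# Values copied from the two subsample regressions in part (a).
R2_LOW = 0.71519    # first 20 observations (smallest populations)
R2_HIGH = 0.93304   # last 20 observations (largest populations)
N, C, K = 51, 11, 2  # sample size, omitted central observations, coefficients per regression
F_CRIT = 2.2         # 5% critical value quoted in the answer


def gq_ratio_from_r2(r2_low, r2_high):
    """Shortcut used in the answer: (1 - R2_high) / (1 - R2_low).

    This equals RSS_high / RSS_low only if both subsamples have the
    same total sum of squares (an assumption, not checked here).
    """
    return (1.0 - r2_high) / (1.0 - r2_low)


def gq_degrees_of_freedom(n, c, k):
    """Degrees of freedom of each subsample regression: (n - c - 2k) / 2."""
    return (n - c - 2 * k) // 2


lam = gq_ratio_from_r2(R2_LOW, R2_HIGH)
df = gq_degrees_of_freedom(N, C, K)
reject_h0 = lam > F_CRIT

print(f"lambda = {lam:.5f}, df = ({df}, {df}), reject H0: {reject_h0}")
```

If the residual sums of squares of the two subsample regressions are available, passing their ratio directly instead of the R-squared shortcut avoids the equal-TSS assumption.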
RES2             POP93         POP2
766719095.1        470       220900
595208004.6        576       331776
4820165130         579       335241
1118492260         598       357604
245872698.4        637       405769
779652136.5        698       487204
195252979.3        716       512656
639508012.1        841       707281
403465632.3       1000      1000000
124543776.3       1100      1210000
464.3909849       1124      1263376
1439589654        1166      1359556
561947.5822       1240      1537600
1346859740        1382      1909924
10918731.32       1613      2601769
1441629340        1616      2611456
889683797.1       1818      3305124
357718488.3       1860      3459600
8700467.328       2426      5885476
30898941.04       2535      6426225
109527336         2640      6969600
893060053.9       2821      7958041
542088929.5       3035      9211225
50427931.87       3233     10452289
302103622.8       3278     10745284
151343624.5       3564     12702096
649534469.6       3630     13176900
5674345602        3794     14394436
7185974738        3945     15563025
366251383.2       4181     17480761
4072383680        4290     18404100
2123390606        4524     20466576
975773446.1       4958     24581764
2472580407        5044     25441936
171531825.7       5094     25948836
487781730.7       5235     27405225
515791803.6       5259     27657081
4017875529        5706     32558436
1855981642        6018     36216324
9905582453        6473     41899729
1207709802        6902     47637604
8280285.049       6952     48330304
5627096725        7859     61763881
1313168202        9460     89491600
2426505575       11061    122345721
1175526129       11686    136562596
1015955886       12030    144720900
1078000806       13726    188403076
5595257545       18022    324792484
7417663339       18153    329531409
1161983188       31217    974501089

There are some values that are way out. Heteroscedasticity does not appear to be present; rather, there are a few very unusual areas with a high (or low) crime rate.

iii
One should exclude the outliers, i.e. counties with unusually high or low population or crime. The result is an OLS model that only applies to "normal" areas.
Alternatively, there are some more complicated estimation techniques that take heteroscedasticity into account (GARCH). However, the resulting equation won't be BLUE.
Finally, one should use population density instead of population to predict crime.
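As a rough numerical companion to the plots in (b), one could compare the average squared residual among the smallest and the largest populations. The sketch below is not the Goldfeld-Quandt test itself, because the residuals come from the single full-sample regression; the group size of 20 simply mirrors the split used in (a), and the function name is hypothetical. The (POP93, RES2) pairs are copied from the table above.

```python
# (POP93, RES2) pairs copied from the table of squared residuals.
ROWS = [
    (470, 766719095.1), (576, 595208004.6), (579, 4820165130.0),
    (598, 1118492260.0), (637, 245872698.4), (698, 779652136.5),
    (716, 195252979.3), (841, 639508012.1), (1000, 403465632.3),
    (1100, 124543776.3), (1124, 464.3909849), (1166, 1439589654.0),
    (1240, 561947.5822), (1382, 1346859740.0), (1613, 10918731.32),
    (1616, 1441629340.0), (1818, 889683797.1), (1860, 357718488.3),
    (2426, 8700467.328), (2535, 30898941.04), (2640, 109527336.0),
    (2821, 893060053.9), (3035, 542088929.5), (3233, 50427931.87),
    (3278, 302103622.8), (3564, 151343624.5), (3630, 649534469.6),
    (3794, 5674345602.0), (3945, 7185974738.0), (4181, 366251383.2),
    (4290, 4072383680.0), (4524, 2123390606.0), (4958, 975773446.1),
    (5044, 2472580407.0), (5094, 171531825.7), (5235, 487781730.7),
    (5259, 515791803.6), (5706, 4017875529.0), (6018, 1855981642.0),
    (6473, 9905582453.0), (6902, 1207709802.0), (6952, 8280285.049),
    (7859, 5627096725.0), (9460, 1313168202.0), (11061, 2426505575.0),
    (11686, 1175526129.0), (12030, 1015955886.0), (13726, 1078000806.0),
    (18022, 5595257545.0), (18153, 7417663339.0), (31217, 1161983188.0),
]


def mean(values):
    return sum(values) / len(values)


def group_ratio(rows, group_size=20):
    """Mean squared residual of the `group_size` largest-population
    observations divided by that of the `group_size` smallest."""
    ordered = sorted(rows)  # tuples sort by population first
    low = [res2 for _, res2 in ordered[:group_size]]
    high = [res2 for _, res2 in ordered[-group_size:]]
    return mean(high) / mean(low)


ratio = group_ratio(ROWS)
print(f"mean RES2 (largest 20) / mean RES2 (smallest 20) = {ratio:.2f}")
```

A ratio close to 1 would be in line with a constant error variance; a ratio well above 1 would suggest that the spread of the residuals grows with population. A handful of extreme observations can dominate either group mean, so the figure should be read alongside the plots rather than instead of them.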
Population density might be a better explanatory variable for crime.

iv
Regressing crime on population density (pop/area):

Dependent variable is CRIM93
51 observations used for estimation from 1 to 51
*******************************************************************************
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                 299992.9            53053.6             5.6545[.000]
POPDEN                  -189.9849           143.7568            -1.3216[.192]
*******************************************************************************
R-Squared                    .034417   R-Bar-Squared                  .014711
S.E. of Regression          358976.9   F-stat.    F(  1,  49)    1.7465[.192]
Mean of Dependent Variable  277567.9   S.D. of Dependent Variable    361646.9
Residual Sum of Squares     6.31E+12   Equation Log-likelihood      -723.6874
Akaike Info. Criterion     -725.6874   Schwarz Bayesian Criterion   -727.6192
*******************************************************************************
* D:Heteroscedasticity*CHSQ(   1)=   .59156[.442]*F(   1,  49)=   .57503[.452]*

Using density gets rid of the heteroscedasticity (p = .442); however, POPDEN is not significant (p = .192). But this model is restricted, since population and area can only enter through their ratio.
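The statement that POPDEN is not significant can be checked directly from the reported coefficient and standard error. The sketch below recomputes the t-ratio and, since there is only one slope, the F statistic as its square. The critical value of about 2.01 for 49 degrees of freedom is taken from standard t tables and is an assumption added here, not a figure from the Microfit output.

```python
# Figures copied from the POPDEN regression output.
COEF_POPDEN = -189.9849
SE_POPDEN = 143.7568

# Approximate two-sided 5% critical value of t with 49 d.f. (assumed, from tables).
T_CRIT_5PCT = 2.01

t_ratio = COEF_POPDEN / SE_POPDEN
f_stat = t_ratio ** 2  # with a single slope, F(1, 49) is the squared t-ratio
significant = abs(t_ratio) > T_CRIT_5PCT

print(f"t = {t_ratio:.4f}, F = {f_stat:.4f}, significant at 5%: {significant}")
```

Both figures agree with the T-Ratio and F-stat. entries in the table to the precision shown there.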
Instead we can use the two variables, population and area, separately:

Dependent variable is CRIM93
51 observations used for estimation from 1 to 51
*******************************************************************************
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                 -48292.5            17359.5            -2.7819[.008]
POP93                     62.0446             2.0326            30.5253[.000]
AREA                      .068207            .051916             1.3138[.195]
*******************************************************************************
R-Squared                     .95197   R-Bar-Squared                   .94997
*******************************************************************************
* D:Heteroscedasticity*CHSQ(   1)=   2.2226[.136]*F(   1,  49)=   2.2327[.142]*

As seen, AREA is not significant (p = .195), and the evidence of heteroscedasticity is stronger than in the density model (p = .136 against .442), although it is still not significant at the 5% level.
This is the best I can do, I am afraid.
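To compare the three models on an equal footing, the R-Bar-Squared figures can be recomputed from the reported R-Squared values with the usual degrees-of-freedom adjustment. This is a minimal sketch: the function name and the dictionary labels are made up for illustration, and k counts the constant as one of the estimated coefficients.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k),
    where k is the number of estimated coefficients including the constant."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


N_OBS = 51

# (R-Squared, number of coefficients) copied from the three regressions.
MODELS = {
    "POP93": (0.95025, 2),
    "POPDEN": (0.034417, 2),
    "POP93 + AREA": (0.95197, 3),
}

adjusted = {name: adjusted_r2(r2, N_OBS, k) for name, (r2, k) in MODELS.items()}

for name, value in adjusted.items():
    print(f"{name:14s} R-Bar-Squared = {value:.5f}")
```

The recomputed values match the Microfit output to the precision shown, and they make the point of the last model concrete: adding AREA raises R-Bar-Squared only in the fourth decimal place.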