Q4 Take-home exam

a) Compute the OLS regression of the number of crimes on population and population density for all 51 observations. Test (using both the Goldfeld-Quandt test and the test used by Micro-Fit) whether the null hypothesis that the residuals of the estimated equation are homoscedastic can be accepted. Why might the two tests give different results?

Dependent variable is CRIM93
51 observations used for estimation from 1 to 51
*******************************************************************************
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                  -37411.8              15369.2         -2.4342[.019]
POP93                      62.3154               2.0370         30.5915[.000]
*******************************************************************************
R-Squared                     .95025   R-Bar-Squared                     .94923
S.E. of Regression           81486.7   F-stat.    F(  1,  49)   935.8409[.000]
Mean of Dependent Variable  277567.9   S.D. of Dependent Variable      361646.9
Residual Sum of Squares     3.25E+11   Equation Log-likelihood        -648.0637
Akaike Info. Criterion     -650.0637   Schwarz Bayesian Criterion     -651.9955
*******************************************************************************
* A:Serial Correlation*CHSQ(   1)=   3.2907[.070]*F(   1,  48)=   3.3108[.075]*
* B:Functional Form   *CHSQ(   1)=   5.6735[.017]*F(   1,  48)=   6.0081[.018]*
* C:Normality         *CHSQ(   2)= 139.6596[.000]*       Not applicable       *
* D:Heteroscedasticity*CHSQ(   1)=   2.7690[.096]*F(   1,  49)=   2.8131[.100]*
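The summary statistics above are tied together by standard identities, which gives a quick way to check the copied figures. A minimal Python sketch; the helper names are hypothetical, and the Akaike and Schwarz formulas are an assumption about Micro-Fit's convention (criteria on the log-likelihood scale, larger is better), chosen because they reproduce the printed values.

```python
import math

def t_ratio(coef, se):
    """t-ratio printed next to each coefficient: coefficient / standard error."""
    return coef / se

def adj_r2(r2, n, k):
    """R-Bar-Squared for n observations and k coefficients (constant included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

def f_from_r2(r2, n, k):
    """F(k-1, n-k) statistic for the joint significance of the slope coefficients."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

def aic_sbc(loglik, n, k):
    """Assumed Micro-Fit convention: AIC = LL - k, SBC = LL - (k/2) ln n (larger is better)."""
    return loglik - k, loglik - 0.5 * k * math.log(n)

# Figures copied from the CRIM93-on-POP93 output: n = 51 observations, k = 2 coefficients
n, k = 51, 2
print("t(POP93) :", round(t_ratio(62.3154, 2.0370), 4))
print("R-bar-sq :", round(adj_r2(0.95025, n, k), 5))
print("F(1, 49) :", round(f_from_r2(0.95025, n, k), 2))
print("AIC, SBC :", [round(v, 4) for v in aic_sbc(-648.0637, n, k)])
```

Small differences from the printed values (for example in the F-statistic) come from the rounding of R-squared in the output; with a single regressor, F is simply the squared t-ratio.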
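
Micro-Fit test (diagnostic D above):

H0: the errors have a constant variance (homoscedasticity)
H1: the error variance is not constant (heteroscedasticity)

For the chi-squared version the p-value is 0.096; for the F version it is 0.100.

Both p-values exceed 0.05, so at the 5% level H0 is not rejected: there is no significant evidence of heteroscedasticity (the chi-squared version would, however, reject H0 at the 10% level).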
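Micro-Fit's diagnostic D is assumed here to be the regression of the squared OLS residuals on a constant and the squared fitted values: CHSQ(1) is then n·R² from that auxiliary regression, and F(1, n-2) is the F-test on its slope. A minimal NumPy sketch under that assumption (the function names are hypothetical, not Micro-Fit's). `lm_to_f` shows that the printed F value follows from the printed CHSQ value, and the last lines run the test on made-up data whose error spread grows with the regressor.

```python
import numpy as np

def ols_fit(X, y):
    """OLS fitted values and residuals (X must contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def het_test_fitted_sq(X, y):
    """Assumed form of Micro-Fit's test D: regress e^2 on [1, yhat^2].

    Returns (LM, F): LM = n*R^2 ~ chi-squared(1) and F ~ F(1, n-2) under H0.
    """
    n = len(y)
    yhat, e = ols_fit(X, y)
    e2 = e ** 2
    Z = np.column_stack([np.ones(n), yhat ** 2])
    _, u = ols_fit(Z, e2)
    r2 = 1.0 - float(u @ u) / float(np.sum((e2 - e2.mean()) ** 2))
    return n * r2, r2 / (1.0 - r2) * (n - 2)

def lm_to_f(lm, n):
    """Convert the CHSQ(1) statistic into the F(1, n-2) version of the same test."""
    r2 = lm / n
    return r2 / (1.0 - r2) * (n - 2)

# Reported CHSQ(1) = 2.7690 with n = 51; the printed F(1, 49) is 2.8131
print(round(lm_to_f(2.7690, 51), 4))

# Made-up data (not the crime data): error s.d. proportional to x
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=200)
y = 5 + 2 * x + rng.normal(scale=x)
X = np.column_stack([np.ones_like(x), x])
lm, f = het_test_fitted_sq(X, y)
print(f"LM = {lm:.2f}, F = {f:.2f}")
```

The same conversion also reproduces the F values printed for the later density and population-plus-area equations.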

Goldfeld-Quandt test:

H0: the errors have a constant variance (homoscedasticity)
H1: the error variance increases with population

I order the observations by population and omit the c = 11 middle observations, which leaves two subsamples of 20.

First 20 observations (lowest population):

Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                  970.1958               9217.7          .10525[.917]
POP93                      46.2374               6.8773          6.7232[.000]
R-Squared                  0.71519

Last 20 observations (highest population):

Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                 -103894.2              49308.7         -2.1070[.049]
POP93                      66.8360               4.2202         15.8370[.000]
*******************************************************************************
R-Squared                  0.93304

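The test statistic is λ = RSS2/RSS1, the residual sum of squares of the high-population regression divided by that of the low-population regression. Under H0 it follows F(18, 18), since each subsample regression has (51 - 11 - 4)/2 = 18 degrees of freedom; the 5% critical value is about 2.2.

I computed λ = (1 - R2(2))/(1 - R2(1)) = (1 - 0.93304)/(1 - 0.71519) = 0.23510 < 2.2, which would suggest homoscedasticity. However, RSS = (1 - R2)·TSS, so this shortcut equals RSS2/RSS1 only if both subsamples have the same total sum of squares. Crime varies far more across the 20 high-population observations, so the shortcut is likely to understate λ considerably, and the conclusion should be checked using the RSS of each subsample regression.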
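The Goldfeld-Quandt procedure (order by population, drop the c middle observations, fit OLS to each end, compare the residual sums of squares) can be sketched as follows. This is a minimal NumPy illustration on made-up data, not a reproduction of the figures in this answer, and the helper names are hypothetical. It also prints the ratio of (1 - R²) values next to λ, because RSS = (1 - R²)·TSS and the two ratios agree only when both subsamples have the same total sum of squares.

```python
import numpy as np

def rss_and_r2(X, y):
    """Residual sum of squares and R^2 of an OLS fit (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    rss = float(e @ e)
    tss = float(np.sum((y - y.mean()) ** 2))
    return rss, 1.0 - rss / tss

def goldfeld_quandt(x, y, c, k=2):
    """Goldfeld-Quandt test for error variance increasing with x.

    Sorts by x, drops the c middle observations, fits y = a + b*x to each end and
    returns (lambda = RSS_high / RSS_low, ratio of (1 - R^2) values, df per subsample).
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    m = (n - c) // 2                      # observations in each subsample
    fits = []
    for xs, ys in ((x[:m], y[:m]), (x[n - m:], y[n - m:])):
        X = np.column_stack([np.ones(len(xs)), xs])
        fits.append(rss_and_r2(X, ys))
    (rss_low, r2_low), (rss_high, r2_high) = fits
    lam = rss_high / rss_low
    r2_ratio = (1.0 - r2_high) / (1.0 - r2_low)   # equals lam only if both TSS are equal
    df = m - k                                     # (n - c - 2k) / 2 when n - c is even
    return lam, r2_ratio, df

# Made-up data (not the crime data): error s.d. proportional to x
rng = np.random.default_rng(1)
x = rng.uniform(400, 32000, size=51)
y = -40000 + 62 * x + rng.normal(scale=5 * x)
lam, r2_ratio, df = goldfeld_quandt(x, y, c=11)
print(f"lambda = {lam:.3f}, (1-R2_high)/(1-R2_low) = {r2_ratio:.3f}, df = ({df}, {df})")
```

λ is then compared with the F(df, df) critical value; with n = 51 and c = 11 this gives the same 18 degrees of freedom as above.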
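
Why the two tests can give different results: they look for different forms of heteroscedasticity. The Goldfeld-Quandt test is more restrictive: it assumes that the error variance rises monotonically with a single ordering variable (here population), its result depends on the choice of c, and omitting the middle observations reduces its power. The Micro-Fit test uses all observations and instead relates the squared residuals to the squared fitted values, so it looks for a different pattern in the error variance.
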
b) Plot scatter graphs of the squared residuals from the estimated equation against population and population squared. Do these plots provide additional help to enable you to decide whether heteroscedasticity is present in your estimated equation?

Squared residuals (RES2), population (POP93) and population squared (POP2), ordered by population:

RES2          POP93   POP2
766719095.1   470     220900
595208004.6   576     331776
4820165130    579     335241
1118492260    598     357604
245872698.4   637     405769
779652136.5   698     487204
195252979.3   716     512656
639508012.1   841     707281
403465632.3   1000    1000000
124543776.3   1100    1210000
464.3909849   1124    1263376
1439589654    1166    1359556
561947.5822   1240    1537600
1346859740    1382    1909924
10918731.32   1613    2601769
1441629340    1616    2611456
889683797.1   1818    3305124
357718488.3   1860    3459600
8700467.328   2426    5885476
30898941.04   2535    6426225
109527336     2640    6969600
893060053.9   2821    7958041
542088929.5   3035    9211225
50427931.87   3233    10452289
302103622.8   3278    10745284
151343624.5   3564    12702096
649534469.6   3630    13176900
5674345602    3794    14394436
7185974738    3945    15563025
366251383.2   4181    17480761
4072383680    4290    18404100
2123390606    4524    20466576
975773446.1   4958    24581764
2472580407    5044    25441936
171531825.7   5094    25948836
487781730.7   5235    27405225
515791803.6   5259    27657081
4017875529    5706    32558436
1855981642    6018    36216324
9905582453    6473    41899729
1207709802    6902    47637604
8280285.049   6952    48330304
5627096725    7859    61763881
1313168202    9460    89491600
2426505575    11061   122345721
1175526129    11686   136562596
1015955886    12030   144720900
1078000806    13726   188403076
5595257545    18022   324792484
7417663339    18153   329531409
1161983188    31217   974501089

A few observations lie far from the rest. The plots show no clear systematic pattern of heteroscedasticity; rather, a few areas have unusually high (or low) crime for their population.

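To put a number on what the scatter plots show, the tabulated squared residuals can be regressed on a constant, POP93 and POP2. Assuming RES2 holds the squared residuals of the CRIM93-on-POP93 equation, this is White's test for a single regressor: n·R² is approximately chi-squared(2) under homoscedasticity, with a 5% critical value of about 5.99. A minimal sketch; it was not part of the original answer, so no result is claimed here, and the helper name is hypothetical.

```python
import numpy as np

# RES2 and POP93 copied from the table above (POP2 is recomputed as POP93**2).
res2 = np.array([
    766719095.1, 595208004.6, 4820165130, 1118492260, 245872698.4, 779652136.5,
    195252979.3, 639508012.1, 403465632.3, 124543776.3, 464.3909849, 1439589654,
    561947.5822, 1346859740, 10918731.32, 1441629340, 889683797.1, 357718488.3,
    8700467.328, 30898941.04, 109527336, 893060053.9, 542088929.5, 50427931.87,
    302103622.8, 151343624.5, 649534469.6, 5674345602, 7185974738, 366251383.2,
    4072383680, 2123390606, 975773446.1, 2472580407, 171531825.7, 487781730.7,
    515791803.6, 4017875529, 1855981642, 9905582453, 1207709802, 8280285.049,
    5627096725, 1313168202, 2426505575, 1175526129, 1015955886, 1078000806,
    5595257545, 7417663339, 1161983188,
])
pop = np.array([
    470, 576, 579, 598, 637, 698, 716, 841, 1000, 1100, 1124, 1166, 1240,
    1382, 1613, 1616, 1818, 1860, 2426, 2535, 2640, 2821, 3035, 3233, 3278,
    3564, 3630, 3794, 3945, 4181, 4290, 4524, 4958, 5044, 5094, 5235, 5259,
    5706, 6018, 6473, 6902, 6952, 7859, 9460, 11061, 11686, 12030, 13726,
    18022, 18153, 31217,
], dtype=float)

def n_r_squared(y, Z):
    """Return (n * R^2, R^2) from the OLS regression of y on Z (Z includes a constant)."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    u = y - Z @ beta
    r2 = 1.0 - float(u @ u) / float(np.sum((y - y.mean()) ** 2))
    return len(y) * r2, r2

# Auxiliary regression: RES2 on a constant, POP93 and POP2
Z = np.column_stack([np.ones_like(pop), pop, pop ** 2])
stat, r2 = n_r_squared(res2, Z)
print(f"R^2 = {r2:.4f}, n*R^2 = {stat:.2f}; compare with the 5% chi-squared(2) value 5.99")
```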
iii |
One should exclude the outliers, i.e. areas with unusually high or low population or crime. The result is an OLS model that only applies to "normal" areas.

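Alternatively, estimation techniques that take heteroscedasticity into account can be used, such as weighted (generalized) least squares, or OLS with White's heteroscedasticity-consistent standard errors. GARCH models are designed for time-varying variance in time series and do not fit this cross-section. Weighted least squares is BLUE only if the weights match the true error variances; when the weights have to be estimated, the resulting equation is no longer exactly BLUE.
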
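For a cross-section such as this, weighted least squares and White's heteroscedasticity-consistent standard errors are the usual remedies. A minimal NumPy sketch on made-up data; the weighting scheme (error variance proportional to population squared, so w = 1/POP²) is an assumption for illustration, not something estimated in this answer, and the function names are hypothetical.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimises sum(w_i * e_i**2).

    w_i should be proportional to 1 / Var(u_i); with w = 1 this is plain OLS.
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def ols_white_se(X, y):
    """OLS coefficients with White (HC0) heteroscedasticity-consistent standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    e = y - X @ beta
    meat = X.T @ (X * (e ** 2)[:, None])   # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))

# Made-up data; assumption for illustration: error s.d. proportional to population
rng = np.random.default_rng(2)
pop = rng.uniform(500, 30000, size=51)
crime = -40000 + 62 * pop + rng.normal(scale=3 * pop)
X = np.column_stack([np.ones_like(pop), pop])

print("WLS, w = 1/pop^2:", wls(X, crime, 1.0 / pop ** 2))
beta, se = ols_white_se(X, crime)
print("OLS:", beta, "White SE:", se)
```

With w = 1/POP², WLS is equivalent to running OLS on the equation divided through by population, i.e. a regression of crimes per head on a constant and 1/POP.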
Finally, one could use population density instead of population to predict crime, since population density might be a better explanatory variable for crime.

iv |
Regressing the number of crimes (CRIM93) on population density (POPDEN = population/area):

Dependent variable is CRIM93
51 observations used for estimation from 1 to 51
*******************************************************************************
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                  299992.9              53053.6          5.6545[.000]
POPDEN                   -189.9849             143.7568         -1.3216[.192]
*******************************************************************************
R-Squared                    .034417   R-Bar-Squared                    .014711
S.E. of Regression          358976.9   F-stat.    F(  1,  49)     1.7465[.192]
Mean of Dependent Variable  277567.9   S.D. of Dependent Variable      361646.9
Residual Sum of Squares     6.31E+12   Equation Log-likelihood        -723.6874
Akaike Info. Criterion     -725.6874   Schwarz Bayesian Criterion     -727.6192
*******************************************************************************
* D:Heteroscedasticity*CHSQ(   1)=   .59156[.442]*F(   1,  49)=   .57503[.452]*

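Using density removes the evidence of heteroscedasticity (CHSQ(1) = .59156, p = .442), but population density is not significant (p = .192) and R-squared falls to .034. This model is also restrictive: population and area can affect crime only through their ratio. Instead, population and area can be used as two separate variables:
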
Dependent variable is CRIM93
51 observations used for estimation from 1 to 51
*******************************************************************************
Regressor              Coefficient       Standard Error         T-Ratio[Prob]
CONSTANT                  -48292.5              17359.5         -2.7819[.008]
POP93                      62.0446               2.0326         30.5253[.000]
AREA                       .068207              .051916          1.3138[.195]
*******************************************************************************
R-Squared                     .95197   R-Bar-Squared                     .94997
* D:Heteroscedasticity*CHSQ(   1)=   2.2226[.136]*F(   1,  49)=   2.2327[.142]*

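As seen, area is not significant (p = .195). Compared with the density model, the heteroscedasticity statistic has increased (CHSQ(1) = 2.2226, p = .136, against .59156, p = .442), although it is still not significant at the 5% level. This is the best I can do, I am afraid.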