Assumptions Of CLRM: Error Term Has Zero Mean And Errors Uncorrelated With X
Okay, welcome back to our discussion of the Gauss-Markov theorem, where we're trying to figure out when ordinary least squares is the best. We just got done talking about the first assumption of the classical linear regression model: that the regression is linear in the coefficients, is correctly specified, and has an additive error term. And we discussed what linear in the coefficients means.
Correctly specified says we have the right functional form and the correct set of variables. An additive error term means that you use the y-intercept and the slopes to come up with a predicted value for an individual, say 29 miles per gallon for a particular car with certain characteristics, and that's not the actual value; there's some random process that generates an error term. So the error term is the theoretical notion, and the residual is an actual value for one particular observation. That random value is simply added to or subtracted from the average or expected value. In many situations you could imagine a multiplicative or divisive or some other sort of error term, where you consult a random number generation process and, rather than adding it to the prediction, it multiplies it. But we're assuming it has to be additive. Now let's look at the next assumption of the classical model that has to be true in order for ordinary least squares to be the best method to use.
This next assumption says that the error term has to have a zero population mean, and I want to point out that this is one of the least discussed assumptions of the model. Let me try to show you what it means, what it would mean if it weren't true, and why we don't talk about it very much. The zero population mean refers to the theoretical error term. As we know, when you use OLS, the observed residuals will always have a zero mean.
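As a quick sketch of what an additive error term and the zero-mean OLS residuals look like, here is a small simulation. The intercept of 5, slope of 2, and normal errors are my own illustrative assumptions, not numbers from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" relationship for illustration: y = 5 + 2*x + e,
# with an additive error term e whose population mean is zero.
n = 1000
x = rng.uniform(0, 10, n)
e = rng.normal(0, 1, n)          # theoretical stochastic error term
y = 5 + 2 * x + e                # additive: the error is added to the prediction

# Fit by ordinary least squares (degree-1 polyfit is OLS with one regressor).
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Whatever the data, the observed OLS residuals average to (numerically) zero.
print(abs(residuals.mean()) < 1e-8)
```

Note the distinction the lecture draws: `e` is the theoretical error term we never observe, while `residuals` are the actual observed deviations from the fitted line.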
So suppose that theoretically, the line that should be estimated looks like this red line, and suppose the relationship that should be observed, the real line (sometimes I call it God's line), is a y-intercept of 5, plus a slope times some variable, plus some kind of error term. And suppose that real error term had an average of 3. This might be what it looks like: if this red line is the real relationship, where is the line?
Well, it doesn't go through the points; it's below the points, and the average residual here might be 3. So the data are, on average, 3 away from this red line. Now, to be honest, I've never come up with a good example that makes sense where this should be the case.
Why should the correct line underpredict the observed data by 3? Again, I'm not sure exactly what that kind of situation would look like, but this is what the graph would look like: the line you used to predict is consistently, on average, 3 below where the data are. So what is going to happen if you use ordinary least squares to estimate this relationship? Rather than actually coming up with the red line, you're going to come up with the blue line. You're not going to get a y-intercept of 5 and a slope of whatever the slope is; you're going to get a y-intercept of 8.
You'll get the correct slope, but an incorrect y-intercept. And the residuals that you see will have an average of zero, rather than an average of three. So this assumption is just saying: if there were some kind of situation where the average stochastic error term should be three, you're not going to be able to detect it with ordinary least squares. What's going to happen instead is your y-intercept, instead of being five, will be eight. That average of the stochastic error term, if it's not zero, is going to be absorbed into the y-intercept. So we don't talk about this much. You will almost never read a study that uses regression where they say, well, we're worried that the average of the stochastic error term is not going to be zero. But this does give a reason why, in most studies, we don't discuss the y-intercept seriously: just in case the average stochastic error term was not zero, that's going to bias the y-intercept. We'd get eight instead of the true five if that were the case. So it's just one more reason not to seriously interpret the y-intercept as really meaning something; it's just a placeholder.
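Here is a minimal simulation of that absorption story. The true intercept of 5 and the error mean of 3 follow the lecture's example; the slope of 2 and the rest of the numbers are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# "God's line" from the example: intercept 5, but the error term has
# population mean 3 instead of 0 (slope of 2 is an assumed number).
n = 1000
x = rng.uniform(0, 10, n)
e = rng.normal(3, 1, n)          # stochastic error term with mean 3, not 0
y = 5 + 2 * x + e

slope, intercept = np.polyfit(x, y, 1)   # ordinary least squares
residuals = y - (intercept + slope * x)

print(slope)      # close to 2: the slope is still estimated correctly
print(intercept)  # close to 8 = 5 + 3: the nonzero mean is absorbed
print(abs(residuals.mean()) < 1e-8)   # observed residuals still average zero
```

OLS hands back the blue line with intercept near 8, a correct slope, and zero-mean residuals, so nothing in the output warns you that the red line's intercept was really 5.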
In this case, what the y-intercept is going to do is absorb any situation where the error term's mean really wasn't zero. I'm going to leave that there, now that you can picture what's going on. The third assumption is quite important, but in an introductory course you don't spend a lot of time talking about it, because it can be very difficult to control for or deal with if you do have a problem.
But let me just give you an example of what's going on here. Assumption three says that all explanatory variables are uncorrelated with the error term.
Now, you'll often hear economists call this problem simultaneity. Simultaneity is a situation where, if the error term is large, it can change the value of one of your explanatory variables; that's what correlated with the error term would mean. So if an explanatory variable is related to the error term, you have an issue that is sometimes called simultaneity.
It means that the error term and one or more explanatory variables are determined simultaneously, at the same time. An example might help you see what's going on here. Suppose our model is that the crime rate in a city is equal to some y-intercept, plus a slope times income, plus another slope times how many police officers you have per thousand people or per square mile, plus some error term.
Now, that might be the correct model. If you're really interested in figuring out the relationship between adding a police officer and the crime rate, what you would hope is that adding a police officer would lower the crime rate through prevention efforts, being on the ground in community relations, getting people to feel comfortable reporting crimes, and so on. However, let's look at a particular city. Suppose there's a particular city whose realization of the stochastic error term is positive, and (I don't know what the units are here) suppose it's a very large positive number, say 50: for some unknown reason, this just happens to be a city that has a huge crime problem. So the observed crime rate is going to be unexpectedly large by 50.
What is also likely to be true about this city? It is also very likely to have a large police force. Why? Well, because they have such a huge crime rate, these people hire extra police officers to try to help keep them safe from these roving bands of thugs.
So this is an example where the fact that the error term is large jointly, or simultaneously, helps determine the fact that this explanatory variable is going to be large. So what's going to happen? Although you would expect on average that if you add police, you see the crime rate go down, in your data you're not going to be observing people adding police and causing the crime rate to go down. Instead, what you're going to see sometimes is cities that have high crime rates for a random reason having large police forces, and that's going to make you think that large police forces are associated with high crime rates.
It might even make you think, erroneously, that adding police officers causes crime. Even though some cynics might believe that, that's probably not the relationship you're trying to estimate in your equation. So this is an important assumption of the model: you're only going to be able to get an accurate estimate of this relationship (if I add police, how much will crime go down?) if your explanatory variables are unrelated to your error term.
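The crime-and-police story can be sketched as a small simulation. Every coefficient and distribution below is a made-up number for illustration; the key feature, following the lecture, is that police hiring responds to the city's crime shock:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed true model: crime = 60 - 0.5*income - 2.0*police + e,
# so each extra officer truly cuts crime by 2 units.
n = 5000
income = rng.normal(50, 10, n)
e = rng.normal(0, 20, n)                  # city-specific crime shock

# Simultaneity: cities hit by a big positive crime shock hire more police,
# so the explanatory variable is correlated with the error term.
police = 5 + 0.1 * e + rng.normal(0, 1, n)

crime = 60 - 0.5 * income - 2.0 * police + e

# OLS on the simultaneously determined data.
X = np.column_stack([np.ones(n), income, police])
coefs, *_ = np.linalg.lstsq(X, crime, rcond=None)
b_police = coefs[2]

# Under these made-up numbers the estimate comes out strongly positive
# (around +6), even though the true effect of police is -2: exactly the
# "police seem to cause crime" pattern described above.
print(b_police)
```

The sign flip is the point: OLS attributes the crime shock to the police variable that responded to it, so the coefficient no longer answers "if I add police, how much will crime go down?"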