This is the part that is really interesting. While this model has some predictive power, it is still very simple and crude, and probably not something I would want to use to, say, put futures bets down on teams at a Vegas sports book. For instance, it isn't as predictive as a simple naïve forecast that a team will have the exact same F/+ rating in a given season as it had in the previous one. And how could it be? That sort of naïve forecast has the advantage of capturing much of the same talent information (since the four-year average recruiting ratings don't move much in a single year), while also capturing a whole host of other factors, like how those players are actually performing on the field.
But the fact that this model doesn't capture those things actually makes it pretty useful for other purposes. Consider this simple conceptual model of team quality:
Team Quality (F/+ Rating) = Apparent Talent + Coaching Effect + "Noise"
College football is a dreadfully complicated endeavor to model statistically, as you are doubtless aware. By using F/+ Rating we have already largely factored out a lot of the random in-game stuff, like bad bounces that result in one-point losses, untimely misses by a walk-on field goal kicker, and so on. In this theoretical model, the "noise" factor covers a whole host of stuff, ranging from key players being suspended, to scandals souring the mojo around a program, to a whole cohort of players turning out to be grossly overrated. As anybody who follows college football knows, this stuff happens and can definitely affect how a team performs on the field, but from a statistical perspective it is unknowable and random, and therefore can't be accounted for.
The remaining piece of the puzzle is what I am calling the "Coaching Effect," which refers not only to game prep and game-day coaching, but also to scheme, the S&C program, and talent evaluation: coaches who routinely get better results than recruiting rankings indicate they should are probably doing a better job of evaluating the available talent than the ratings services are.
It stands to reason that, over time, strong coaching staffs ought to consistently and measurably outperform (or at least perform in line with) the available talent on hand, while weaker staffs will underachieve (sometimes spectacularly). Using the Apparent Talent metric and the simple model described above, we can statistically measure how well a staff performs, both in a single season and over time. Of course, even good coaches can have an outlier year to the downside and even lousy coaches are fortunate once in a while (cough, cough…Gene Chizik), but consistent deviation from what the model predicts ought to be statistically unlikely enough that it indicates something about coaching performance. And as with most data of this kind, the really large outliers are very interesting.
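To put a rough number on that intuition, here is a minimal sketch, assuming the normalized deltas are roughly normally distributed and that seasons are independent (both simplifications of mine, not claims from the analysis): one bad season is unremarkable, but a run of them quickly becomes hard to explain away as luck.

```python
from scipy.stats import norm

# Assumption: deviations from the model are roughly normal and seasons
# are independent. Probability of one season at least 1 standard
# deviation below expectation:
p_single = norm.cdf(-1.0)          # about 0.159, roughly 1 in 6

# Probability of three straight such seasons under those assumptions:
p_three_straight = p_single ** 3   # about 0.004, roughly 1 in 250
print(f"one season: {p_single:.3f}, three straight: {p_three_straight:.4f}")
```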
The methodology was pretty simple. I calculated an expected F/+ rating for each season for all AQ teams from 2006 to 2012, based on Apparent Talent (the four-year star average), then compared it to each team's actual final F/+. I then normalized the deviations from the expected values using the standard deviation of the differences (Std Delta).
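As a concrete illustration, here is a minimal Python sketch of that procedure. The linear fit, function name, and input arrays are my assumptions about the mechanics, not the author's actual code; the point is just the shape of the calculation: fit expected F/+ from talent, take the differences, and scale them by their standard deviation.

```python
import numpy as np

def coaching_deltas(apparent_talent, actual_fplus):
    """Reconstruction of the methodology described above (a sketch,
    not the author's actual code).

    apparent_talent: four-year average star rating, one entry per team-season
    actual_fplus:    final F/+ rating for the same team-seasons
    Returns expected F/+, raw deltas, and deltas in standard-deviation units.
    """
    apparent_talent = np.asarray(apparent_talent, dtype=float)
    actual_fplus = np.asarray(actual_fplus, dtype=float)

    # Assumed: expected F/+ comes from a simple linear fit on talent
    slope, intercept = np.polyfit(apparent_talent, actual_fplus, deg=1)
    expected = intercept + slope * apparent_talent

    # Deviation from expectation, normalized by the standard deviation
    # of all deviations ("Std Delta" in the text)
    delta = actual_fplus - expected
    std_delta = delta.std(ddof=1)
    return expected, delta, delta / std_delta
```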
For example, the 2008 Washington Huskies were approximately average in terms of talent, with an Apparent Talent rating of 2.85, and the model accordingly projected an approximately average F/+ of 214.0. Instead, the team went 0-12 with a final F/+ rating of 143.3, just over 33% (or 2.77 standard deviations) worse than the model predicted.
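Working that example through by hand (the Std Delta of roughly 25.5 is backed out from the quoted 2.77 standard deviations, not taken from the underlying data):

```python
expected = 214.0   # model projection for 2008 Washington
actual = 143.3     # final F/+ rating

delta = actual - expected            # -70.7
pct_miss = delta / expected          # about -0.33, i.e. 33% worse
# The quoted 2.77 standard deviations implies a Std Delta of roughly
# 70.7 / 2.77, or about 25.5, across the full sample
std_delta = 25.5                     # assumption backed out from the text
z = delta / std_delta                # about -2.77
print(f"{pct_miss:.1%} miss, {z:.2f} standard deviations")
```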