A Rant about Regression
Todd explains the proper application of regression to player analysis
Pardon me while I get a little something off my chest.
In far too many instances regression is used synonymously with “play worse”. Any time a player’s performance is exceeding expectations, it is automatically assumed “he’ll regress.” There are some out there who poke fun at this by referencing the farcical “Regression Police.” This is pretty ironic, as by doing so they’re demonstrating their own ignorance of what regression properly means.
The key point to regression is it is out of the player’s control. This is a bit of hyperbole. A player cannot regress but his numbers can. If the act of regression is out of the player’s control, how can he regress? Again, this is a classic case of picking semantic nits, but it’s done to make a point. Regression is out of a player’s control.
So what is regression? By definition, regression is the act of returning to a previous state or place. Note there is no mention whether the return is positive or negative. Positive regression exists and is sometimes referred to as progression, which is actually a misnomer. Progression has nothing to do with regression.
Tying this in with baseball analysis, there are elements of performance out of a player’s control when a round ball meets a round bat. It’s cliché, but luck plays a large role. Sure, when better pitchers throw round ball and better hitters swing round bat, better things happen, but some of what ensues is happenstance.
The aforementioned luck can be measured. Examples are batting average on balls in play and home run per fly ball percentage. Inherent in luck is that something should have happened that didn’t. In this case, “should have” refers to probability. If a coin is flipped twice, one time it should be heads and one time tails. Good luck means what actually happened was a better outcome than probability dictated, while bad luck is a worse outcome.
Regression is nothing more than the aggregate outcomes of future events out of a player’s control approaching what should have happened. It’s not luck evening out as that is the Gambler’s Fallacy. Regression is the expectation that going forward, luck will be neutral so as these events add up, the overall outcome regresses to what it should be.
It doesn’t mean it WILL happen, only that it SHOULD happen.
Let’s circle back to flipping coins as a means to illustrate this. Let’s say you’re going to flip a coin 100 times. After the first 50 flips, 30 came up heads. When you flip 50 more, how many heads should you expect across the entire 100 flips? The answer is 55, as the remaining 50 flips should split evenly, adding 25 heads to the 30 already in hand.
Now let’s think of this in terms of regression. Initially, the expectation is for 50 percent heads. After half the trials, it was 60 percent heads. At the end, it was 55 percent. Sixty percent regressed towards the expected 50 percent.
Had the example instead been 20 heads after 50 flips, the final expected result would be 45. Here, 40 percent regressed towards 50 percent, ending at 45 percent.
In both cases, the results at the midpoint regressed towards the expected results. One regressed upward, the other down. But they both regressed.
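For anyone who wants to check the arithmetic, here is a minimal sketch of the coin example in Python. The function name and its parameters are made up for illustration; the only assumption is a fair coin, so every flip still to come is expected to land heads half the time.

```python
def expected_final_heads(heads_so_far, flips_so_far, total_flips, p_heads=0.5):
    """Expected heads over all flips, given what has already happened.

    Past flips are kept exactly as they occurred. Only the remaining
    flips are assumed to behave according to probability (neutral luck).
    """
    remaining = total_flips - flips_so_far
    return heads_so_far + remaining * p_heads


# The two midpoint scenarios from the text: 30 heads and 20 heads after 50 flips.
for heads in (30, 20):
    midpoint_rate = heads / 50
    final_rate = expected_final_heads(heads, 50, 100) / 100
    print(f"midpoint {midpoint_rate:.0%} -> expected final {final_rate:.0%}")
```

Note that the future flips are not asked to make up for the early streak. They are simply expected to be even, which is why 60 percent lands at 55 percent and 40 percent lands at 45 percent, rather than either one snapping all the way back to 50.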
What does this have to do with fantasy baseball, specifically daily fantasy?
As was discussed when we last met, one of the pathways to find players whose expectations exceed their price point is to look at different splits (home versus away and handedness, among others). But often the sample of data available to evaluate is too small to take at face value. In such instances, the relevant aspects of the data need to be regressed to what should have happened before we can use them in an effort to predict what will happen.
Confession time – the original intent of this column was to discuss handedness and how it can be used to gain an edge when evaluating daily expectations. But then I got to the part warning against using split data without an adequate sample (which is probably much bigger than you realize). The solution is to regress it to the league averages for each specific parsing of data.
Then I started thinking about how everyone butchers the notion of regression and I got all pissed off. I don’t have my own radio show where I can rant, so I decided to calm down, put fingers to keyboard and talk about regression so when I do broach the subject for DFS, we have a mutual understanding of where I am coming from. There will be some non-intuitive discussion, so instead of working in what is meant by regression and diverting your focus from the point at hand, we now have a foundation from which to build.
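As a preview of what regressing a small split sample to the league average can look like, here is a rough sketch in Python. This is one common approach, not necessarily the method a later column will use: the observed results are blended with a block of imaginary league-average trials. The function name, the league rate of .250, the weight of 200 and the 12-for-30 split are all hypothetical numbers chosen for illustration, not real data.

```python
def regress_to_league(successes, trials, league_rate, prior_weight):
    """Blend an observed rate with the league rate.

    prior_weight acts like that many extra trials at the league rate
    (an assumed value here, not a measured one). The smaller the real
    sample, the more the estimate leans on the league average.
    """
    if trials < 0 or prior_weight <= 0:
        raise ValueError("trials must be >= 0 and prior_weight must be > 0")
    return (successes + prior_weight * league_rate) / (trials + prior_weight)


# Hypothetical split: 12 hits in 30 at-bats against left-handed pitching.
observed_hits = 12
at_bats = 30
league_avg = 0.250   # assumed league rate for this split
weight = 200         # assumed weight given to the league rate

estimate = regress_to_league(observed_hits, at_bats, league_avg, weight)
print(f"observed {observed_hits / at_bats:.3f}, regressed {estimate:.3f}")
```

With only 30 at-bats, the .400 split is pulled most of the way back toward .250. Feed in a few hundred at-bats at the same rate and the estimate stays much closer to what the hitter actually did, which is the whole point: the sample has to earn your trust.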