Long, Unedited Version of All in One Post
NOTE: If you are reading this for the first time, please go to this link instead:
https://www.teemohoop.com/mamba-or-lepookie/Blog%20Post%20Title%20One-mm8gk-cy9wh
What Are All-in-Ones, and What Actually Goes Into Them?
So over this past week, I made my version of an All-in-One metric I had conceptualized a while back, at least a first draft of one, and I will get into that below. But I think it's important to explain what an All-in-One metric is too. Most explanations online either cut a million corners or it's an online statistician saying "It's just this simple formula :D" and pulling out a giant equation.
Since I hope some people from Vegas are reading this, as well as maybe anyone else in the basketball scene who stumbled on my LinkedIn post, I think it's worthwhile to fully explain, in a simple and easy-to-understand way, what All-in-One metrics really are, so you can form your own opinion on them. This won't be the most formal intro to the stat or a metric you've ever seen, but I hope it's interesting. Feel free to skip ahead to the breakdown of the metric itself, but I would say the parts on what the number is comprised of and how it tests are kind of important, and that section flows nicely from this one.
Before getting into All-in-One metrics, it's important to understand what RAPM, the backbone of most All-in-One metrics, is. The "APM" in RAPM stands for Adjusted Plus-Minus, which takes into account 11 things: the player of interest, the 9 other players on the court, and the scoring margin. It tries to see how the player of interest affects the scoring margin while controlling for the 9 other players as factors. However, the issue with APM is multicollinearity, which simply means it has a hard time distinguishing between teammates who play in many lineups together. Basically, it struggles to assign the right amount of "credit" to people who play together a lot: because I am on the court with LeBron, it gets "tricked" into thinking that I'm really good at basketball and not that LeBron carries me.
This is where the "R" in RAPM comes in, to mitigate this issue and distinguish between teammates who share the floor a lot. It can help say LeBron is the one carrying the team, and that 2018 SNL recruits Childish Gambino, Pete Davidson, and Kenan Thompson weren't actually secret top-tier players, they just played with LeBron in many lineups. R stands for Regularized, and for RAPM it generally means Ridge Regression. Now that sounds all fancy, and this is usually where someone throws a giant math equation at you or says "Just know it does this." But it's actually pretty important to get this part to really understand in depth how All-in-Ones work and why they work, so I'll give a real-world example to illustrate it.
Imagine you're a teacher with 2 troublemaking twins, one named Marco and the other named Kenji. You know one of them is the "bad apple" and the other is just following their lead, similar to how LeBron is carrying the team to high scoring margins and random player "Justice Young" is just along for the ride by being on the court with him a lot. One of them starts shouting and the other one follows, and you want them to stop and to figure out who the bad apple is. So you get all fake dramatic and yell at them to shut up, and you keep doing this over and over again, knowing the not-so-bad kid will feel bad and chill while the more consistent problem child will continue to troll. Do it enough times and Kenji starts feeling bad and chilling out, since he wasn't really like that, while Marco just keeps screaming since Marco sucks. You've learned Kenji was chill and Marco was the real troublemaker.
Practically, Ridge Regression is a similar concept. Instead of kids yelling, it's their "scoring margin values" (or how impactful they are); you're pushing the values to 0, and instead of yelling at them, it's high scoring margins. So think about it this way: you are trying to find the troublemaker between the twins Marco and Kenji (finding the driver of the high scoring margins between LeBron and Justice Young, who often play together in lineups), so you start punishing them and telling them to quiet down (shrinking coefficients to 0). As you keep punishing them, Kenji, who wasn't truly a problem child but just following Marco, begins to behave while Marco continues his mischief and is less affected by the punishments (Justice Young's value starts going down, while LeBron's stays high and is less affected by the punishments, since his "effect" is more consistent and is driving the margins more).
The main confusing thing there is shrinking coefficients (player values) to 0, but just view it as "punishing" and you basically get the gist of it. One better way to explain it: imagine instead of an advanced model, it's a guy screaming out numbers for how good he thinks LeBron and Justice Young are every time he sees them play, and every time he does this you tell him they are both 0/10 players (you a hater). His "screams" are the coefficients (model guesses; while not a perfect representation, you can view it as the model making repeated guesses at how impactful players are, until it's happy with its choice or runs out of film to watch), and you telling him everyone is a 0/10 player is "shrinking his ratings towards 0." His ranking of role player Justice Young is going to be more affected by you trolling him into saying he sucks than his ranking of the greatest player to ever pick up a basketball, where he can clearly see the greatness.
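If you'd rather see the concept in code than in classroom analogies, here's a minimal sketch with a made-up 6-player toy league. This is not my actual pipeline, just the idea:

```python
# A minimal sketch of ridge-regression RAPM on a toy 6-player league.
import numpy as np
from sklearn.linear_model import Ridge

# Each row is a stint: +1 if the player was on the court for side A,
# -1 if they were on side B. Player 0 is "LeBron", player 1 is
# "Justice Young", who shares most of his lineups with player 0.
X = np.array([
    [ 1,  1,  1, -1, -1, -1],
    [ 1,  1, -1, -1, -1,  1],
    [ 1,  1, -1,  1, -1, -1],
    [-1, -1,  1,  1,  1, -1],
])
# Side A's scoring margin per 100 possessions in each stint.
y = np.array([12.0, 10.0, 9.0, -8.0])

# alpha is how loudly you "yell everyone to 0": bigger alpha shrinks
# coefficients harder. Tag-along players get shrunk the fastest,
# while consistent drivers of the margin resist the shrink.
rapm = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_
print(rapm)  # one RAPM-style value per player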
And that's pretty much RAPM! Why was it important for understanding All-in-One data? Well, there are multiple forms of All-in-Ones, but the one I made, along with LEBRON, EPM, and ESPN RPM back when it was created (its creators have since left for NBA teams, and I've heard the metric is a bit weird now since they left), are "Bayesian Prior Informed RAPM." That sounds super fancy and I have absolutely no idea why people don't ever explain it normally, but it's actually simple (like genuinely, not in the "it's suuper simple" and then throwing an alphabet with an equals sign at you way).
Instead of punishing everyone's values towards 0, you "punish" each player's number towards a value specific to how good you think that player is. This is where the box scores usually come into play: you use box scores to create a number for each player that gives a rough estimate of how good that player is. This has a HUGE effect on RAPM.
If that's confusing, you can think of it this way: imagine RAPM is your friend who has never watched basketball before, trying to learn about basketball in a limited amount of time (in this case, time would be the possession sample size it has to learn from).
Without Regularization, he just thinks everyone on the 2016 Warriors was a 10/10 because they won by 50.
With basic Ridge Regression, which pushes the values to 0: when he said he thought they were all great, you kept saying they all actually suck, and he kept watching and said "hmm, I guess some of them weren't as impressive as I thought, but I thought that Curry guy was pretty good though!"
With Bayesian Regression, as he is watching you are giving him your complete, honest opinion on how good every single player is, and you keep repeating your opinions on those players instead of saying they are all 0s.
This number you keep saying to him is the "PRIOR."
You see the difference? Keep in mind in this case your friend is a super genius and will pick up on things eventually, but he’s a bit slow and just needs a lot of film, or a bit of a nudge. With a limited amount of time, getting him closer with those good opinions will really speed up the process as often he won’t have enough time to get the answer right.
That, in a nutshell, is what much of All-in-One data is, at least a large proportion of the best ones. They create a number representing how good a player is using box score data, and that becomes the prior that you scream at your friend watching the game over and over again.
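For the code-minded, here's the same toy-level sketch with the prior swapped in. The change-of-variables trick below is one standard way to shrink toward a prior instead of toward 0; I'm not claiming this is how EPM or LEBRON implement it:

```python
# A minimal sketch of prior-informed ridge. Plain ridge solves
# min ||y - Xb||^2 + a*||b||^2 (pull b to 0); prior-informed ridge
# solves min ||y - Xb||^2 + a*||b - prior||^2 instead.
import numpy as np
from sklearn.linear_model import Ridge

def prior_informed_rapm(X, y, prior, alpha=1.0):
    # Credit the priors first and keep only the unexplained margin...
    resid = y - X @ prior
    # ...shrink the leftover deviation from the prior toward 0...
    delta = Ridge(alpha=alpha, fit_intercept=False).fit(X, resid).coef_
    # ...then add the prior back. Low-sample players stay near their
    # box-score estimate; big samples can pull a player away from it.
    return prior + delta
```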
Caveats to this Approach in All in Ones:
It all sounds really nice, but there are some practical issues. In my opinion there are 2 worth putting down here (I'm not going to get into the more philosophical caveats of this type of approach to evaluation for now):
1) It takes a pretty big sample for your friend to truly get players right.
2) The priors themselves (the opinions you're telling your friend watching the game) can skew his opinion in incorrect directions.
My version tries to tackle these in its own way (in the week I made it lol), but here's a somewhat in-depth explanation of the issues to demonstrate why I felt they would be interesting to tackle this way. Feel free to skip this if you don't really care.
(For this explanation, you can think of noise = things distracting from the true value; imagine you're trying to listen to the lyrics of a song to memorize it, but a baby starts screaming at the same volume, so now you think the song has some crying in it.)
The friend example was good as a visualization and a way to demonstrate it in a more human way, but I'm throwing it away from here because it kind of takes away from what RAPM in its raw form is and the benefits of it. It's an impact metric that only attempts to parse out the impact player X has on his team's scoring margin, accounting for the 9 other players on the court. It cares about NOTHING else. Simply put, it's unbiased. Sample size is an issue and short-term RAPM is noisy, but some people mistake this for "RAPM just doesn't say anything valuable in small samples." RAPM is a measurement of raw impact, which in itself is used by people as an estimate of "true impact." What's the difference? Raw impact is simply the points when you go on and off the court, adjusting for teammates; true impact is whether you are actually the reason, or a factor, for that score, or whether it's just coincidence you were there when something good happened (you happen to be there when good things happen that you didn't affect directly or indirectly in any true way at all). A lot of raw impact is simply noise, but it isn't necessarily always noise, which I think is a key distinction.
With low but reasonable sample RAPM (let's say a season) you do get a ton of wonky results, but much of that "noise" is simply the instability of short-term impact data itself. Most of the time (key word is most, as in like more than half the time, of course) you aren't going to see wonky results that aren't already apparent in the raw impact data when you look at a player amongst their teammates.
This created an interesting debate in some places I saw back when All-in-Ones first came out. I was like 15 at the time, but from what I remember some people were a bit unhappy and said All-in-Ones killed the point of this kind of thing. To be clear, I disagree with that take, but I do understand where it's coming from. With the priors, you end up reducing noise but creating bias, and on the whole this tradeoff is 100% worth it. It's just an issue in some individual cases at times, which I'll get into more below, but as a whole that's more for when people get too fixated on marginal differences and rankings between players.
The second issue is that the box score prior itself isn't so simple to make. The way it is made is you get stable samples of RAPM, and you train a model that takes inputs (box score numbers) and predicts a player's RAPM, and you make that prediction the prior. If you're part of MSBA reading this and on the more technical data side, you might think "XGBoost," but no, that doesn't work, because from my understanding the errors in non-linear models tend to be unacceptable here, and in my brief experience running it for this it was awful. Even interaction terms create large, unacceptable errors at an individual level. For a draft model, sure, boost dat, I even did one for my internship and my portion was pretty solid (I think I used XGBoost or LightGBM, I don't remember tbh), but not for this kind of thing.
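Conceptually the fit itself is as simple as this sketch; the file and column names are placeholders, not my actual feature set:

```python
# A sketch of fitting a box score prior: a plain linear regression
# from per-75 box stats to long, stable samples of RAPM.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("prior_training.csv")  # hypothetical training table
features = ["pts75", "ast75", "reb75", "stl75", "blk75", "tov75"]
model = LinearRegression().fit(df[features], df["rapm_long_sample"])

# The fitted predictions become the "opinions you yell at your friend."
df["prior"] = model.predict(df[features])
```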
You WANT outliers, at least for really good players: you want it to "overshoot" on certain players and superstars in some years to stabilize things when noise causes some players to be underrated. On a more meta level, you want some players to be overshot for the sake of the metric looking more respectable, and for the sake of messaging, to be honest. If the ONLY goal was getting a high prediction on RAPM this would be easy, but you have to have some semblance of common sense with your results. That isn't to say that deviation from general opinion is wrong; having a guy like Caruso in the top 20 or something is completely fine in my opinion when his impact signals are THAT strong (All-in-One metrics are NOT a ranking of how good players are in a vacuum, to be clear), but if your list has a bunch of role players in the top 10 and superstars out of the top 50, something is probably wrong. That being said, if certain players are consistently far from preconceived notions of where they would rank and 97% of others aren't, that's a valuable data point, but people can often draw conclusions from it that are a bit too strong.

A box score prior does help RAPM become far more stable, and can also help create a final metric that isn't completely laughed out of the room. But here's the thing: it's a linear regression, you are applying a generalized pattern to the entire NBA, so you are ALWAYS going to overshoot or undershoot on certain players. I have gotten pushback on this statement before, but while it undoubtedly creates better observations at a GENERAL level, there are CERTAINLY some players who are overpushed or punished at an INDIVIDUAL level. For me, while I do find All-in-One data valuable, I don't view it as a raw measurement of impact like some other people do. While RAPM has noise, All-in-Ones have bias. 99% of the time, that small amount of bias is worth it and helps a CRAZY amount, but that bias can also lead to fundamentally incorrect estimates at the individual level where perhaps the noise wasn't truly far from reality. To me, both have their place when analyzing a player, All-in-Ones much more so, especially if you can only pick one, but also, watch the game lol
The next two paragraphs are a short case example with LeBron; you can skip them if you want to.
A case example: a while ago, I saw a pretty bad article on bball-index.com. Now, I do really enjoy the site and like what it stands for, and to be clear, this WAS NOT WRITTEN BY TIM (also known as Cranjis McBasketball). Tim's a smart guy and pretty chill to talk to, so he wouldn't write something like this. But the gist of the article was basically one of the other writers clickbaiting off the Olympics with a "LeBron's not top 10 and I'll tell you why with FACTS and STATS" piece, and it was just a guy pulling out the LEBRON metric…
But it actually is relevant here, because LeBron represents probably the clearest example (that I know of) of a high-profile player affected by this kind of bias. While I don't want to go on a 10-page tangent defending LeBron's honor from LEBRON-on-a-spreadsheet-in-caps-lock, what I'll say is that, especially on the defensive end, for pretty much his entire post-Miami career (at the very least), any available box score component for an All-in-One severely undershoots LeBron's defense. The 2 exceptions, 2018 and 2022, are the only years where his actual raw defensive impact data wasn't good (according to RAPM). This is the case for LEBRON, DPM, mine (I'll release the overall numbers; I can give the priors to anyone who asks, but this is still a first draft so I need to do some tuning), etc. On a deeper level, despite his great box scores, what you see fairly consistently is that the more you weight box scores, the less impressive his All-in-One data can be. This doesn't mean "hey, maybe his impact data overrates him," because that's really not how it works when it's this consistent over long periods for a high-production player; it means LeBron is better than his box score production indicates. To be clear, LeBron's career age-adjusted impact data is by far the greatest in history, and if you only take playoff RAPM (there are caveats to doing it that way beyond the scope of this post), he's basically a lone dot at the top even without adjusting for age, and that's with him being in LeCoast mode in the regular season since 2014. All-in-One data ironically shrouds the case here, but for his career LeBron is pretty much the undisputed king in the realm of impact data (although obviously now he's no longer undisputed #1 in a given season). I'm sure there are other examples (I feel KG would be another guy?), and sometimes this is by design (LEBRON tends to give extra weight to rim protection from my understanding, which helps its predictive value since top-tier big defenders are better building blocks than top-tier perimeter defenders, even if it might not show up in raw impact stats for some of the non-absolute-top-tier DPOY-type bigs), but you get the point.
End of LeBron stuff
The box score prior is where a lot of the separation between these metrics happens. It's actually where people do unique stuff, but overall I think of an All-in-One as an estimation, while some treat it as the answer. I don't know how good my metric is or how the final version will be (I'll show the results of my retrodiction testing before I put it down below; it actually performed super well, in and out of sample, but I still have a lot to work on, I literally started doing this 5 days ago and 2 of those days I was out and about). But regardless of how good this metric ends up being, I don't think I'll ever phrase a result like "Player X was a 7 player in impact because my number said so," because all it means is my estimation puts them there. There's a 0% chance I agree with any of these metrics exactly; I mean this one (spoiler alert) hates AD, and as a huge AD fan there is quite literally 0% change in my opinion on that man lol. My estimation and EPM tend to not love AD while LEBRON has him around top 5, whereas mine and EPM love Bron and LEBRON has him at like 19th; it's just how these things go sometimes.
Personally I think both of them are easily top 10, and top 5 in the playoffs (#1 and #2 this year btw, with a Young Pat Riley with a Calculator presence and drip at the helm), but I live in LA (although I'm willing to relocate for any WNBA or NBA team if I can't get a return offer, pls I'm desperate lmao, look at all this, I will literally work on a fry cook salary to make up for the visa lol).
Little side note: RAPM also tends to run a bit differently on Python vs R. I know J.E.'s RAPM and Ryan Davis's luck-adjusted RAPM are very different from the one on BBI, and BBI has recently done more complex stuff with their 3-point shooting luck adjustments from what I know (some people love it and some people hate it, not gonna get into that yet). But I honestly feel like it's a bit weird to see already luck-adjusted O-RAPM from Ryan Davis have Jokic as the clear #1 and Giannis around 5th-6th over the last 2 years, while BBI's O-LA-RAPM has Giannis a country mile 1st on offense and Jokic like 3rd and 6th.
This isn't to say "NYEHEHEHE they did it wrong!", it's just the biggest example of a jump I could think of. Getting into which RAPM set has the most "errors" can be a dicey proposition, and I'm not opening that can of worms, but my main point is some of the changes seem too dramatic for a slight adjustment ON TOP of the luck adjustments to an already luck-adjusted set, especially when testing I've seen (shown later) seemed to indicate those adjustments didn't provide super significant improvements. At the very least, I think you have to weigh the assumptions being made against the practical results if they cause jumps this big. To be clear, I LOVE the LEBRON metric and think it and EPM are relatively close and both the undisputed top right now.
SKIP HERE FOR THE METRIC ANALYSIS AND BREAKDOWN:
Now that that part is done, it's time to get into the "fun" part: what's my metric?
First, I would like to thank Nathan Hollenberg, Seth Partnow, and Benjamin Alamar. I didn't talk to Mr. Partnow or Mr. Alamar about this metric or anything, but I got to talk to them a bit during the Vegas seminar (just about data as a whole) and they were super smart, cool, and insightful to listen to. I had a coffee chat with Mr. Hollenberg; I had a bit of a plan for the metric by then, and he gave me some advice on how long the RAPM sample should be. The reassurance that I wasn't just being insane and that my thought process wasn't absurd was a big push of confidence. The coffee chat was super cool, he was just a really nice guy, and I learned a ton about how to approach all of this; it kind of made me think, hmm, this might actually be a cool idea. His advice also helped me a ton at being better at my internship!
I would also like to thank Tim (I feel weird calling him Cranjis, and I learned his whole name by accident and there is ZERO percent chance I'm ever gonna reveal that information to anyone, so I'm going to say Tim), who generally helped me out a lot in terms of combining data and Xs-and-Os stuff (which I still think I'm probably better at than data stuff, although doing this project over the past few days was pretty fun), and Jeremias Engelmann, for being a fantastic resource through his postings on APBR, and also for being the reason I found RAPM when I was 15 through his Dropbox links lol.
Of course I have to thank Eli Horowitz, because if I didn't get this Sparks internship I probably would have had to shift gears by now. The Sparks experience has been absolutely amazing and I've really fallen in love with the entire process of being in an analytics department. I could write a whole essay on what a life changer that's been for me (well, unless I'm deported within the next 90 days), but this is already going to be crazy long lol.
So, the results will be down below, but first I would like to go into the value add of what I made, based on what I said about All-in-Ones previously. Like I said in my LinkedIn post, this is a rough draft, but overall it tested very well when I compared it to EPM and LEBRON. My testing of EPM and LEBRON was based on the methodology Krishna Narsu (LEBRON's creator) used on Twitter.
METRIC BREAKDOWN
So essentially, for now, my metric has 2 main innovations and a few other small tweaks that I believe improve it, which I will split into the Box Score Component and the Impact Component.
BOX SCORE COMPONENT
The Box Score Component uses typical per-75-possession box score data (pace adjusted, as opposed to per-36-minute numbers), and blends in some Synergy data and tracking data.
Now, before you click away hearing "tracking data," I was fairly conservative. With tracking data, I came up with Points Saved at the Rim, which I computed as the FG% differential with the player contesting, multiplied by defended attempts at the rim, times 2. I also used charges drawn (with 2015 and 2016 charges drawn coming from PBP Stats instead of the wonderful NBA API), and Defensive Field Goals Attempted, which I thought of as "how often you were the closest defender." Tracking there isn't perfect, I can attest to that from the Second Spectrum stuff, but I still think it's likely a barometer of activity, and it did help.
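In code, the Points Saved at the Rim idea is just this (the numbers are made up for illustration):

```python
# A sketch of Points Saved at the Rim as described above: the FG%
# differential with the player contesting, times defended attempts
# at the rim, times 2 points per make.
def points_saved_at_rim(rim_dfg_pct, lg_rim_fg_pct, defended_rim_fga):
    return (lg_rim_fg_pct - rim_dfg_pct) * defended_rim_fga * 2

# A rim protector holding shooters to 58% against a 65% league average
# on 300 defended attempts "saves" (0.65 - 0.58) * 300 * 2 = 42 points.
print(points_saved_at_rim(0.58, 0.65, 300))
```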
Offensively, assists were replaced with Assist Points Created (so including free throws and 3-point shots instead of just raw assist totals), and unassisted field goals were a part of it as well.
Synergy was only used here, and only for two things: PLAYTYPEPointsAboveExpectation and OVERALLPointsAboveExpectation. It is essentially a way to control for shot quality and shot diet: I took a player's points and subtracted their "expected points," which was their play-type diet multiplied by league-average PPP for each play type. That way, you can gauge how efficient players are relative to their play types, or whether they are finishing tough shots at a great rate that doesn't show up in raw efficiency. The Overall version compares against overall half-court PPP instead (not counting transition here, which in hindsight I probably should have done).
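Here's a tiny sketch of the play-type version, since the formula is easier to see than to say. The play types and numbers are illustrative, not real Synergy data:

```python
# PlaytypePointsAboveExpectation: actual points minus the points an
# average player would score on the same play-type diet.
def playtype_points_above_expectation(poss_by_type, pts_by_type, lg_ppp):
    actual = sum(pts_by_type.values())
    expected = sum(n * lg_ppp[t] for t, n in poss_by_type.items())
    return actual - expected

poss = {"PnR Handler": 200, "Isolation": 100, "Spot Up": 150}
pts  = {"PnR Handler": 210, "Isolation":  95, "Spot Up": 170}
lg   = {"PnR Handler": 0.95, "Isolation": 0.90, "Spot Up": 1.02}
# expected = 190 + 90 + 153 = 433, actual = 475 -> +42 above expectation
print(playtype_points_above_expectation(poss, pts, lg))
```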
% of games started was something I used, since it (or minutes) is generally always a part of these things. I also added components called OffInd and DefInd, borrowed from seeing PIPM's priors: it's just the team's offensive or defensive rating multiplied by the % of the team's minutes the player played (so if the team played 4,000 minutes, the player played 3,000 minutes, and the rating is 10, you get 10 * 3000/4000). I might change the latter or add an on-off component (although then it's not really a box prior anymore? Double counting impact???), just because I feel it might not capture a guy like Wemby and some good rim protectors on bad defensive rosters.
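In code form, that's all OffInd/DefInd is:

```python
# OffInd/DefInd sketch: the team's rating scaled by the share of team
# minutes the player was around for.
def team_ind(team_rating, player_minutes, team_minutes):
    return team_rating * player_minutes / team_minutes

print(team_ind(10, 3000, 4000))  # the example above -> 7.5
```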
The main innovation here is the Synergy thing, which I think was honestly a big help (particularly PlaytypePointsAboveExpectation; Overall is more whatever), and the tweaks would just be some pretty conservative usage of tracking data.
I think the Offensive Prior is quite solid, and on offense in particular the metric tested really well. I don't love the results for defense; it's a first draft, but I feel it undershoots some guys on bad defensive teams.
IMPACT COMPONENT
I used an "Adjusted Time Decayed" RAPM. First, what is Time Decayed RAPM? Time Decayed RAPM is basically a fancy way of saying you add more weight (give more emphasis) to more recent games and less emphasis to earlier games, and you can do this across years. TO BE CLEAR, since this is a seasonal metric meant to be descriptive as well as predictive, like EPM and LEBRON, the current season isn't time decayed. The decay starts from September 1st: it weighs the selected season fully, and then weighs earlier games by how far back they are in the past relative to the start of the selected season.
To be clear, I made sure the samples were always 3 years total, so the decay never extended beyond the 2 seasons before the current season.
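As a sketch, the weighting looks something like this. The exponential form and the half-life number are illustrative assumptions, not my exact decay rate; these weights would then feed into the ridge fit as sample weights:

```python
# Possessions in the selected season get full weight; earlier
# possessions decay by how far they fall before the September 1st
# "start" of that season.
from datetime import date

def possession_weight(game_date, season_start=date(2023, 9, 1),
                      half_life_days=250):
    days_before = (season_start - game_date).days
    if days_before <= 0:
        return 1.0  # selected season: never decayed
    return 0.5 ** (days_before / half_life_days)

print(possession_weight(date(2024, 1, 15)))  # selected season -> 1.0
print(possession_weight(date(2022, 11, 1)))  # prior season -> ~0.43
```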
Of course, this raises 2 questions.
Why is it done this way? Simple: to take into account the offseason, offseason workouts, and development. It makes sense, in my opinion, to say month 1 of last season is much more important than the last month of the season before that for showcasing the current year and accounting for offseason development.
Why would you do this?
This is likely where there might be some more pushback, because why would I use previous-year data at all? Simple: in practice, Time Decayed and multi-year RAPM with less weight on earlier years generally come out similar to Prior Informed RAPM. Prior Informed RAPM is RAPM where the previous year's RAPM is the prior (the thing you're yelling at your friend) instead of 0. Those results generally look much better than raw single-year RAPM, especially in the noise category.
J.E.'s RAPM datasets run through the end of the playoffs, but since this is a regular season metric I found a Pastebin post of his 2014 NPI RAPM from around the end of the regular season:
https://pastebin.com/gT2aN0P5 - Yes, that's Miami LeBron at 36th lol. This was posted by J.E. somewhere (no idea where tbh), who's like the RAPM god, so it was done right. The Dropbox links on APBR have playoff data, so a larger sample, but in the PI Dropbox LeBron is 1st (compared to 20th in NPI). In general it's just much more stable. I'm cherry-picking a bit here, but you get the point.
Time Decayed RAPM is also far more predictive than single-year RAPM, regardless of whether you run it raw or do luck adjustments like BBI likes to do and Nate Walker did.
Beyond that, I did very light luck adjustments (cue the booing). Luck adjustments are weird: some people love them (BBI, Ryan Davis), some people hate them (J.E.), and I don't really know where I stand. On one hand, something like free throws I get; on the other hand, stuff like the turnover/OREB luck adjustments might be a bit of a stretch. 3-point luck adjustments are the big controversial one, though.
On offense, I think everyone would agree you can definitely have an impact on your teammates' 3-point shooting, even if there is some noise there. Defense is more the thing where, yeah, there are statistical tests showing it's mostly noise, and if you're talking individual players (not teams), I can buy that in general players don't have a huge impact there. At the same time, 3-point defense on a team level clearly does exist, and I believe it's probably likely there are at least individual seasons where impact can partially show up through lowered opponent 3-point percentage, regardless of whether that trend holds for that player year to year. It's similar to the idea of tracking shot defense on jump shots: it's certainly mostly noise, but conceptually there is a clear difference between Trae Young closing out on a three by KD vs Herb Jones closing out on that three, even if their 3-point defensive FG% might not be super different. It's one of those slippery slopes where it's like, OK, what do we do about midrange jump shots then? And etc., etc. At that point it might just be better to leave it be.
So I just went very conservative with it. FTs were fully adjusted. Ryan Davis did a 50% luck adjustment on threes back when the nbarapm site talked about it more, while BBI I think does 50% on offense and 100% on defense. I did 20% on offense and 40% on defense for threes, which likely didn't do that much in either direction, to be honest. In the TDRAPM I ran in the WNBA, it very slightly improved prediction, but not in any practical or honestly distinguishable sense that makes up for the controversy of the assumptions in the first place. BBI does a more complex one based on research, though, so they're stamped I think; I know J.E. hates it based off his APBR posts, and he's like the RAPM god, so I just decided to do this minor one that probably doesn't do anything. If anything I'd take out the offensive one, but the defensive one I think is fair, at least a small one, although I see the argument against it.
Free throws were fully adjusted, because as much as the Hanamichi "you're gonna miss" strat worked for me in high school, I don't think it works in the professional leagues lol. On free throw misses, offensive rebounds are only treated as a new possession if there is a lineup change; otherwise it's treated as a continuation, while the expected points are added regardless. I could see the argument for only doing this on the first free throw, or only when the second free throw was not rebounded by the offense, but I don't really mind giving it value even on the offensive boards. It definitely isn't going to swing things one way or the other, and honestly I kind of like rewarding offensive rebounders: box score priors likely undershoot bigs a tad offensively imo, because the top-tier impact guys offensively are typically guards and wings (Jokic is an anomaly of course), and while that is of course the pattern, bigs are likely slightly undershot offensively the way perimeter players can be undershot defensively, taken as a whole. I feel only doing it on the second FT when there is a miss and a DREB unfairly punishes OREBs on FT misses, so I felt this was the best of both worlds.
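Here's a sketch of what a partial luck adjustment means mechanically; the numbers are illustrative, not my exact implementation:

```python
# A partial 3-point luck adjustment: blend the points actually scored
# on threes toward the points expected from the shooters' long-run
# percentages. strength=0.2 matches my offensive setting above
# (0.4 defensive; 1.0 would be a full adjustment like the FT one).
def luck_adjusted_3pt_points(made, attempts, long_run_3p_pct,
                             strength=0.2):
    actual_pts = made * 3
    expected_pts = attempts * long_run_3p_pct * 3
    return actual_pts + strength * (expected_pts - actual_pts)

# 5/10 from shooters who are 36% long-run: actual 15, expected 10.8,
# so a 20% adjustment moves the credited value to 14.16.
print(luck_adjusted_3pt_points(5, 10, 0.36))
```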
I would argue that Time Decayed RAPM provides a more accurate look at the current season by incorporating the previous seasons for more information. Think about a guy judging how good a player is by watching this year and last year, versus only watching this year; he knows to weight the older games less, but they still give him a better picture (let's say he's new to basketball and can only watch 10 games, to replicate how noisy RAPM can be). However, fundamentally and in principle there are real issues and concerns here, which I will address.
I'll run a non-luck-adjusted version at some point, but I have an interview in like an hour (typing this right before I post it, rereading it) and I want to get this out so I can point to it lol
How do these things fit together?
Now, I do agree that it is a clear concern that Time Decayed RAPM takes the previous year into account, and this is where the Box Score Component comes in. The Box Score Component artificially reduces that past-year bias a bit because it only takes stats from the current year; it's like in the UFC where you're falling one way and they hit you back the other way, I guess.
But also, with regards to box score priors creating bias, this methodology can help with that too. You can kind of "set" how much the model is going to listen to the priors (how much your friend listens to your ratings), and based on the numbers I've seen, EPM and LEBRON have a pretty strict setting there. You can basically set how far the observations are likely to end up from the priors, i.e. the typical margin of error of your priors (the box score evaluation of how good player X is) relative to the player's "true value." It doesn't hard-lock anything, though. I set it pretty close (with how the results are scaled, I said it would typically be within 1 on each end), but likely not nearly as close as EPM or LEBRON set it, and it totally gave values way further than that at times, which is kind of the point.
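For the technically inclined, here's a sketch of what that dial is in Bayesian-ridge terms; the numbers are illustrative, not values I or anyone else actually uses:

```python
# In the Bayesian reading of ridge, the penalty is roughly
# (per-possession noise variance) / (prior variance). So saying "my
# priors are typically within ~1 point of true value" implies a much
# bigger alpha (the model listens hard) than saying "within ~3 points."
def alpha_from_prior_sd(noise_sd, prior_sd):
    return (noise_sd ** 2) / (prior_sd ** 2)

print(alpha_from_prior_sd(noise_sd=30.0, prior_sd=1.0))  # tight: 900
print(alpha_from_prior_sd(noise_sd=30.0, prior_sd=3.0))  # looser: 100
```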
I get more into why I don't think it's a huge concern that the previous year is accounted for below. Also, Shai is like 2nd here, so it's clearly not a nail-in-the-coffin dealbreaker, but I'll get into it more.
So, some positives and negatives. I'll get into the big picture of what this does, how it mitigates some issues with All-in-Ones, and evaluate some concerns.
Benefits
-1. A larger set of possessions allows for more freedom around the box score weight, which helps get rid of that bias when it's incorrect, likely especially important for role players or "unconventional" impact-type players.
-2. Decaying from the start of the current season means the current season is fully weighted and previous seasons have less weight. Functionally, TD RAPM does better on the current year than NPI RAPM, which roughly equates to getting Prior Informed RAPM (which generally = better, or at least more stable, results than NPI RAPM for the current year). I also used a fairly strong decay rate.
-3. Building the Box Score Component only on the current year lets you shift more weight towards the current year too, further reducing the concern that last year is a component in this.
-4. A less stringently needed Box Score Component meant I had more freedom to explore a box score prior that could capture more unique connections, rather than having to do a perfect job of ensuring the top guys were stable, since the sample helped with that. (To rephrase this so it sounds less red-flaggy: it got to have some more freedom. In general you have to absolutely NAIL the top guys being super high no matter what, so there's a lot of reliance on the box scores making the metric pass the sniff test, which aligns with but doesn't always equal accuracy. Here there was potentially more freedom to capture more connections with the regression, because longer samples = less noise = nicer sniffs. I still heavily focused on using it to stabilize the top guys, though, which does generally align with the goals, since, well, they're the top.)
-5. The Box Score Component itself uses some novel features that I believe only EPM might use, including some derived off tracking data (without being anything too crazy; Points Saved at the Rim is similar to Assist Points Created in a way), but the Synergy Points Above Expectation is probably the coolest and most novel idea.
-6. It's almost like a fusion between an All-in-One and Prior Informed RAPM, instead of NPI RAPM plus a box score; I think there's a mutual enhancement in there somewhere where they both help each other.
Negatives
-1. I do think there are individual cases where the fact that the prior year is a factor may hurt, even if most times it can capture growing stars.
To mitigate those worries though, I did a brief "analysis" (I just got the rankings of the MIP winners from 2016-2024 lol) for LEBRON, EPM, and mine.
To be clear, this isn't to say green = good in the sense that the higher these metrics are on these guys the better (well, I guess kind of, but that wasn't the point of this lol); it's more to show that the last-year bias really isn't that huge of an issue. As a whole, mine was higher on these MIP guys than LEBRON (as in, it thought these guys were better than LEBRON did) and a tad lower than EPM. Fundamentally, you would expect it to be far lower on these guys than the other metrics if capturing sharp improvements between seasons were a giant glaring issue. There may still be some wonky results in extreme cases, although technically MIP should be about the most extreme you can go (of course, from a raw impact perspective, maybe some jumps are higher, or maybe these guys were climbing and had "silent" high impact beforehand too, but overall I think it shows it's not a nail-in-the-coffin issue at all).
METRIC TESTING
So how did it test? And how did I test it? The way I tested it was through "retrodiction testing," which sounds super confusing but is essentially the same way the EPM and LEBRON creators tested metrics against each other. The Twitter thread is here: https://x.com/knarsu3/status/1763321501766627328
(You can see old, new, and regular LA vs non-LA RAPM there. LA = luck adjusted.)
Basically, you take a player's All-in-One value for Year X (let's say 2022), multiply it by their minutes played in Year X+1 (2023), sum players up to the team level, and then get the correlation to wins. The EPM creator did it by predicting net rating IIRC, but this was easier and quicker to do and seemed more explainable. (It was just faster, and I could check against these numbers to see if I made a huge mistake somewhere, just in case.)
I did essentially the same thing. The only things I changed: for rookies I gave them a -1.5 instead of replacement value (-2.5), and of course I used actual minutes because I don't have Kevin Pelton's projected minutes with me. Also, I think if a player didn't play the previous season, I gave them their value from the season before that if they played over 1,000 minutes that year, mainly for KD and Curry.
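Here's a sketch of that test in code, with hypothetical column names:

```python
# Retrodiction test: weight each player's Year X value by their actual
# Year X+1 minutes, sum to team level, and correlate with Year X+1 wins.
import pandas as pd

def retrodiction_r2(players: pd.DataFrame, wins: pd.Series) -> float:
    df = players.copy()
    # Rookies (no Year X value) get a flat -1.5 instead of replacement
    # level (-2.5), per the tweak described above.
    df["metric_prev"] = df["metric_prev"].fillna(-1.5)
    df["contribution"] = df["metric_prev"] * df["minutes_next"]
    team_proj = df.groupby("team")["contribution"].sum()
    return team_proj.corr(wins) ** 2  # R^2 against actual wins
```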
So below I have the R^2 for every season from 2016 to 2024, which is how well each metric explains the variance (for the sake of this, if you're unfamiliar with it, view it as a "score"). 2022 to 2024 would be the out-of-sample years for my metric (the box score part was trained on 2015-2021 data). Overall, decent results. LEBRON does some really cool stuff with padding for low-sample players, and it's barely below EPM with that somewhat taken away from it here, so even though there's a lot of red in that column, that is probably why.
Of the 9 years in the dataset, my metric finished first in 5, including 2 of the 3 out-of-sample years. Its only last-place finish was 2021, which was the year after the bubble (NOTE: I did not change the decay rate for that year to account for that; not sure if it makes too much of a difference that it weighs those 8 games for some teams heavily, but still, it could be a factor), and the overall R^2 was a good deal better at 0.683.
If you look at the Twitter thread with some of the different metrics for reference, a 0.03 gap in R^2 is seemingly pretty decent among these metrics, and my numbers for EPM and LEBRON mostly aligned with his testing. (The multi-year predictive metrics on his thread took several years of a metric, like LEBRON 2020, LEBRON 2019, and LEBRON 2018, used them to project LEBRON 2021, and used that projection as the number, plus adjusting for age.) Mine is not nearly as good as Predictive EPM and not as good as Predictive LazyLebron (they had a LEBRON variant using tracking data that tested well, but some players were funky according to him on Twitter, so they don't really use it or like it).
A multi-year version of this metric with the same methodology could be interesting once it is complete.
Below are the results, color coded by which metric did best each year. Overall, "MAMBA" did very well, was very consistent, and did great in its out-of-sample years.
Now, as cool as it would be to be able to replicate this:
I will make this very clear: this is a first draft of a metric. More than that, I now have much more appreciation for what goes into making All-in-Ones, and for the balance between predictive accuracy and "players have to make sense." To be fully transparent, here are the general things I think need to be improved:
I think the defensive priors are likely not great even as a reference point. I think weighting the team defense might be a bad call; a guy like Wemby should be higher on defense IMO. Incorporating on-off in some way to mitigate this (obviously not in its raw form) seems like cheating? Still unsure here.
I wanted to make sure strong perimeter defenders were represented well, but I do think maybe more emphasis on bigs would help mitigate some issues. Maybe splitting defense into groups? Not sure whether height, position, or some statistical factor would work better, because there will always be guys put in the wrong group, and I don't want to run a KNN or something for this, that sounds dumb lol
It gets AD wrong; AD should be better. Giannis too. I think it fails to capture a certain archetype of defender, or maybe I should incorporate blocks into the rim Points Saved category somehow, because right now adding them both as separate predictors creates some wonky out-of-sample predictions because of multicollinearity shenanigans, at least defensively.
The Offensive Priors are good and correlated extremely well to OFF RTG (defense was about the same as the other two; offense was crazy good if I recall). But while Synergy play-type over expectation can account for shot and play quality in a way, I want more emphasis than I currently have on players who finish high-value opportunities at a great rate too, like AD.
It will generally struggle to identify really good players on teams that can be elite in the regular season without them, like KD on the Warriors and Kawhi on the Raptors. Because it is less reliant on box scores than some other All-in-Ones (I guess that's somewhat of a niche it fills lol), certain players like that might be undervalued. This is kinda true for all metrics though.
I've played around with 4 separate "weightings" on the box score so far (by weightings I mean telling the model how much to listen), and so far the closer the weighting, the better the results have generally been (literally, the one I'm going to put below is tabbed as "Closest"). I'll see if that trend continues.
So, some general thoughts there. I'll go through some of the weird overall results, but the dataframe will be below. No one will listen to this, but: 1. It's regular season only, so LeBron is a bit undersold some years; 2. It's only available from 2015 onwards; and 3. It's not meant to be compared across different years, although practically I guess it's fine.

Some Weird Results
Gonna break down some weird results here and show whether they're unique to mine or present in a lot of the metrics. I'm not making any conclusions from whether they're in all of them; this is more just demonstrating that some "weird" results are universal, and some are just in mine.
2015:
LeBron at 6, George Hill at 7.
(EPM: LeBron at 5, George Hill at 7)
(LEBRON: LeBron at 4, George Hill at 15)
Note: 2015 to 2017 was interesting because LeBron shot up the less I weighed the box score, so kind of the "this type of stuff undersells him" vibe. He was overall #1 by a lot taking those 3 years together in the impact part of it (Curry 15-17 was the #2 stretch from 2014 to 2024 I think, or something like that).
2016:
LeBron at 4
(EPM: 4)
(LEBRON: 2)
2017:
Durant at 8
(EPM: 14)
(LEBRON: 9). Should be higher of course, but obviously it's universal here, mostly from how good the team could still be without him.
2018:
AD 15, KD 16
(EPM: AD 3, KD 15)
(LEBRON: AD 8, KD 12). I think AD should be 1 or 2 this year personally; yeah, my thing just sucks at getting AD right. All of them hate KD again.
2019:
Kawhi 15
(EPM: Kawhi 15)
(LEBRON: Kawhi 12). Player of the year of course; it's just low on him because Toronto did well without him playing sometimes, and it's an impact thing.
2020:
Kemba 8
(EPM: 43)
(LEBRON: 23). Listen, if EPM is allowed to get Nurk and Zubac top 10 outta nowhere, I get to have Kemba lol. LEBRON honestly does a great job of not having random guys way too high, although maybe it's more that certain guys are sometimes low, which I hear people complain about.
2022:
Luka 18
(LEBRON: 8)
(EPM: 17). Mostly low on his defense, but yeah, obviously Luka was higher than all of this if I'm remembering the year right.
2023:
Luka 11
(LEBRON: 7)
(EPM: 7)
2024:
Giannis 7
(EPM: 4)
(LEBRON: 2). This one was just bad on my end imo.
Also, AD is off in a lot of these, and it has Jokic way too low in 2021, but it has him #1 every year since.
Now, obviously I'm literally going through my list to find dumb stuff that pops out, and I probably missed some, but you could go through other lists and do the same (I think 2024 Curry is like 25th in LEBRON? He's 12th on mine and on EPM, but LEBRON generally looks really solid at the top for sure). The point isn't to disparage anyone or any number, but to say these things will always have some individuals that make you go ???, and as long as it's not completely absurd, as in random guy X at 1, 2, or 3, I think it's reasonable.
I think All-in-Ones are fantastic tools, but they aren't a "how good is this guy" metric. A guy being ranked way lower than expected on a team that functions well without him isn't necessarily a bad sign for that player, because impact comes just as much from "they get way better when you're there" as from "they suck when you sit"…
There's a bit of a tradeoff between going super predictive vs accuracy, unless you go for those multi-year predictive versions, I think. LazyLebron was something made that predicted a tad better than EPM and LEBRON, but it had, like, Steven Adams, Caruso, Delon Wright, and Capela all top 10 in 2022, so it just wasn't as practical and never got released (all this info is on Twitter btw). So in that context I don't think the weird results in some of mine are too rough, considering the accuracy seemingly being a bit better, if the testing and everything was all good.
Also, a quick note: I have made a version for the WNBA that obviously I'm not going to post publicly. I made that before this, actually, without tracking data (since that doesn't exist in a large enough sample) and without the Synergy Points Over Expectation (there isn't an API for that that I have access to, though honestly I might just click "download as CSV" like 100 times). Compared to things like positive residual and SPI, it already clears, but LEBRON for the WNBA is definitely better because of the padding things they use. I do plan on learning all that stuff though, this was pretty fun.
FINALLY THE IMPACT METRIC
WEBSITE FOR INTERACTIVE TABLE (Preview Below) https://timotaij.github.io/LepookTable/
https://docs.google.com/spreadsheets/d/1ZMR47Z8MDX9Tt7oQy5p5vzkwLznt9ROc/edit?gid=147787302#gid=147787302 < Spreadsheet Format
NOTE: IIRC, 0 might not have been the average for defense
NOTE: Players who played under 200 minutes in a season may not be shown correctly, but that was not a problem for the metric testing
So what does this mean? Did I create some new super metric or whatever that towers over the competition?
NO
Testing and out-of-sample testing are cool and all, but at the end of the day they aren't the same as legitimate real-world results after the metric was made. Now, to be clear, this isn't a case where I kept rebuilding the model and running it over and over until I got good correlations. This is very much the first run (or at least the first batch), and all of the runs performed relatively similarly, with the ones "weighted" more towards the box score doing better. (Note: I did not take this to its logical conclusion; I did not keep weighting box scores more once I saw they tested better the more I weighted them. I do plan to do that, but it just takes a long time to run, and I want to focus on actually making the box score priors better, defensively especially.)
When I ran correlations with Offensive RTG, Defensive RTG, and Net RTG (which is kind of wonky to use instead of RMSE, I guess, but it was just faster), it did have some clearance offensively, but defensively it was a good deal worse than EPM and about where LEBRON was.
Offensively, I do quite like it at this point. I think the TD RAPM aspect helps both ends, and the Synergy play-type above expectation aspect I think is actually a pretty strong innovation here, but I should still improve the box score component.
For the defense, I'm pretty disappointed with my results. LEBRON's results are a bit unfairly represented here for the reasons above, and its higher weighting of bigs is likely more practical in the sense of an actual evaluation vs directly measuring next-year impact. I think it's practically far more useful than mine defensively and more comparable to EPM. I think LEBRON will miss on certain bigs individually here and there and undershoot some standout perimeter defenders, and even if that is by design I do somewhat disagree with it, but that's a personal gripe more than an objective one; there is a ton of practical value in the way it evaluates bigs, and if anything, comparing bigs amongst bigs solves most of the issues. For the record, mine was very slightly better than D-LEBRON, but given LEBRON's low-minute sample size padding, it would likely clear mine defensively, I would assume (although you could maybe make the argument that time decayed RAPM is a very strong way to indirectly handle a lot of low-sample guys, which wouldn't be represented in this test?).
With that tradeoff between accuracy and the "sniff test at the top," I certainly wouldn't say the top 10 of mine looks the best year to year, but given the predictive accuracy is as strong as it is, the fact that they are comparable is honestly pretty decent in my opinion. When I set out to do this, my main goal was to improve on the defensive side, so seeing it didn't really do that is somewhat disheartening, although given this is just a first draft, it's not too surprising either.
While I do think this is at least a reasonable All-in-One that is competitive with LEBRON and EPM (assuming I didn't have some sort of awful error testing it), I wouldn't take the testing at face value to say anything drastic, and I'd probably just tentatively say it might be an interesting alternative, or the new kid on the block, in its current state.
I think the key thing is that while it already seems pretty solid and presentable, it best serves as a proof of concept, almost. I think LEBRON and EPM, for what they set out to do, are essentially optimal metrics given their respective innovations. LEBRON does a ton of really cool things, between the luck stuff and the handling of low-sample roles; I can get why some people have concerns over the luck stuff, but if it helps their model, it helps, although I do wonder if they run RAPM on R instead of Python, from when I saw their luck-adjusted RAPM stuff on their website. I don't know quite as much about EPM, but he uses tracking data, and I'm basically sure his box score components are likely the best of the bunch all around. I know I've heard some people worry that aspects of tracking data can be noisy at times; at the same time, he's literally a former NBA head of analytics lol, there's 0% chance he's including something that hurts it, especially with how good EPM looks. I actually learned about 10 seconds ago, clicking on a different tab, that it does padding as well for its prior stats. Krishna Narsu was an analytics consultant for the Mavs and is obviously a crazy smart dude, being a lot of the data behind BBI.

Those metrics are very finely tuned over a much longer time than I've tuned this one. Most of my time was parsing stuff out around other commitments. Realistically I'd want to spend a week or two on the priors, a week or two on the weighing of sigmas vs the strength of the decay rate, and then a week bringing those things together coherently; so far it's been a day on the priors, a day on the sigmas, and about 3-4 days between collecting the data, organizing it, dealing with WiFi issues, and writing this up.
While those are essentially the optimized versions of what they are trying to do, I don't believe mine is currently there yet (or really close). I would say I've put some work into this, but at the end of the day this is pretty much day 5 of really grinding on it, although I had prepped some things indirectly because I've run this before in the WNBA; I've just been a bit busy lately. This is just a draft, but I think it serves as a strong proof of concept for this type of framework for a metric.
Beyond that, looking into the validity of high decay rates could be nice for midseason projections: if a high decay rate keeps performing at the same level, it might mean the sample needed to truly stabilize with these priors isn't that large, and that midseason this might be a stronger predictive metric than the other ones; comparing midseason numbers there would be interesting. But fundamentally, I've got to improve the box score priors, as that's the area with the most room for improvement I think, particularly on defense. Offensively, I think the POE stuff might actually be a really strong innovation, and that side is in a good place (not to say it can't be improved upon, but I'm happier with the state it's at).
I'm probably nerding out a bit right now, but I guess my main point is I do think there are pretty interesting applications for this, and there's potentially some room for creativity here. For now this is more of a very strong framework that isn't yet optimized, compared to frameworks that have been optimized like some of the other ones, so I'm excited to get back into it when I can.
Now, my internship is coming to an end and that is VERY much my focus right now. I just had some free time recently and got some work done early, so this was a nice little project I had in mind for a while that I could finally get done. I'm not sure how much time I'll have to really finish this up in the near future, so here's draft one, day 5, I guess.
Obviously this wasn't the most formal post, but yeah, for any questions, comments, concerns, or if you just wanna reach out: timothycwijaya@gmail.com, Timotaij on Instagram, Teemohoops on Twitter, and my LinkedIn of Timothy Wijaya are probably the best places to reach me.
Note: All-in-Ones aren't a ranking of how good players are.
Note: Caveats about All-in-Ones from a more philosophical standpoint are beyond the scope of this post, but that's a very interesting discussion.
Note: This list does not represent how I would rank players
NOTE: As I said, this is a first draft of a metric.
I'm sure literally every team has a better version of this type of All-in-One stuff.
Can't remember if I already mentioned it, but the WNBA version of this (without Synergy play types; no API, but I'll do that manually, I had it before) isn't something I can share publicly (and if it is, you should feel bad for me, because that means I took an L). As a whole it worked pretty well, but that one is gonna need a lot more testing with smaller samples. It tested very well in comparison to other metrics out there, but my gut feeling is that something like WNBA LEBRON, which tested at essentially the same level, is likely better right now, because the ability to handle low-sample players with padding and stuff is important and I haven't implemented that yet; plus, individual play in the WNBA can be more volatile and I should explore in depth what that means for it. That being said, compared to the other ones out there for the WNBA… I'll say LEBRON was very, very comparable and is a very good metric in the WNBA; no comment on the other impact stuff I tested.